During execution of the task, the Java process passes input key-value pairs to the external process, which runs it through the user-defined map or reduce function and passes the output key-value pairs back to the Java process. From the node manager's point of view, it is as if the child process ran the map or reduce code itself.

Figure 7-2. The relationship of the Streaming executable to the node manager and the task container

Progress and Status Updates

MapReduce jobs are long-running batch jobs, taking anything from tens of seconds to hours to run. Because this can be a significant length of time, it's important for the user to get feedback on how the job is progressing. A job and each of its tasks have a status, which includes such things as the state of the job or task (e.g., running, successfully completed, failed), the progress of maps and reduces, the values of the job's counters, and a status message or description (which may be set by user code).
These statuses change over the course of the job, so how do they get communicated back to the client?

When a task is running, it keeps track of its progress (i.e., the proportion of the task completed). For map tasks, this is the proportion of the input that has been processed. For reduce tasks, it's a little more complex, but the system can still estimate the proportion of the reduce input processed.
It does this by dividing the total progress into three parts, corresponding to the three phases of the shuffle (see "Shuffle and Sort" on page 197). For example, if the task has run the reducer on half its input, the task's progress is 5/6, since it has completed the copy and sort phases (1/3 each) and is halfway through the reduce phase (1/6).

What Constitutes Progress in MapReduce?

Progress is not always measurable, but nevertheless, it tells Hadoop that a task is doing something. For example, a task writing output records is making progress, even when it cannot be expressed as a percentage of the total number that will be written (because the latter figure may not be known, even by the task producing the output).

Progress reporting is important, as Hadoop will not fail a task that's making progress. All of the following operations constitute progress:

• Reading an input record (in a mapper or reducer)
• Writing an output record (in a mapper or reducer)
• Setting the status description (via Reporter's or TaskAttemptContext's setStatus() method)
• Incrementing a counter (using Reporter's incrCounter() method or Counter's increment() method)
• Calling Reporter's or TaskAttemptContext's progress() method

Tasks also have a set of counters that count various events as the task runs (we saw an example in "A test run" on page 27), which are either built into the framework, such as the number of map output records written, or defined by users.
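To make this concrete, here is a minimal sketch of a mapper (using the new MapReduce API, where the Reporter-style methods are available on the Context object) that performs several of the progress-constituting operations just listed. The counter enum and the status message are illustrative assumptions, not code from this book:

    import java.io.IOException;

    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;

    // Sketch of a mapper that explicitly reports progress. The counter
    // enum and the status message are hypothetical.
    public class ProgressReportingMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {

      enum RecordCounters { PROCESSED } // user-defined counter (assumed)

      @Override
      protected void map(LongWritable key, Text value, Context context)
          throws IOException, InterruptedException {
        // Reading an input record and writing an output record each
        // count as progress in their own right.
        context.write(value, new LongWritable(1));

        // Incrementing a counter constitutes progress.
        context.getCounter(RecordCounters.PROCESSED).increment(1);

        // Setting the status description constitutes progress (and
        // shows up in the web UI).
        context.setStatus("Processed record at offset " + key.get());

        // For long computations that emit nothing for a while, an
        // explicit progress() call tells the framework the task is alive.
        context.progress();
      }
    }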
As the map or reduce task runs, the child process communicates with its parent application master through the umbilical interface. The task reports its progress and status (including counters) back to its application master, which has an aggregate view of the job, every three seconds over the umbilical interface.

The resource manager web UI displays all the running applications with links to the web UIs of their respective application masters, each of which displays further details on the MapReduce job, including its progress.

During the course of the job, the client receives the latest status by polling the application master every second (the interval is set via mapreduce.client.progressmonitor.pollinterval). Clients can also use Job's getStatus() method to obtain a JobStatus instance, which contains all of the status information for the job.

The process is illustrated in Figure 7-3.

Figure 7-3. How status updates are propagated through the MapReduce system
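For example, a client that wants to track a running job itself, rather than relying on the progress output that waitForCompletion() prints, might poll in a loop like this sketch (the job is assumed to have been configured and submitted already; the one-second sleep mirrors the default poll interval):

    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.JobStatus;

    // Sketch of client-side status polling for a submitted job.
    public class StatusPoller {
      public static void monitor(Job job) throws Exception {
        while (!job.isComplete()) {
          JobStatus status = job.getStatus();
          System.out.printf("map %.0f%% reduce %.0f%% state %s%n",
              status.getMapProgress() * 100,
              status.getReduceProgress() * 100,
              status.getState());
          Thread.sleep(1000); // matches the default client poll interval
        }
      }
    }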
Job Completion

When the application master receives a notification that the last task for a job is complete, it changes the status for the job to "successful." Then, when the Job polls for status, it learns that the job has completed successfully, so it prints a message to tell the user and then returns from the waitForCompletion() method. Job statistics and counters are printed to the console at this point.

The application master also sends an HTTP job notification if it is configured to do so. This can be configured by clients wishing to receive callbacks, via the mapreduce.job.end-notification.url property.
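As an illustration, a client could set the property when building the job configuration. The callback endpoint below is made up; the $jobId and $jobStatus tokens are placeholders that Hadoop substitutes when it makes the request:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Sketch of enabling the HTTP job notification callback.
    public class NotifyingJobSetup {
      public static Job createJob() throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical endpoint; Hadoop fills in $jobId and $jobStatus.
        conf.set("mapreduce.job.end-notification.url",
            "http://example.com/jobdone?id=$jobId&status=$jobStatus");
        return Job.getInstance(conf, "notified job");
      }
    }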
Finally, on job completion, the application master and the task containers clean up their working state (so intermediate output is deleted), and the OutputCommitter's commitJob() method is called. Job information is archived by the job history server to enable later interrogation by users if desired.

Failures

In the real world, user code is buggy, processes crash, and machines fail. One of the major benefits of using Hadoop is its ability to handle such failures and allow your job to complete successfully. We need to consider the failure of any of the following entities: the task, the application master, the node manager, and the resource manager.

Task Failure

Consider first the case of the task failing. The most common occurrence of this failure is when user code in the map or reduce task throws a runtime exception.
If this happens, the task JVM reports the error back to its parent application master before it exits. The error ultimately makes it into the user logs. The application master marks the task attempt as failed, and frees up the container so its resources are available for another task.

For Streaming tasks, if the Streaming process exits with a nonzero exit code, it is marked as failed. This behavior is governed by the stream.non.zero.exit.is.failure property (the default is true).

Another failure mode is the sudden exit of the task JVM—perhaps there is a JVM bug that causes the JVM to exit for a particular set of circumstances exposed by the MapReduce user code.
In this case, the node manager notices that the process has exited and informs the application master so it can mark the attempt as failed.

Hanging tasks are dealt with differently. The application master notices that it hasn't received a progress update for a while and proceeds to mark the task as failed. The task JVM process will be killed automatically after this period.3 The timeout period after which tasks are considered failed is normally 10 minutes and can be configured on a per-job basis (or a cluster basis) by setting the mapreduce.task.timeout property to a value in milliseconds.
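For instance, a job whose tasks legitimately go quiet for long stretches might raise the timeout rather than disable it. A sketch, using a made-up 30-minute value:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Sketch of raising the hung-task timeout for a single job.
    public class SlowJobSetup {
      public static Job createJob() throws Exception {
        Configuration conf = new Configuration();
        // 30 minutes instead of the 10-minute default, in milliseconds
        // (an illustrative value, not a recommendation).
        conf.setLong("mapreduce.task.timeout", 30 * 60 * 1000L);
        return Job.getInstance(conf, "slow but steady");
      }
    }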
Setting the timeout to a value of zero disables the timeout, so long-running tasks are never marked as failed. In this case, a hanging task will never free up its container, and over time there may be cluster slowdown as a result. This approach should therefore be avoided, and making sure that a task is reporting progress periodically should suffice (see "What Constitutes Progress in MapReduce?" on page 191).

3. If a Streaming process hangs, the node manager will kill it (along with the JVM that launched it) only in the following circumstances: either yarn.nodemanager.container-executor.class is set to org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor, or the default container executor is being used and the setsid command is available on the system (so that the task JVM and any processes it launches are in the same process group). In any other case, orphaned Streaming processes will accumulate on the system, which will impact utilization over time.

When the application master is notified of a task attempt that has failed, it will reschedule execution of the task.
The application master will try to avoid rescheduling the task on a node manager where it has previously failed. Furthermore, if a task fails four times, it will not be retried again. This value is configurable: the maximum number of attempts to run a task is controlled by the mapreduce.map.maxattempts property for map tasks and mapreduce.reduce.maxattempts for reduce tasks. By default, if any task fails four times (or whatever the maximum number of attempts is configured to), the whole job fails.
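A sketch of tuning these retry limits for a hypothetical job follows; it also sets the failure-percentage properties described in the next paragraph (all values are illustrative):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapreduce.Job;

    // Sketch of loosening failure handling for a job that can tolerate
    // the loss of some tasks.
    public class FaultTolerantJobSetup {
      public static Job createJob() throws Exception {
        Configuration conf = new Configuration();
        // Allow six attempts per task instead of the default four.
        conf.setInt("mapreduce.map.maxattempts", 6);
        conf.setInt("mapreduce.reduce.maxattempts", 6);
        // Let up to 5% of map tasks fail without failing the job.
        conf.setInt("mapreduce.map.failures.maxpercent", 5);
        return Job.getInstance(conf, "fault-tolerant job");
      }
    }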
For some applications, it is undesirable to abort the job if a few tasks fail, as it may be possible to use the results of the job despite some failures. In this case, the maximum percentage of tasks that are allowed to fail without triggering job failure can be set for the job. Map tasks and reduce tasks are controlled independently, using the mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent properties.

A task attempt may also be killed, which is different from it failing.