Tom White, Hadoop: The Definitive Guide, 4th Edition (2015), Chapter 4: YARN
In YARN, the equivalent role is the timeline server, which stores application history.[5] The YARN equivalent of a tasktracker is a node manager. The mapping is summarized in Table 4-1.

Table 4-1. A comparison of MapReduce 1 and YARN components

MapReduce 1      YARN
Jobtracker       Resource manager, application master, timeline server
Tasktracker      Node manager
Slot             Container

YARN was designed to address many of the limitations in MapReduce 1. The benefits to using YARN include the following:

Scalability
YARN can run on larger clusters than MapReduce 1. MapReduce 1 hits scalability bottlenecks in the region of 4,000 nodes and 40,000 tasks,[6] stemming from the fact that the jobtracker has to manage both jobs and tasks.
YARN overcomes these limitations by virtue of its split resource manager/application master architecture: it is designed to scale up to 10,000 nodes and 100,000 tasks.

In contrast to the jobtracker, each instance of an application (here, a MapReduce job) has a dedicated application master, which runs for the duration of the application. This model is actually closer to the original Google MapReduce paper, which describes how a master process is started to coordinate map and reduce tasks running on a set of workers.

Availability
High availability (HA) is usually achieved by replicating the state needed for another daemon to take over the work needed to provide the service, in the event of the service daemon failing.
However, the large amount of rapidly changing complex state in the jobtracker's memory (each task status is updated every few seconds, for example) makes it very difficult to retrofit HA into the jobtracker service.

With the jobtracker's responsibilities split between the resource manager and application master in YARN, making the service highly available became a divide-and-conquer problem: provide HA for the resource manager, then for YARN applications (on a per-application basis). And indeed, Hadoop 2 supports HA both for the resource manager and for the application master for MapReduce jobs. Failure recovery in YARN is discussed in more detail in "Failures" on page 193.

[5] As of Hadoop 2.5.1, the YARN timeline server does not yet store MapReduce job history, so a MapReduce job history server daemon is still needed (see "Cluster Setup and Installation" on page 288).
[6] Arun C. Murthy, "The Next Generation of Apache Hadoop MapReduce," February 14, 2011.

Utilization
In MapReduce 1, each tasktracker is configured with a static allocation of fixed-size "slots," which are divided into map slots and reduce slots at configuration time. A map slot can only be used to run a map task, and a reduce slot can only be used for a reduce task.

In YARN, a node manager manages a pool of resources, rather than a fixed number of designated slots. MapReduce running on YARN will not hit the situation where a reduce task has to wait because only map slots are available on the cluster, which can happen in MapReduce 1.
If the resources to run the task are available, then the application will be eligible for them.

Furthermore, resources in YARN are fine grained, so an application can make a request for what it needs, rather than for an indivisible slot, which may be too big (which is wasteful of resources) or too small (which may cause a failure) for the particular task. (A minimal configuration sketch at the end of this section shows the properties that govern these resources.)

Multitenancy
In some ways, the biggest benefit of YARN is that it opens up Hadoop to other types of distributed application beyond MapReduce. MapReduce is just one YARN application among many.

It is even possible for users to run different versions of MapReduce on the same YARN cluster, which makes the process of upgrading MapReduce more manageable. (Note, however, that some parts of MapReduce, such as the job history server and the shuffle handler, as well as YARN itself, still need to be upgraded across the cluster.)

Since Hadoop 2 is widely used and is the latest stable version, in the rest of this book the term "MapReduce" refers to MapReduce 2 unless otherwise stated. Chapter 7 looks in detail at how MapReduce running on YARN works.
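To make the resource model above concrete, here is a minimal sketch (not one of the book's listings) of the yarn-site.xml properties that define a node manager's resource pool and the bounds on individual container requests. The property names are standard YARN settings not shown in this excerpt; the values are illustrative assumptions only.

<?xml version="1.0"?>
<configuration>
  <!-- Total resources this node manager offers to the cluster -->
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>8</value>
  </property>
  <!-- Bounds on any single container request made by an application -->
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>1024</value>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>8192</value>
  </property>
</configuration>

Under settings like these, an application could request, say, a 3,072 MB container for a memory-hungry task and a 1,024 MB container for a lighter one, rather than being forced into identically sized map or reduce slots.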
Scheduling in YARN

In an ideal world, the requests that a YARN application makes would be granted immediately. In the real world, however, resources are limited, and on a busy cluster, an application will often need to wait to have some of its requests fulfilled. It is the job of the YARN scheduler to allocate resources to applications according to some defined policy.
Scheduling in general is a difficult problem and there is no one "best" policy, which is why YARN provides a choice of schedulers and configurable policies. We look at these next.

Scheduler Options

Three schedulers are available in YARN: the FIFO, Capacity, and Fair Schedulers. The FIFO Scheduler places applications in a queue and runs them in the order of submission (first in, first out). Requests for the first application in the queue are allocated first; once its requests have been satisfied, the next application in the queue is served, and so on.

The FIFO Scheduler has the merit of being simple to understand and not needing any configuration, but it's not suitable for shared clusters.
Large applications will use all the resources in a cluster, so each application has to wait its turn. On a shared cluster it is better to use the Capacity Scheduler or the Fair Scheduler. Both of these allow long-running jobs to complete in a timely manner, while still allowing users who are running concurrent smaller ad hoc queries to get results back in a reasonable time.

The difference between schedulers is illustrated in Figure 4-3, which shows that under the FIFO Scheduler (i) the small job is blocked until the large job completes.

With the Capacity Scheduler (ii in Figure 4-3), a separate dedicated queue allows the small job to start as soon as it is submitted, although this is at the cost of overall cluster utilization since the queue capacity is reserved for jobs in that queue. This means that the large job finishes later than when using the FIFO Scheduler.

With the Fair Scheduler (iii in Figure 4-3), there is no need to reserve a set amount of capacity, since it will dynamically balance resources between all running jobs.
Just after the first (large) job starts, it is the only job running, so it gets all the resources in the cluster. When the second (small) job starts, it is allocated half of the cluster resources so that each job is using its fair share of resources.

Note that there is a lag between the time the second job starts and when it receives its fair share, since it has to wait for resources to free up as containers used by the first job complete. After the small job completes and no longer requires resources, the large job goes back to using the full cluster capacity again. The overall effect is both high cluster utilization and timely small job completion.

Figure 4-3 contrasts the basic operation of the three schedulers.
In the next two sections, we examine some of the more advanced configuration options for the Capacity and Fair Schedulers.

Figure 4-3. Cluster utilization over time when running a large job and a small job under the FIFO Scheduler (i), Capacity Scheduler (ii), and Fair Scheduler (iii)
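The scheduler in use is determined by the resource manager's configuration. As a brief sketch (this property is not covered in the excerpt above, though it is a standard YARN setting), the choice is made with yarn.resourcemanager.scheduler.class in yarn-site.xml; the Capacity Scheduler is the default in stock Apache Hadoop 2, and the Fair Scheduler can be selected instead like this:

<?xml version="1.0"?>
<configuration>
  <!-- Select the scheduler implementation used by the resource manager.
       Other values include
       org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler (the default)
       and org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler. -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
</configuration>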
Capacity Scheduler Configuration

The Capacity Scheduler allows sharing of a Hadoop cluster along organizational lines, whereby each organization is allocated a certain capacity of the overall cluster. Each organization is set up with a dedicated queue that is configured to use a given fraction of the cluster capacity. Queues may be further divided in hierarchical fashion, allowing each organization to share its cluster allowance between different groups of users within the organization. Within a queue, applications are scheduled using FIFO scheduling.

As we saw in Figure 4-3, a single job does not use more resources than its queue's capacity.
However, if there is more than one job in the queue and there are idle resources available, then the Capacity Scheduler may allocate the spare resources to jobs in the queue, even if that causes the queue's capacity to be exceeded.[7] This behavior is known as queue elasticity.

In normal operation, the Capacity Scheduler does not preempt containers by forcibly killing them,[8] so if a queue is under capacity due to lack of demand, and then demand increases, the queue will only return to capacity as resources are released from other queues as containers complete. It is possible to mitigate this by configuring queues with a maximum capacity so that they don't eat into other queues' capacities too much.
This is at the cost of queue elasticity, of course, so a reasonable trade-off should be found by trial and error.

Imagine a queue hierarchy that looks like this:

root
├── prod
└── dev
    ├── eng
    └── science

The listing in Example 4-1 shows a sample Capacity Scheduler configuration file, called capacity-scheduler.xml, for this hierarchy. It defines two queues under the root queue, prod and dev, which have 40% and 60% of the capacity, respectively. Notice that a particular queue is configured by setting configuration properties of the form yarn.scheduler.capacity.<queue-path>.<sub-property>, where <queue-path> is the hierarchical (dotted) path of the queue, such as root.prod.
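Example 4-1 itself is not reproduced in this excerpt. The following is a sketch of what such a capacity-scheduler.xml could look like, built only from the description above: prod and dev take 40% and 60% of root, and dev is split between eng and science. The 50/50 split of dev and the dev maximum-capacity value are assumptions for illustration; the maximum-capacity property shows the elasticity cap mentioned earlier.

<?xml version="1.0"?>
<configuration>
  <!-- Queues directly under root -->
  <property>
    <name>yarn.scheduler.capacity.root.queues</name>
    <value>prod,dev</value>
  </property>
  <!-- Child queues of dev -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.queues</name>
    <value>eng,science</value>
  </property>
  <!-- Capacities, expressed as percentages of the parent queue -->
  <property>
    <name>yarn.scheduler.capacity.root.prod.capacity</name>
    <value>40</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.capacity</name>
    <value>60</value>
  </property>
  <!-- Cap dev's elasticity so it cannot take over the whole cluster (illustrative value) -->
  <property>
    <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
    <value>75</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.eng.capacity</name>
    <value>50</value>
  </property>
  <property>
    <name>yarn.scheduler.capacity.root.dev.science.capacity</name>
    <value>50</value>
  </property>
</configuration>

With a configuration along these lines, a MapReduce job can typically be directed to a particular queue by setting the mapreduce.job.queuename property (to eng, for example, using the last part of the queue name rather than the full hierarchical path).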