These tools encode a lot of operator knowledge about running Hadoop. For example, they use heuristics based on the hardware profile (among other factors) to choose good defaults for Hadoop configuration settings. For more complex setups, like HA or secure Hadoop, the management tools provide well-tested wizards for getting a working cluster in a short amount of time. Finally, they add extra features that the other installation options don't offer, such as unified monitoring and log search, and rolling upgrades (so you can upgrade the cluster without experiencing downtime).

This chapter and the next give you enough information to set up and operate your own basic cluster, but even if you are using Hadoop cluster management tools or a service in which a lot of the routine setup and maintenance are done for you, these chapters still offer valuable information about how Hadoop works from an operations point of view.
For more in-depth information, I highly recommend Hadoop Operations by Eric Sammer (O'Reilly, 2012).

Cluster Specification

Hadoop is designed to run on commodity hardware. That means that you are not tied to expensive, proprietary offerings from a single vendor; rather, you can choose standardized, commonly available hardware from any of a large range of vendors to build your cluster.

"Commodity" does not mean "low-end." Low-end machines often have cheap components, which have higher failure rates than more expensive (but still commodity-class) machines. When you are operating tens, hundreds, or thousands of machines, cheap components turn out to be a false economy, as the higher failure rate incurs a greater maintenance cost. On the other hand, large database-class machines are not recommended either, since they don't score well on the price/performance curve.
And even though you would need fewer of them to build a cluster of comparable performance to one built of mid-range commodity hardware, when one did fail, it would have a bigger impact on the cluster because a larger proportion of the cluster hardware would be unavailable.

Hardware specifications rapidly become obsolete, but for the sake of illustration, a typical choice of machine for running an HDFS datanode and a YARN node manager in 2014 would have had the following specifications:

Processor
    Two hex/octo-core 3 GHz CPUs
Memory
    64-512 GB ECC RAM¹
Storage
    12-24 × 1-4 TB SATA disks
Network
    Gigabit Ethernet with link aggregation

1. ECC memory is strongly recommended, as several Hadoop users have reported seeing many checksum errors when using non-ECC memory on Hadoop clusters.

Although the hardware specification for your cluster will assuredly be different, Hadoop is designed to use multiple cores and disks, so it will be able to take full advantage of more powerful hardware.

Why Not Use RAID?

HDFS clusters do not benefit from using RAID (redundant array of independent disks) for datanode storage (although RAID is recommended for the namenode's disks, to protect against corruption of its metadata).
The redundancy that RAID provides is not needed, since HDFS handles it by replication between nodes.

Furthermore, RAID striping (RAID 0), which is commonly used to increase performance, turns out to be slower than the JBOD (just a bunch of disks) configuration used by HDFS, which round-robins HDFS blocks between all disks.
This is because RAID 0 read and write operations are limited by the speed of the slowest-responding disk in the RAID array. In JBOD, disk operations are independent, so the average speed of operations is greater than that of the slowest disk. (For example, if three disks complete a request in 10, 12, and 20 ms, a striped operation takes 20 ms, while three independent operations average 14 ms.) Disk performance often shows considerable variation in practice, even for disks of the same model. In some benchmarking carried out on a Yahoo! cluster, JBOD performed 10% faster than RAID 0 in one test (Gridmix) and 30% better in another (HDFS write throughput).

Finally, if a disk fails in a JBOD configuration, HDFS can continue to operate without the failed disk, whereas with RAID, failure of a single disk causes the whole array (and hence the node) to become unavailable.

Cluster Sizing

How large should your cluster be? There isn't an exact answer to this question, but the beauty of Hadoop is that you can start with a small cluster (say, 10 nodes) and grow it as your storage and computational needs grow.
In many ways, a better question is this: how fast does your cluster need to grow? You can get a good feel for this by considering storage capacity.

For example, if your data grows by 1 TB a day and you have three-way HDFS replication, you need an additional 3 TB of raw storage per day.
Allow some room for intermediate files and logfiles (around 30%, say), and this is in the range of one (2014-vintage) machine per week. In practice, you wouldn't buy a new machine each week and add it to the cluster. The value of doing a back-of-the-envelope calculation like this is that it gives you a feel for how big your cluster should be. In this example, a cluster that holds two years' worth of data needs 100 machines.
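To make the arithmetic concrete (assuming, purely for illustration, a machine from the specification above with 12 × 2 TB disks, or 24 TB of raw storage):

    1 TB/day × 3 (replication) × 1.3 (intermediate and log overhead) ≈ 3.9 TB/day
    3.9 TB/day × 7 days ≈ 27 TB/week ≈ one machine per week

At one machine per week, two years (roughly 100 weeks) works out to roughly 100 machines.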
Master node scenarios

Depending on the size of the cluster, there are various configurations for running the master daemons: the namenode, secondary namenode, resource manager, and history server. For a small cluster (on the order of 10 nodes), it is usually acceptable to run the namenode and the resource manager on a single master machine (as long as at least one copy of the namenode's metadata is stored on a remote filesystem). However, as the cluster gets larger, there are good reasons to separate them.

The namenode has high memory requirements, as it holds file and block metadata for the entire namespace in memory. The secondary namenode, although idle most of the time, has a comparable memory footprint to the primary when it creates a checkpoint. (This is explained in detail in "The filesystem image and edit log" on page 318.) For filesystems with a large number of files, there may not be enough physical memory on one machine to run both the primary and secondary namenode.

Aside from simple resource requirements, the main reason to run masters on separate machines is for high availability.
Both HDFS and YARN support configurations where they can run masters in active-standby pairs. If the active master fails, then the standby, running on separate hardware, takes over with little or no interruption to the service. In the case of HDFS, the standby performs the checkpointing function of the secondary namenode (so you don't need to run a standby and a secondary namenode).

Configuring and running Hadoop HA is not covered in this book. Refer to the Hadoop website or vendor documentation for details.

Network Topology

A common Hadoop cluster architecture consists of a two-level network topology, as illustrated in Figure 10-1. Typically there are 30 to 40 servers per rack (only 3 are shown in the diagram), with a 10 Gb switch for the rack and an uplink to a core switch or router (at least 10 Gb or better). The salient point is that the aggregate bandwidth between nodes on the same rack is much greater than that between nodes on different racks.
Figure 10-1. Typical two-level network architecture for a Hadoop cluster

Rack awareness

To get maximum performance out of Hadoop, it is important to configure Hadoop so that it knows the topology of your network. If your cluster runs on a single rack, then there is nothing more to do, since this is the default. However, for multirack clusters, you need to map nodes to racks. This allows Hadoop to prefer within-rack transfers (where there is more bandwidth available) to off-rack transfers when placing MapReduce tasks on nodes.
HDFS will also be able to place replicas more intelligently to trade off performance and resilience.

Network locations such as nodes and racks are represented in a tree, which reflects the network "distance" between locations. The namenode uses the network location when determining where to place block replicas (see "Network Topology and Hadoop" on page 70); the MapReduce scheduler uses network location to determine where the closest replica is for input to a map task.
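To illustrate what "distance" means here, the following toy sketch (not Hadoop's own code, which lives in its NetworkTopology class; the class and method names are hypothetical) treats each location as a path in the tree and counts the edges up to the closest common ancestor and back down:

public class NetworkDistance {

    // Distance between two tree paths such as "/switch1/rack1/node1":
    // the number of edges from each endpoint up to their closest
    // common ancestor, summed. Assumes well-formed, equal-depth paths.
    static int distance(String a, String b) {
        String[] pa = a.substring(1).split("/");
        String[] pb = b.substring(1).split("/");
        int common = 0;
        while (common < pa.length && common < pb.length
                && pa[common].equals(pb[common])) {
            common++;
        }
        return (pa.length - common) + (pb.length - common);
    }

    public static void main(String[] args) {
        // Same rack: node -> rack -> node, so distance 2
        System.out.println(distance("/switch1/rack1/node1", "/switch1/rack1/node2"));
        // Different racks: node -> rack -> switch -> rack -> node, so distance 4
        System.out.println(distance("/switch1/rack1/node1", "/switch1/rack2/node4"));
    }
}

Nodes on the same rack are thus "closer" (distance 2) than nodes on different racks (distance 4), which is precisely the preference the namenode and the MapReduce scheduler exploit.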
For the network in Figure 10-1, the rack topology is described by two network locations—say, /switch1/rack1 and /switch1/rack2. Because there is only one top-level switch in this cluster, the locations can be simplified to /rack1 and /rack2.

The Hadoop configuration must specify a map between node addresses and network locations. The map is described by a Java interface, DNSToSwitchMapping, whose signature is:

public interface DNSToSwitchMapping {
    public List<String> resolve(List<String> names);
}

The names parameter is a list of IP addresses, and the return value is a list of corresponding network location strings. The net.topology.node.switch.mapping.impl configuration property defines an implementation of the DNSToSwitchMapping interface that the namenode and the resource manager use to resolve worker node network locations.

For the network in our example, we would map node1, node2, and node3 to /rack1, and node4, node5, and node6 to /rack2.

Most installations don't need to implement the interface themselves, however, since the default implementation is ScriptBasedMapping, which runs a user-defined script to determine the mapping.
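To make the interface concrete, here is a minimal sketch of a hard-coded implementation for the six-node example (the class name ExampleRackMapping is hypothetical, it assumes names arrive as resolvable hostnames rather than raw IP addresses, and note that newer Hadoop releases add reloadCachedMappings() methods to DNSToSwitchMapping that a real implementation would also have to provide):

import java.util.ArrayList;
import java.util.List;

public class ExampleRackMapping implements DNSToSwitchMapping {
    public List<String> resolve(List<String> names) {
        List<String> locations = new ArrayList<>();
        for (String name : names) {
            // node1, node2, and node3 are in the first rack;
            // everything else is assumed to be in the second
            if (name.equals("node1") || name.equals("node2")
                    || name.equals("node3")) {
                locations.add("/rack1");
            } else {
                locations.add("/rack2");
            }
        }
        return locations;
    }
}

Such a class would be named in the net.topology.node.switch.mapping.impl property; in practice, though, the script-based default is almost always the easier route.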