Tom White - Hadoop: The Definitive Guide, 4th edition - 2015 (811394), page 62


A typical yarn-site.xml configuration file

<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>resourcemanager</value>
  </property>
  <property>
    <name>yarn.nodemanager.local-dirs</name>
    <value>/disk1/nm-local-dir,/disk2/nm-local-dir</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>16384</value>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>16</value>
  </property>
</configuration>

HDFS

To run HDFS, you need to designate one machine as a namenode.

In this case, the property fs.defaultFS is an HDFS filesystem URI whose host is the namenode’s hostname or IP address and whose port is the port that the namenode will listen on for RPCs. If no port is specified, the default of 8020 is used.

The fs.defaultFS property also specifies the default filesystem. The default filesystem is used to resolve relative paths, which are handy to use because they save typing (and avoid hardcoding knowledge of a particular namenode’s address).
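The two roles of fs.defaultFS can be illustrated with a small Python sketch. It is not Hadoop code: the function names namenode_address and qualify are hypothetical, the s3a://bucket URI in the usage note is made up, and only scheme-less paths that begin with a slash are handled.

```python
from urllib.parse import urlsplit

# Default namenode RPC port, as stated in the text.
DEFAULT_NAMENODE_RPC_PORT = 8020


def namenode_address(default_fs):
    """Return (host, port) for the namenode named by an hdfs:// URI."""
    parts = urlsplit(default_fs)
    if parts.scheme != "hdfs":
        raise ValueError("not an HDFS URI: " + default_fs)
    # Fall back to the default port when the URI does not specify one.
    return parts.hostname, parts.port or DEFAULT_NAMENODE_RPC_PORT


def qualify(path, default_fs):
    """Resolve a scheme-less path against the default filesystem.

    Paths that already carry a scheme are returned unchanged.
    Simplification: only paths starting with "/" are supported.
    """
    if urlsplit(path).scheme:
        return path
    if not path.startswith("/"):
        raise ValueError("this sketch only handles paths starting with '/'")
    fs = urlsplit(default_fs)
    return f"{fs.scheme}://{fs.netloc}{path}"
```

With these definitions, qualify("/a/b", "hdfs://namenode") yields hdfs://namenode/a/b, while a path on another filesystem, such as the made-up s3a://bucket/a/b, passes through untouched.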

For example, with the default filesystem defined in Example 10-1, the relative URI /a/b is resolved to hdfs://namenode/a/b.

If you are running HDFS, the fact that fs.defaultFS is used to specify both the HDFS namenode and the default filesystem means HDFS has to be the default filesystem in the server configuration. Bear in mind, however, that it is possible to specify a different filesystem as the default in the client configuration, for convenience.

For example, if you use both HDFS and S3 filesystems, then you have a choice of specifying either as the default in the client configuration, which allows you to refer to the default with a relative URI and the other with an absolute URI.

There are a few other configuration properties you should set for HDFS: those that set the storage directories for the namenode and for datanodes.

The property dfs.namenode.name.dir specifies a list of directories where the namenode stores persistent filesystem metadata (the edit log and the filesystem image). A copy of each metadata file is stored in each directory for redundancy. It’s common to configure dfs.namenode.name.dir so that the namenode metadata is written to one or two local disks, as well as a remote disk, such as an NFS-mounted directory. Such a setup guards against failure of a local disk and failure of the entire namenode, since in both cases the files can be recovered and used to start a new namenode. (The secondary namenode takes only periodic checkpoints of the namenode, so it does not provide an up-to-date backup of the namenode.)

You should also set the dfs.datanode.data.dir property, which specifies a list of directories for a datanode to store its blocks in.

Unlike the namenode, which uses multiple directories for redundancy, a datanode round-robins writes between its storage directories, so for performance you should specify a storage directory for each local disk. Read performance also benefits from having multiple disks for storage, because blocks will be spread across them and concurrent reads for distinct blocks will be correspondingly spread across disks.

For maximum performance, you should mount storage disks with the noatime option. This setting means that last accessed time information is not written on file reads, which gives significant performance gains.

Finally, you should configure where the secondary namenode stores its checkpoints of the filesystem. The dfs.namenode.checkpoint.dir property specifies a list of directories where the checkpoints are kept.
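To make the difference between the two placement policies described above concrete, here is a minimal Python sketch. It only models the behavior as the text describes it and is not taken from the Hadoop source; the names split_dirs, namenode_write_targets, and RoundRobinVolumes are hypothetical, and the directory paths in the usage note are illustrative assumptions.

```python
from itertools import cycle


def split_dirs(value):
    """Parse a comma-separated directory list, as used by the *.dir properties."""
    return [d.strip() for d in value.split(",") if d.strip()]


def namenode_write_targets(name_dirs):
    """Namenode metadata goes to every configured directory (redundancy)."""
    return list(name_dirs)


class RoundRobinVolumes:
    """Datanode blocks go to one directory at a time, chosen in rotation."""

    def __init__(self, data_dirs):
        if not data_dirs:
            raise ValueError("at least one storage directory is required")
        self._cycle = cycle(data_dirs)

    def next_dir(self):
        # Each call returns the next directory, wrapping around at the end.
        return next(self._cycle)
```

For instance, with the assumed value "/disk1/hdfs/data,/disk2/hdfs/data", four successive calls to next_dir() alternate between the two disks, whereas namenode_write_targets always returns the full list.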

Like the storage directories for the namenode, which keep redundant copies of the namenode metadata, the checkpointed filesystem image is stored in each checkpoint directory for redundancy.

Table 10-2 summarizes the important configuration properties for HDFS.

Table 10-2. Important HDFS daemon properties

Property name: fs.defaultFS
Type: URI
Default value: file:///
Description: The default filesystem. The URI defines the hostname and port that the namenode’s RPC server runs on. The default port is 8020. This property is set in core-site.xml.

Property name: dfs.namenode.name.dir
Type: Comma-separated directory names
Default value: file://${hadoop.tmp.dir}/dfs/name
Description: The list of directories where the namenode stores its persistent metadata. The namenode stores a copy of the metadata in each directory in the list.

Property name: dfs.datanode.data.dir
Type: Comma-separated directory names
Default value: file://${hadoop.tmp.dir}/dfs/data
Description: A list of directories where the datanode stores blocks. Each block is stored in only one of these directories.

Property name: dfs.namenode.checkpoint.dir
Type: Comma-separated directory names
Default value: file://${hadoop.tmp.dir}/dfs/namesecondary
Description: A list of directories where the secondary namenode stores checkpoints. It stores a copy of the checkpoint in each directory in the list.

Note that the storage directories for HDFS are under Hadoop’s temporary directory by default (this is configured via the hadoop.tmp.dir property, whose default is /tmp/hadoop-${user.name}).
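A short Python sketch shows where the default storage directories end up once the ${...} references are expanded. It is an illustration only and not Hadoop's own substitution logic: the expand function and the depth limit are assumptions, and the user name hdfs is a made-up value standing in for ${user.name}.

```python
import re

# Default values as listed in Table 10-2 and the note above.
DEFAULTS = {
    "hadoop.tmp.dir": "/tmp/hadoop-${user.name}",
    "dfs.namenode.name.dir": "file://${hadoop.tmp.dir}/dfs/name",
    "dfs.datanode.data.dir": "file://${hadoop.tmp.dir}/dfs/data",
    "dfs.namenode.checkpoint.dir": "file://${hadoop.tmp.dir}/dfs/namesecondary",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def expand(value, props, max_depth=20):
    """Repeatedly replace ${name} with props[name] until nothing changes.

    Unknown names are left as-is; max_depth guards against cycles.
    """
    for _ in range(max_depth):
        expanded = _VAR.sub(lambda m: props.get(m.group(1), m.group(0)), value)
        if expanded == value:
            break
        value = expanded
    return value


# "hdfs" is a hypothetical user name chosen for the example.
props = dict(DEFAULTS, **{"user.name": "hdfs"})
resolved = {
    name: expand(value, props)
    for name, value in DEFAULTS.items()
    if name.startswith("dfs.")
}
```

Under these assumptions every entry in resolved starts with file:///tmp/hadoop-hdfs/, that is, all three kinds of HDFS storage sit under /tmp unless the properties are overridden.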

Therefore, it is critical that these properties are set so that data is not lost by the system when it clears out temporary directories.

YARN

To run YARN, you need to designate one machine as a resource manager. The simplest way to do this is to set the property yarn.resourcemanager.hostname to the hostname or IP address of the machine running the resource manager.

Many of the resource manager’s server addresses are derived from this property. For example, yarn.resourcemanager.address takes the form of a host-port pair, and the host defaults to yarn.resourcemanager.hostname. In a MapReduce client configuration, this property is used to connect to the resource manager over RPC.

During a MapReduce job, intermediate data and working files are written to temporary local files. Because this data includes the potentially very large output of map tasks, you need to ensure that the yarn.nodemanager.local-dirs property, which controls the location of local temporary storage for YARN containers, is configured to use disk partitions that are large enough.

The property takes a comma-separated list of directory names, and you should use all available local disks to spread disk I/O (the directories are used in round-robin fashion). Typically, you will use the same disks and partitions (but different directories) for YARN local storage as you use for datanode block storage, as governed by the dfs.datanode.data.dir property, which was discussed earlier.

Unlike MapReduce 1, YARN doesn’t have tasktrackers to serve map outputs to reduce tasks, so for this function it relies on shuffle handlers, which are long-running auxiliary services running in node managers. Because YARN is a general-purpose service, the MapReduce shuffle handlers need to be enabled explicitly in yarn-site.xml by setting the yarn.nodemanager.aux-services property to mapreduce_shuffle.

Table 10-3 summarizes the important configuration properties for YARN.

The resource-related settings are covered in more detail in the next sections.

Table 10-3. Important YARN daemon properties

Property name: yarn.resourcemanager.hostname
Type: Hostname
Default value: 0.0.0.0
Description: The hostname of the machine the resource manager runs on. Abbreviated ${y.rm.hostname} below.

Property name: yarn.resourcemanager.address
Type: Hostname and port
Default value: ${y.rm.hostname}:8032
Description: The hostname and port that the resource manager’s RPC server runs on.

Property name: yarn.nodemanager.local-dirs
Type: Comma-separated directory names
Default value: ${hadoop.tmp.dir}/nm-local-dir
Description: A list of directories where node managers allow containers to store intermediate data. The data is cleared out when the application ends.

Property name: yarn.nodemanager.aux-services
Type: Comma-separated service names
Default value: (none)
Description: A list of auxiliary services run by the node manager. A service is implemented by the class defined by the property yarn.nodemanager.aux-services.servicename.class. By default, no auxiliary services are specified.

Property name: yarn.nodemanager.resource.memory-mb
Type: int
Default value: 8192
Description: The amount of physical memory (in MB) that may be allocated to containers being run by the node manager.

Property name: yarn.nodemanager.vmem-pmem-ratio
Type: float
Default value: 2.1
Description: The ratio of virtual to physical memory for containers. Virtual memory usage may exceed the allocation by this amount.

Property name: yarn.nodemanager.resource.cpu-vcores
Type: int
Default value: 8
Description: The number of CPU cores that may be allocated to containers being run by the node manager.

Memory settings in YARN and MapReduce

YARN treats memory in a more fine-grained manner than the slot-based model used in MapReduce 1.

Rather than specifying a fixed maximum number of map and reduce slots that may run on a node at once, YARN allows applications to request an arbitrary amount of memory (within limits) for a task. In the YARN model, node managers allocate memory from a pool, so the number of tasks that are running on a particular node depends on the sum of their memory requirements, and not simply on a fixed number of slots.

The calculation for how much memory to dedicate to a node manager for running containers depends on the amount of physical memory on the machine. Each Hadoop daemon uses 1,000 MB, so for a datanode and a node manager, the total is 2,000 MB. Set aside enough for other processes that are running on the machine, and the remainder can be dedicated to the node manager’s containers by setting the configuration property yarn.nodemanager.resource.memory-mb to the total allocation in MB.
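The arithmetic in the last paragraph can be written out as a small Python sketch. The function name node_manager_memory_mb is hypothetical, and the machine size and the reserve for other processes used in the usage note are made-up figures; only the 1,000 MB per daemon comes from the text.

```python
# Memory used by each Hadoop daemon, as stated in the text.
DAEMON_MB = 1000


def node_manager_memory_mb(physical_mb, other_processes_mb, daemons=2):
    """Estimate a value for yarn.nodemanager.resource.memory-mb.

    physical_mb:        physical memory on the machine, in MB
    other_processes_mb: memory set aside for non-Hadoop processes, in MB
    daemons:            Hadoop daemons on the node (datanode + node manager = 2)
    """
    remainder = physical_mb - daemons * DAEMON_MB - other_processes_mb
    if remainder <= 0:
        raise ValueError("no memory left over for containers")
    return remainder
```

For a hypothetical machine with 32,768 MB of physical memory and 4,096 MB reserved for other processes, node_manager_memory_mb(32768, 4096) returns 26672, which would be the value to set for yarn.nodemanager.resource.memory-mb.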
