Tom White - Hadoop The Definitive Guide_ 4 edition - 2015 (811394), page 67


2. The secondary retrieves the latest fsimage and edits files from the primary (using HTTP GET).
3. The secondary loads fsimage into memory, applies each transaction from edits, then creates a new merged fsimage file.
4. The secondary sends the new fsimage back to the primary (using HTTP PUT), and the primary saves it as a temporary .ckpt file.
5. The primary renames the temporary fsimage file to make it available.

At the end of the process, the primary has an up-to-date fsimage file and a short in-progress edits file (it is not necessarily empty, as it may have received some edits while the checkpoint was being taken).
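The merge in step 3 can be pictured with a toy model. The sketch below is only an illustration under stated assumptions: the dictionary layout, the `apply_edits` name, and the two operation types (`mkdir`, `delete`) are hypothetical and are not the real fsimage or edit log formats.

```python
import copy


def apply_edits(fsimage, edits):
    """Return a new image: the old image with each edit applied in order.

    fsimage: {"last_txid": int, "paths": set of str}   (hypothetical layout)
    edits:   list of (txid, op, path) tuples            (hypothetical layout)
    """
    merged = copy.deepcopy(fsimage)  # leave the original image untouched
    for txid, op, path in edits:
        if txid <= merged["last_txid"]:
            continue  # this transaction is already reflected in the image
        if op == "mkdir":
            merged["paths"].add(path)
        elif op == "delete":
            merged["paths"].discard(path)
        else:
            raise ValueError(f"unknown op: {op}")
        merged["last_txid"] = txid
    return merged


fsimage = {"last_txid": 2, "paths": {"/", "/user"}}
edits = [(3, "mkdir", "/user/tom"), (4, "mkdir", "/tmp"), (5, "delete", "/tmp")]

new_fsimage = apply_edits(fsimage, edits)
print(sorted(new_fsimage["paths"]), new_fsimage["last_txid"])
# prints: ['/', '/user', '/user/tom'] 5
```

Because the whole image is held in memory while the edits are replayed, the toy model also hints at why the node doing the merge needs about as much memory as the primary.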

It is possible for an administrator to run this process manually while the namenode is in safe mode, using the hdfs dfsadmin -saveNamespace command.

This procedure makes it clear why the secondary has similar memory requirements to the primary (since it loads the fsimage into memory), which is the reason that the secondary needs a dedicated machine on large clusters.

The schedule for checkpointing is controlled by two configuration parameters. The secondary namenode checkpoints every hour (dfs.namenode.checkpoint.period in seconds), or sooner if the edit log has reached one million transactions since the last checkpoint (dfs.namenode.checkpoint.txns), which it checks every minute (dfs.namenode.checkpoint.check.period in seconds).
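The scheduling rule can be sketched as a small decision function. This is a simplified model, not the secondary namenode's actual code: the function names and the constant-rate `simulate` helper are hypothetical, and the defaults (3600 seconds, one million transactions, a 60-second check) are the values quoted above.

```python
def checkpoint_due(seconds_since_last, txns_since_last,
                   period_s=3600, txn_limit=1_000_000):
    """True if either the time limit or the transaction limit has been reached."""
    return seconds_since_last >= period_s or txns_since_last >= txn_limit


def simulate(txns_per_second, check_period_s=60,
             period_s=3600, txn_limit=1_000_000):
    """Seconds until the first checkpoint, polling once per check period.

    Assumes a constant edit rate, which real clusters will not have.
    """
    elapsed = 0
    while True:
        elapsed += check_period_s
        if checkpoint_due(elapsed, txns_per_second * elapsed,
                          period_s, txn_limit):
            return elapsed


print(simulate(10))    # quiet cluster: the hourly limit fires first -> 3600
print(simulate(1000))  # busy cluster: the transaction limit fires first -> 1020
```

In the busy case the limit of one million transactions is crossed after 1,000 seconds, but the checkpoint only starts at the next one-minute check, at 1,020 seconds.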

Footnote 1: It is actually possible to start a namenode with the -checkpoint option so that it runs the checkpointing process against another (primary) namenode. This is functionally equivalent to running a secondary namenode, but at the time of this writing offers no advantages over the secondary namenode (and indeed, the secondary namenode is the most tried and tested option). When running in a high-availability environment (see “HDFS High Availability” on page 48), the standby node performs checkpointing.

320 | Chapter 11: Administering Hadoop

Figure 11-1. The checkpointing process

Secondary namenode directory structure

The layout of the secondary’s checkpoint directory (dfs.namenode.checkpoint.dir) is identical to the namenode’s. This is by design, since in the event of total namenode failure (when there are no recoverable backups, even from NFS), it allows recovery from a secondary namenode.

This can be achieved either by copying the relevant storage directory to a new namenode or, if the secondary is taking over as the new primary namenode, by using the -importCheckpoint option when starting the namenode daemon. The -importCheckpoint option will load the namenode metadata from the latest checkpoint in the directory defined by the dfs.namenode.checkpoint.dir property, but only if there is no metadata in the dfs.namenode.name.dir directory, to ensure that there is no risk of overwriting precious metadata.

Datanode directory structure

Unlike namenodes, datanodes do not need to be explicitly formatted, because they create their storage directories automatically on startup. Here are the key files and directories:

    ${dfs.datanode.data.dir}/
    ├── current
    │   ├── BP-526805057-127.0.0.1-1411980876842
    │   │   └── current
    │   │       ├── VERSION
    │   │       ├── finalized
    │   │       │   ├── blk_1073741825
    │   │       │   ├── blk_1073741825_1001.meta
    │   │       │   ├── blk_1073741826
    │   │       │   └── blk_1073741826_1002.meta
    │   │       └── rbw
    │   └── VERSION
    └── in_use.lock

HDFS blocks are stored in files with a blk_ prefix; they consist of the raw bytes of a portion of the file being stored.

Each block has an associated metadata file with a .meta suffix. It is made up of a header with version and type information, followed by a series of checksums for sections of the block.

Each block belongs to a block pool, and each block pool has its own storage directory that is formed from its ID (it’s the same block pool ID from the namenode’s VERSION file).

When the number of blocks in a directory grows to a certain size, the datanode creates a new subdirectory in which to place new blocks and their accompanying metadata. It creates a new subdirectory every time the number of blocks in a directory reaches 64 (set by the dfs.datanode.numblocks configuration property). The effect is to have a tree with high fan-out, so even for systems with a very large number of blocks, the directories will be only a few levels deep.
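The naming convention for block files and their metadata files can be illustrated with a short script that pairs them up. This is a sketch based only on the file names in the directory listing for the datanode; the regular expressions and the `pair_blocks` helper are hypothetical, and the second numeric field in the .meta names (1001, 1002) is parsed but otherwise ignored.

```python
import re

# Patterns inferred from the example listing; treat them as assumptions.
BLOCK_RE = re.compile(r"^blk_(\d+)$")
META_RE = re.compile(r"^blk_(\d+)_(\d+)\.meta$")


def pair_blocks(filenames):
    """Map each block ID to the name of its .meta file (or None if missing)."""
    blocks = set()
    metas = {}
    for name in filenames:
        meta = META_RE.match(name)
        if meta:
            metas[int(meta.group(1))] = name
            continue
        block = BLOCK_RE.match(name)
        if block:
            blocks.add(int(block.group(1)))
    return {block_id: metas.get(block_id) for block_id in sorted(blocks)}


listing = [
    "blk_1073741825",
    "blk_1073741825_1001.meta",
    "blk_1073741826",
    "blk_1073741826_1002.meta",
]
pairs = pair_blocks(listing)
for block_id, meta_name in pairs.items():
    print(block_id, meta_name)
```

A block that maps to None in the result would be one whose checksum file is absent from the listing.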

By creating subdirectories in this way, the datanode ensures that there is a manageable number of files per directory, which avoids the problems that most operating systems encounter when there are a large number of files (tens or hundreds of thousands) in a single directory.

If the configuration property dfs.datanode.data.dir specifies multiple directories on different drives, blocks are written in a round-robin fashion. Note that blocks are not replicated on each drive on a single datanode; instead, block replication is across distinct datanodes.

Safe Mode

When the namenode starts, the first thing it does is load its image file (fsimage) into memory and apply the edits from the edit log.

Once it has reconstructed a consistent in-memory image of the filesystem metadata, it creates a new fsimage file (effectively doing the checkpoint itself, without recourse to the secondary namenode) and an empty edit log. During this process, the namenode is running in safe mode, which means that it offers only a read-only view of the filesystem to clients.

Strictly speaking, in safe mode, only filesystem operations that access the filesystem metadata (such as producing a directory listing) are guaranteed to work. Reading a file will work only when the blocks are available on the current set of datanodes in the cluster, and file modifications (writes, deletes, or renames) will always fail.

Recall that the locations of blocks in the system are not persisted by the namenode; this information resides with the datanodes, in the form of a list of the blocks each one is storing.

During normal operation of the system, the namenode has a map of block locations stored in memory. Safe mode is needed to give the datanodes time to check in to the namenode with their block lists, so the namenode can be informed of enough block locations to run the filesystem effectively. If the namenode didn’t wait for enough datanodes to check in, it would start the process of replicating blocks to new datanodes, which would be unnecessary in most cases (because it only needed to wait for the extra datanodes to check in) and would put a great strain on the cluster’s resources. Indeed, while in safe mode, the namenode does not issue any block-replication or deletion instructions to datanodes.

Safe mode is exited when the minimal replication condition is reached, plus an extension time of 30 seconds.
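The way block lists from datanodes build up the in-memory block map, and how that map feeds the minimal replication condition, can be sketched as follows. This is a simplified model rather than the namenode's implementation: the function names and data shapes are hypothetical, the defaults (a threshold of 0.999 and a minimum of one replica) are the documented default values for this condition, and the 30-second extension is not modeled.

```python
from collections import defaultdict


def build_block_map(block_reports):
    """Turn {datanode: block IDs it stores} into {block ID: set of datanodes}."""
    locations = defaultdict(set)
    for datanode, blocks in block_reports.items():
        for block in blocks:
            locations[block].add(datanode)
    return locations


def minimal_replication_reached(expected_blocks, locations,
                                min_replicas=1, threshold=0.999):
    """True once enough blocks have at least min_replicas known locations."""
    expected = list(expected_blocks)
    if not expected:
        return True  # nothing to wait for when the filesystem has no blocks
    safe = sum(1 for block in expected
               if len(locations.get(block, ())) >= min_replicas)
    return safe / len(expected) >= threshold


expected_blocks = range(1000)  # block IDs the namenode knows from its metadata

# Only one datanode has checked in so far: half the blocks have no location.
early = build_block_map({"dn1": range(0, 500)})
print(minimal_replication_reached(expected_blocks, early))   # False

# A second datanode checks in and covers the rest.
later = build_block_map({"dn1": range(0, 500), "dn2": range(400, 1000)})
print(minimal_replication_reached(expected_blocks, later))   # True
```

In the early state, starting re-replication would be wasted work, since the missing blocks appear as soon as the second datanode reports in.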

The minimal replication condition is when 99.9% of the blocks in the whole filesystem meet their minimum replication level (which defaults to 1 and is set by dfs.namenode.replication.min; see Table 11-1).

When you are starting a newly formatted HDFS cluster, the namenode does not go into safe mode, since there are no blocks in the system.

Table 11-1. Safe mode properties

| Property name | Type | Default value | Description |
|---|---|---|---|
| dfs.namenode.replication.min | int | 1 | The minimum number of replicas that have to be written for a write to be successful. |
| dfs.namenode.safemode.threshold-pct | float | 0.999 | The proportion of blocks in the system that must meet the minimum replication level defined by dfs.namenode.replication.min before the namenode will exit safe mode. Setting this value to 0 or less forces the namenode not to start in safe mode. Setting this value to more than 1 means the namenode never exits safe mode. |
| dfs.namenode.safemode.extension | int | 30000 | The time, in milliseconds, to extend safe mode after the minimum replication condition defined by dfs.namenode.safemode.threshold-pct has been satisfied. For small clusters (tens of nodes), it can be set to 0. |

Entering and leaving safe mode

To see whether the namenode is in safe mode, you can use the dfsadmin command:

    % hdfs dfsadmin -safemode get
    Safe mode is ON

The front page of the HDFS web UI provides another indication of whether the namenode is in safe mode.

Sometimes you want to wait for the namenode to exit safe mode before carrying out a command, particularly in scripts. The wait option achieves this:

    % hdfs dfsadmin -safemode wait
    # command to read or write a file

An administrator has the ability to make the namenode enter or leave safe mode at any time.

It is sometimes necessary to do this when carrying out maintenance on the cluster or after upgrading a cluster, to confirm that data is still readable. To enter safe mode, use the following command:

    % hdfs dfsadmin -safemode enter
    Safe mode is ON

You can use this command when the namenode is still in safe mode while starting up to ensure that it never leaves safe mode. Another way of making sure that the namenode stays in safe mode indefinitely is to set the property dfs.namenode.safemode.threshold-pct to a value over 1.

You can make the namenode leave safe mode by using the following:

    % hdfs dfsadmin -safemode leave
    Safe mode is OFF

Audit Logging

HDFS can log all filesystem access requests, a feature that some organizations require for auditing purposes. Audit logging is implemented using log4j logging at the INFO level.
