In the default configuration it is disabled, but it's easy to enable by adding the following line to hadoop-env.sh:

    export HDFS_AUDIT_LOGGER="INFO,RFAAUDIT"

A log line is written to the audit log (hdfs-audit.log) for every HDFS event. Here's an example for a list status request on /user/tom:

    2014-09-30 21:35:30,484 INFO FSNamesystem.audit: allowed=true   ugi=tom
    (auth:SIMPLE)   ip=/127.0.0.1   cmd=listStatus  src=/user/tom   dst=null
    perm=null       proto=rpc
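You can trigger an audit entry like this one yourself by issuing a filesystem operation and then inspecting the log on the namenode. A minimal sketch, assuming the audit log is written to the standard Hadoop log directory referenced by $HADOOP_LOG_DIR (check your log4j settings if it lives elsewhere):

    % hadoop fs -ls /user/tom
    % tail -1 $HADOOP_LOG_DIR/hdfs-audit.log

The -ls command performs a list status, so the tail should show a cmd=listStatus entry similar to the one above.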
Tools

dfsadmin

The dfsadmin tool is a multipurpose tool for finding information about the state of HDFS, as well as for performing administration operations on HDFS. It is invoked as hdfs dfsadmin and requires superuser privileges.

Some of the available commands to dfsadmin are described in Table 11-2. Use the -help command to get more information.

Table 11-2. dfsadmin commands

-help
    Shows help for a given command, or all commands if no command is specified.
-report
    Shows filesystem statistics (similar to those shown in the web UI) and information on connected datanodes.
-metasave
    Dumps information to a file in Hadoop's log directory about blocks that are being replicated or deleted, as well as a list of connected datanodes.
-safemode
    Changes or queries the state of safe mode. See "Safe Mode" on page 322.
-saveNamespace
    Saves the current in-memory filesystem image to a new fsimage file and resets the edits file. This operation may be performed only in safe mode.
-fetchImage
    Retrieves the latest fsimage from the namenode and saves it in a local file.
-refreshNodes
    Updates the set of datanodes that are permitted to connect to the namenode. See "Commissioning and Decommissioning Nodes" on page 334.
-upgradeProgress
    Gets information on the progress of an HDFS upgrade or forces an upgrade to proceed. See "Upgrades" on page 337.
-finalizeUpgrade
    Removes the previous version of the namenode and datanode storage directories. Used after an upgrade has been applied and the cluster is running successfully on the new version. See "Upgrades" on page 337.
-setQuota
    Sets directory quotas. Directory quotas set a limit on the number of names (files or directories) in the directory tree. Directory quotas are useful for preventing users from creating large numbers of small files, a measure that helps preserve the namenode's memory (recall that accounting information for every file, directory, and block in the filesystem is stored in memory).
-clrQuota
    Clears specified directory quotas.
-setSpaceQuota
    Sets space quotas on directories. Space quotas set a limit on the size of files that may be stored in a directory tree. They are useful for giving users a limited amount of storage.
-clrSpaceQuota
    Clears specified space quotas.
-refreshServiceAcl
    Refreshes the namenode's service-level authorization policy file.
-allowSnapshot
    Allows snapshot creation for the specified directory.
-disallowSnapshot
    Disallows snapshot creation for the specified directory.
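As a sketch of everyday use, the following commands show a filesystem report, query safe mode, and set then clear illustrative quotas on a user directory (space quotas accept binary-prefix abbreviations, so 1t here is one terabyte):

    % hdfs dfsadmin -report
    % hdfs dfsadmin -safemode get
    % hdfs dfsadmin -setQuota 100000 /user/tom
    % hdfs dfsadmin -setSpaceQuota 1t /user/tom
    % hdfs dfsadmin -clrSpaceQuota /user/tom
    % hdfs dfsadmin -clrQuota /user/tom

Remember that these must be run with superuser privileges.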
Filesystem check (fsck)

Hadoop provides an fsck utility for checking the health of files in HDFS. The tool looks for blocks that are missing from all datanodes, as well as under- or over-replicated blocks. Here is an example of checking the whole filesystem for a small cluster:

    % hdfs fsck /
    ......................Status: HEALTHY
     Total size:                    511799225 B
     Total dirs:                    10
     Total files:                   22
     Total blocks (validated):      22 (avg. block size 23263601 B)
     Minimally replicated blocks:   22 (100.0 %)
     Over-replicated blocks:        0 (0.0 %)
     Under-replicated blocks:       0 (0.0 %)
     Mis-replicated blocks:         0 (0.0 %)
     Default replication factor:    3
     Average block replication:     3.0
     Corrupt blocks:                0
     Missing replicas:              0 (0.0 %)
     Number of data-nodes:          4
     Number of racks:               1
    The filesystem under path '/' is HEALTHY

fsck recursively walks the filesystem namespace, starting at the given path (here the filesystem root), and checks the files it finds.
It prints a dot for every file it checks. To check a file, fsck retrieves the metadata for the file's blocks and looks for problems or inconsistencies. Note that fsck retrieves all of its information from the namenode; it does not communicate with any datanodes to actually retrieve any block data.

Most of the output from fsck is self-explanatory, but here are some of the conditions it looks for:

Over-replicated blocks
    These are blocks that exceed their target replication for the file they belong to. Normally, over-replication is not a problem, and HDFS will automatically delete excess replicas.

Under-replicated blocks
    These are blocks that do not meet their target replication for the file they belong to. HDFS will automatically create new replicas of under-replicated blocks until they meet the target replication.
    You can get information about the blocks being replicated (or waiting to be replicated) using hdfs dfsadmin -metasave.

Misreplicated blocks
    These are blocks that do not satisfy the block replica placement policy (see "Replica Placement" on page 73). For example, for a replication level of three in a multirack cluster, if all three replicas of a block are on the same rack, then the block is misreplicated because the replicas should be spread across at least two racks for resilience.
    HDFS will automatically re-replicate misreplicated blocks so that they satisfy the rack placement policy.

Corrupt blocks
    These are blocks whose replicas are all corrupt. Blocks with at least one noncorrupt replica are not reported as corrupt; the namenode will replicate the noncorrupt replica until the target replication is met.

Missing replicas
    These are blocks with no replicas anywhere in the cluster.

Corrupt or missing blocks are the biggest cause for concern, as they mean data has been lost. By default, fsck leaves files with corrupt or missing blocks, but you can tell it to perform one of the following actions on them (see the commands sketched after this list):

• Move the affected files to the /lost+found directory in HDFS, using the -move option. Files are broken into chains of contiguous blocks to aid any salvaging efforts you may attempt.

• Delete the affected files, using the -delete option. Files cannot be recovered after being deleted.
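As a sketch, both repair modes are invoked as extra arguments to fsck (the path here is illustrative):

    % hdfs fsck /user/tom -move
    % hdfs fsck /user/tom -delete

Of the two, -move is the more conservative choice, since the block chains left in /lost+found can still be inspected and salvaged.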
Finding the blocks for a file

The fsck tool provides an easy way to find out which blocks are in any particular file. For example:

    % hdfs fsck /user/tom/part-00007 -files -blocks -racks
    /user/tom/part-00007 25582428 bytes, 1 block(s): OK
    0. blk_-3724870485760122836_1035 len=25582428 repl=3 [/default-rack/10.251.43.2:50010,
    /default-rack/10.251.27.178:50010, /default-rack/10.251.123.163:50010]

This says that the file /user/tom/part-00007 is made up of one block and shows the datanodes where the block is located.
The fsck options used are as follows:

• The -files option shows the line with the filename, size, number of blocks, and its health (whether there are any missing blocks).

• The -blocks option shows information about each block in the file, one line per block.

• The -racks option displays the rack location and the datanode addresses for each block.

Running hdfs fsck without any arguments displays full usage instructions.

Datanode block scanner

Every datanode runs a block scanner, which periodically verifies all the blocks stored on the datanode.
This allows bad blocks to be detected and fixed before they are read by clients. The scanner maintains a list of blocks to verify and scans them one by one for checksum errors. It employs a throttling mechanism to preserve disk bandwidth on the datanode.

Blocks are verified every three weeks to guard against disk errors over time (this period is controlled by the dfs.datanode.scan.period.hours property, which defaults to 504 hours). Corrupt blocks are reported to the namenode to be fixed.
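If the default period doesn't suit your cluster, it can be overridden in hdfs-site.xml. A minimal sketch, using an illustrative value of one week (168 hours):

    <property>
      <name>dfs.datanode.scan.period.hours</name>
      <value>168</value>
    </property>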
You can get a block verification report for a datanode by visiting the datanode's web interface at http://datanode:50075/blockScannerReport. Here's an example of a report, which should be self-explanatory:

    Total Blocks                 : 21131
    Verified in last hour        :    70
    Verified in last day         :  1767
    Verified in last week        :  7360
    Verified in last four weeks  : 20057
    Verified in SCAN_PERIOD      : 20057
    Not yet verified             :  1074
    Verified since restart       : 35912
    Scans since restart          :  6541
    Scan errors since restart    :     0
    Transient scan errors        :     0
    Current scan rate limit KBps :  1024
    Progress this period         :  109%
    Time left in cur period      : 53.08%

If you specify the listblocks parameter, http://datanode:50075/blockScannerReport?listblocks, the report is preceded by a list of all the blocks on the datanode along with their latest verification status. Here is a snippet of the block list (lines are split to fit the page):

    blk_6035596358209321442 : status : ok     type : none   scan time : 0
                              not yet verified
    blk_3065580480714947643 : status : ok     type : remote scan time : 1215755306400
                              2008-07-11 05:48:26,400
    blk_8729669677359108508 : status : ok     type : local  scan time : 1215755727345
                              2008-07-11 05:55:27,345

The first column is the block ID, followed by some key-value pairs. The status can be one of failed or ok, according to whether the last scan of the block detected a checksum error. The type of scan is local if it was performed by the background thread, remote if it was performed by a client or a remote datanode, or none if a scan of this block has yet to be made.
The last piece of information is the scan time, which is displayed as the number of milliseconds since midnight on January 1, 1970, and also as a more readable value.

Balancer

Over time, the distribution of blocks across datanodes can become unbalanced. An unbalanced cluster can affect locality for MapReduce, and it puts a greater strain on the highly utilized datanodes, so it's best avoided.

The balancer program is a Hadoop daemon that redistributes blocks by moving them from overutilized datanodes to underutilized datanodes, while adhering to the block replica placement policy that makes data loss unlikely by placing block replicas on different racks (see "Replica Placement" on page 73). It moves blocks until the cluster is deemed to be balanced, which means that the utilization of every datanode (ratio of used space on the node to total capacity of the node) differs from the utilization of the cluster (ratio of used space on the cluster to total capacity of the cluster) by no more than a given threshold percentage.
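As a sketch, the balancer is started with the start-balancer.sh script, and the -threshold argument sets the threshold percentage (the value of 5 here is illustrative):

    % start-balancer.sh -threshold 5

A lower threshold gives a more evenly balanced cluster, but the balancer takes longer to reach it.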