Table 9-3. Built-in filesystem task counters

Filesystem bytes read (BYTES_READ)
    The number of bytes read by the filesystem by map and reduce tasks. There is a counter for each filesystem, and Filesystem may be Local, HDFS, S3, etc.

Filesystem bytes written (BYTES_WRITTEN)
    The number of bytes written by the filesystem by map and reduce tasks.

Filesystem read ops (READ_OPS)
    The number of read operations (e.g., open, file status) by the filesystem by map and reduce tasks.

Filesystem large read ops (LARGE_READ_OPS)
    The number of large read operations (e.g., list directory for a large directory) by the filesystem by map and reduce tasks.

Filesystem write ops (WRITE_OPS)
    The number of write operations (e.g., create, append) by the filesystem by map and reduce tasks.
Table 9-4. Built-in FileInputFormat task counters

Bytes read (BYTES_READ)
    The number of bytes read by map tasks via the FileInputFormat.

Table 9-5. Built-in FileOutputFormat task counters

Bytes written (BYTES_WRITTEN)
    The number of bytes written by map tasks (for map-only jobs) or reduce tasks via the FileOutputFormat.

Job counters

Job counters (Table 9-6) are maintained by the application master, so they don’t need to be sent across the network, unlike all other counters, including user-defined ones. They measure job-level statistics, not values that change while a task is running. For example, TOTAL_LAUNCHED_MAPS counts the number of map tasks that were launched over the course of a job (including tasks that failed).
Table 9-6. Built-in job counters

Launched map tasks (TOTAL_LAUNCHED_MAPS)
    The number of map tasks that were launched. Includes tasks that were started speculatively (see “Speculative Execution” on page 204).

Launched reduce tasks (TOTAL_LAUNCHED_REDUCES)
    The number of reduce tasks that were launched. Includes tasks that were started speculatively.

Launched uber tasks (TOTAL_LAUNCHED_UBERTASKS)
    The number of uber tasks (see “Anatomy of a MapReduce Job Run” on page 185) that were launched.

Maps in uber tasks (NUM_UBER_SUBMAPS)
    The number of maps in uber tasks.

Reduces in uber tasks (NUM_UBER_SUBREDUCES)
    The number of reduces in uber tasks.

Failed map tasks (NUM_FAILED_MAPS)
    The number of map tasks that failed. See “Task Failure” on page 193 for potential causes.

Failed reduce tasks (NUM_FAILED_REDUCES)
    The number of reduce tasks that failed.

Failed uber tasks (NUM_FAILED_UBERTASKS)
    The number of uber tasks that failed.

Killed map tasks (NUM_KILLED_MAPS)
    The number of map tasks that were killed. See “Task Failure” on page 193 for potential causes.

Killed reduce tasks (NUM_KILLED_REDUCES)
    The number of reduce tasks that were killed.

Data-local map tasks (DATA_LOCAL_MAPS)
    The number of map tasks that ran on the same node as their input data.

Rack-local map tasks (RACK_LOCAL_MAPS)
    The number of map tasks that ran on a node in the same rack as their input data, but were not data-local.

Other local map tasks (OTHER_LOCAL_MAPS)
    The number of map tasks that ran on a node in a different rack to their input data. Inter-rack bandwidth is scarce, and Hadoop tries to place map tasks close to their input data, so this count should be low. See Figure 2-2.

Total time in map tasks (MILLIS_MAPS)
    The total time taken running map tasks, in milliseconds. Includes tasks that were started speculatively. See also corresponding counters for measuring core and memory usage (VCORES_MILLIS_MAPS and MB_MILLIS_MAPS).

Total time in reduce tasks (MILLIS_REDUCES)
    The total time taken running reduce tasks, in milliseconds. Includes tasks that were started speculatively. See also corresponding counters for measuring core and memory usage (VCORES_MILLIS_REDUCES and MB_MILLIS_REDUCES).
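These built-in job counters can be read through the same Counters API as user-defined ones (covered below under “Retrieving counters”). The following is a minimal sketch, assuming job is a handle to a completed org.apache.hadoop.mapreduce.Job obtained elsewhere (for example, from a Cluster):

import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobCounter;

// A sketch only: print two of the built-in job counters for a finished job.
public class JobCounterReport {
  public static void report(Job job) throws Exception {
    Counters counters = job.getCounters();
    long launchedMaps = counters.findCounter(JobCounter.TOTAL_LAUNCHED_MAPS).getValue();
    long millisMaps = counters.findCounter(JobCounter.MILLIS_MAPS).getValue();
    System.out.printf("%d map tasks launched, %d ms total map time%n",
        launchedMaps, millisMaps);
  }
}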
User-Defined Java Counters

MapReduce allows user code to define a set of counters, which are then incremented as desired in the mapper or reducer. Counters are defined by a Java enum, which serves to group related counters. A job may define an arbitrary number of enums, each with an arbitrary number of fields. The name of the enum is the group name, and the enum’s fields are the counter names. Counters are global: the MapReduce framework aggregates them across all maps and reduces to produce a grand total at the end of the job.

We created some counters in Chapter 6 for counting malformed records in the weather dataset. The program in Example 9-1 extends that example to count the number of missing records and the distribution of temperature quality codes.

Example 9-1. Application to run the maximum temperature job, including counting missing and malformed fields and quality codes

public class MaxTemperatureWithCounters extends Configured implements Tool {

  enum Temperature {
    MISSING,
    MALFORMED
  }

  static class MaxTemperatureMapperWithCounters
      extends Mapper<LongWritable, Text, Text, IntWritable> {

    private NcdcRecordParser parser = new NcdcRecordParser();

    @Override
    protected void map(LongWritable key, Text value, Context context)
        throws IOException, InterruptedException {

      parser.parse(value);
      if (parser.isValidTemperature()) {
        int airTemperature = parser.getAirTemperature();
        context.write(new Text(parser.getYear()),
            new IntWritable(airTemperature));
      } else if (parser.isMalformedTemperature()) {
        System.err.println("Ignoring possibly corrupt input: " + value);
        context.getCounter(Temperature.MALFORMED).increment(1);
      } else if (parser.isMissingTemperature()) {
        context.getCounter(Temperature.MISSING).increment(1);
      }

      // dynamic counter
      context.getCounter("TemperatureQuality", parser.getQuality()).increment(1);
    }
  }

  @Override
  public int run(String[] args) throws Exception {
    Job job = JobBuilder.parseInputAndOutput(this, getConf(), args);
    if (job == null) {
      return -1;
    }

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    job.setMapperClass(MaxTemperatureMapperWithCounters.class);
    job.setCombinerClass(MaxTemperatureReducer.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    return job.waitForCompletion(true) ? 0 : 1;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new MaxTemperatureWithCounters(), args);
    System.exit(exitCode);
  }
}

The best way to see what this program does is to run it over the complete dataset:

% hadoop jar hadoop-examples.jar MaxTemperatureWithCounters \
    input/ncdc/all output-counters

When the job has successfully completed, it prints out the counters at the end (this is done by the job client).
Here are the ones we are interested in:

Air Temperature Records
    Malformed=3
    Missing=66136856
TemperatureQuality
    0=1
    1=973422173
    2=1246032
    4=10764500
    5=158291879
    6=40066
    9=66136858

Notice that the counters for temperature have been made more readable by using a resource bundle named after the enum (using an underscore as a separator for nested classes)—in this case MaxTemperatureWithCounters_Temperature.properties, which contains the display name mappings.
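The bundle itself is not reproduced in this excerpt. Based on the display names in the output above, and on Hadoop’s resource bundle convention for counters (a CounterGroupName key for the group’s display name, and a <FIELD>.name key for each counter), its contents would look something like this sketch:

CounterGroupName=Air Temperature Records
MISSING.name=Missing
MALFORMED.name=Malformed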
Dynamic counters

The code makes use of a dynamic counter—one that isn’t defined by a Java enum. Because a Java enum’s fields are defined at compile time, you can’t create new counters on the fly using enums. Here we want to count the distribution of temperature quality codes, and though the format specification defines the values that the temperature quality code can take, it is more convenient to use a dynamic counter to emit the values that it actually takes. The method we use on the Context object takes a group and counter name using String names:

public Counter getCounter(String groupName, String counterName)

The two ways of creating and accessing counters—using enums and using strings—are actually equivalent because Hadoop turns enums into strings to send counters over RPC. Enums are slightly easier to work with, provide type safety, and are suitable for most jobs.
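To illustrate that equivalence, both calls in the following mapper fragment should update the same counter, on the assumption that the group name Hadoop derives for an enum counter is the enum’s fully qualified class name:

// Fragment from inside a mapper. Using Temperature.class.getName() as the
// group name is an assumption about how Hadoop converts the enum to
// strings, not something the text above spells out.
context.getCounter(Temperature.MISSING).increment(1);
context.getCounter(Temperature.class.getName(), "MISSING").increment(1);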
For the odd occasion when you need to create counters dynamically, you can use the String interface.

Retrieving counters

In addition to using the web UI and the command line (using mapred job -counter), you can retrieve counter values using the Java API. You can do this while the job is running, although it is more usual to get counters at the end of a job run, when they are stable. Example 9-2 shows a program that calculates the proportion of records that have missing temperature fields.

Example 9-2. Application to calculate the proportion of records with missing temperature fields
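The listing falls outside this excerpt; the following is a minimal sketch of such a program. It assumes the completed job is looked up by its ID through a Cluster; the class name, argument handling, and output format are illustrative assumptions rather than the book’s exact listing:

import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.mapreduce.Cluster;
import org.apache.hadoop.mapreduce.Counters;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.JobID;
import org.apache.hadoop.mapreduce.TaskCounter;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// A sketch: given a job ID, compute the proportion of input records whose
// temperature field was missing, using the MISSING counter from Example 9-1
// and the built-in MAP_INPUT_RECORDS task counter.
public class MissingTemperatureProportion extends Configured implements Tool {

  @Override
  public int run(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.printf("Usage: %s [generic options] <job ID>%n",
          getClass().getSimpleName());
      return -1;
    }
    String jobID = args[0];
    Cluster cluster = new Cluster(getConf());
    Job job = cluster.getJob(JobID.forName(jobID));
    if (job == null) {
      System.err.printf("No job with ID %s found.%n", jobID);
      return -1;
    }
    if (!job.isComplete()) {
      // Counter values are only stable once the job has finished
      System.err.printf("Job %s is not complete.%n", jobID);
      return -1;
    }
    Counters counters = job.getCounters();
    long missing = counters.findCounter(
        MaxTemperatureWithCounters.Temperature.MISSING).getValue();
    long total = counters.findCounter(TaskCounter.MAP_INPUT_RECORDS).getValue();
    System.out.printf("Records with missing temperature fields: %.2f%%%n",
        100.0 * missing / total);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int exitCode = ToolRunner.run(new MissingTemperatureProportion(), args);
    System.exit(exitCode);
  }
}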