Lecture 7. HDFS and the Basics of the Hadoop Java API (1185415), page 2
[Figure: a sample text file (the opening of "The Hound of the Baskervilles") shown divided into input splits split #1, split #2, split #3, with one Mapper attached to each split.]

Number of Mappers = number of splits.
Ideally, one Mapper should correspond to one HDFS block.

Split size in the case of FileInputFormat:

Property name                                    Type   Default              Description
mapreduce.input.fileinputformat.split.minsize    int    1                    Minimum split size in bytes
mapreduce.input.fileinputformat.split.maxsize    long   Long.MAX_VALUE       Maximum split size in bytes
dfs.blocksize                                    long   128 MB (134217728)   HDFS block size

splitSize = max(minSize, min(maxSize, blockSize))

minSize   maxSize          blockSize   splitSize
1         Long.MAX_VALUE   128 MB      128 MB
1         Long.MAX_VALUE   256 MB      256 MB
256 MB    Long.MAX_VALUE   128 MB      256 MB (2 blocks!)
1         64 MB            128 MB      64 MB

Correspondence between splits and blocks: in the 256 MB case above, the 1st split will contain the 1st and 2nd blocks. (A short configuration sketch is given at the end of this page.)

And how many Reducers? (mapred.reduce.tasks)

Partitioning
[Figure: the (key, 1) pairs emitted by the Mappers (keys such as aa, bc, bd, ca, cc) are partitioned by key, so that all pairs sharing a key are sent to the same reduce() call on the same Reducer.]

Partitioner

package org.apache.hadoop.mapreduce;

public abstract class Partitioner<K, V> {
    public abstract int getPartition(K key, V value, int numPartitions);
}

The partitioner is set on the job:

job.setPartitionerClass(HashPartitioner.class);

The default implementation is HashPartitioner:

public class HashPartitioner<K, V> extends Partitioner<K, V> {
    public int getPartition(K key, V value, int numReduceTasks) {
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

Why might a custom Partitioner be needed? (A possible answer is sketched at the end of this page.)

Homework

#SYMBOL,SYSTEM,MOMENT,ID_DEAL,PRICE_DEAL,VOLUME,OPEN_POS,DIRECTION
SVH1,F,20110111100000080,255223067,30.46000,1,8714,S
SVH1,F,20110111100000080,255223068,30.38000,1,8714,S
SVH1,F,20110111100000080,255223069,30.32000,1,8714,S
SVH1,F,20110111100000080,255223070,30.28000,2,8714,S
SVH1,F,20110111100000080,255223071,30.25000,1,8714,S
SVH1,F,20110111100000080,255223072,30.05000,1,8714,S
SVH1,F,20110111100000080,255223073,30.05000,3,8714,S
RIH1,F,20110111100000097,255223074,177885.00000,1,291758,B
RIH1,F,20110111100000097,255223075,177935.00000,2,291758,B
RIH1,F,20110111100000097,255223076,177980.00000,10,291758,B
RIH1,F,20110111100000097,255223077,177995.00000,1,291758,B
RIH1,F,20110111100000097,255223078,178100.00000,2,291758,B
RIH1,F,20110111100000097,255223079,178200.00000,1,291758,B
RIH1,F,20110111100000097,255223080,178205.00000,1,291758,B

wasb://financedata@bigdatamsu.blob.core.windows.net/

(A Mapper sketch for parsing this file format is given at the end of this page.)

Links

http://blogs.msdn.com/b/cindygross/archive/2015/02/04/understanding-wasb-and-hadoop-storage-in-azure.aspx ("About WASB")
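The following is a minimal driver sketch, not taken from the slides, that shows where the split-size settings above live in the Java API. The class name SplitSizeDemo, the 64 MB cap, and the helper computeSplitSize are illustrative assumptions; the formula itself mirrors max(minSize, min(maxSize, blockSize)) from the table.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;

public class SplitSizeDemo {

    // Mirrors the slide's formula: splitSize = max(minSize, min(maxSize, blockSize))
    static long computeSplitSize(long minSize, long maxSize, long blockSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "split size demo");

        // These calls set mapreduce.input.fileinputformat.split.minsize / .maxsize
        FileInputFormat.setMinInputSplitSize(job, 1L);
        FileInputFormat.setMaxInputSplitSize(job, 64L * 1024 * 1024); // 64 MB cap (assumed value)

        // With the default 128 MB HDFS block this reproduces the last table row:
        // max(1, min(64 MB, 128 MB)) = 64 MB. The job is not submitted here; the
        // sketch only demonstrates the configuration calls.
        System.out.println(computeSplitSize(1L, 64L * 1024 * 1024, 128L * 1024 * 1024));
    }
}

The same limits can also be set directly through the mapreduce.input.fileinputformat.split.minsize and .maxsize properties in the job configuration.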
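One common answer to the custom-Partitioner question is controlling which Reducer receives which keys. Below is a hedged sketch under an assumed scenario that is not part of the lecture: the map output key is a composite string "SYMBOL,date", and we want every record of one instrument to reach the same Reducer regardless of the date part; the class name SymbolPartitioner and the key layout are assumptions.

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Assumed scenario: the map output key looks like "SYMBOL,date"; we partition on
// the SYMBOL prefix only, so all records of one instrument go to the same reducer.
public class SymbolPartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text key, IntWritable value, int numReduceTasks) {
        String symbol = key.toString().split(",", 2)[0];
        return (symbol.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}

It would be plugged in the same way as HashPartitioner: job.setPartitionerClass(SymbolPartitioner.class), with the number of partitions coming from job.setNumReduceTasks(...), i.e. the mapred.reduce.tasks setting mentioned above.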
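The exact homework task is not stated on this page, so the following is only an assumed starting point: a Mapper that parses one line of the trade file shown above and emits (SYMBOL, VOLUME). The class name TradeVolumeMapper and the choice of emitted fields are illustrative, not prescribed by the assignment.

import java.io.IOException;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class TradeVolumeMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context context)
            throws IOException, InterruptedException {
        String s = line.toString();
        if (s.startsWith("#")) {
            return; // skip the header line
        }
        // Field order follows the header: SYMBOL,SYSTEM,MOMENT,ID_DEAL,PRICE_DEAL,VOLUME,OPEN_POS,DIRECTION
        String[] fields = s.split(",");
        String symbol = fields[0];                 // e.g. SVH1, RIH1
        int volume = Integer.parseInt(fields[5]);  // VOLUME column
        context.write(new Text(symbol), new IntWritable(volume));
    }
}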