Tom White, Hadoop: The Definitive Guide, 4th edition, 2015 (file 811394), page 27

When we start using characters that are encoded with more than a single byte, the differences between Text and String become clear. Consider the Unicode characters shown in Table 5-8.[2]

Table 5-8. Unicode characters

Unicode code point  Name                            UTF-8 code units  Java representation
U+0041              LATIN CAPITAL LETTER A          41                \u0041
U+00DF              LATIN SMALL LETTER SHARP S      c3 9f             \u00DF
U+6771              N/A (a unified Han ideograph)   e6 9d b1          \u6771
U+10400             DESERET CAPITAL LETTER LONG I   f0 90 90 80       \uD801\uDC00

All but the last character in the table, U+10400, can be expressed using a single Java char. U+10400 is a supplementary character and is represented by two Java chars, known as a surrogate pair.
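The values in Table 5-8 can be checked with plain JDK calls, independent of Hadoop. The following is a minimal sketch; the class name Utf8Widths and its helper methods are made up for illustration and are not part of any library.

```java
import java.nio.charset.StandardCharsets;

// Illustrative helper (not part of Hadoop): reports how many Java chars and
// how many UTF-8 bytes a single Unicode code point occupies.
final class Utf8Widths {

    // Number of Java char code units needed for the code point (1 or 2).
    static int charCount(int codePoint) {
        return Character.charCount(codePoint);
    }

    // Number of bytes in the UTF-8 encoding of the code point.
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // UTF-8 bytes as space-separated lowercase hex, for example "c3 9f".
    static String utf8Hex(int codePoint) {
        byte[] bytes = new String(Character.toChars(codePoint))
                .getBytes(StandardCharsets.UTF_8);
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int[] codePoints = {0x0041, 0x00DF, 0x6771, 0x10400};
        for (int cp : codePoints) {
            System.out.printf("U+%04X chars=%d utf8=[%s]%n",
                    cp, charCount(cp), utf8Hex(cp));
        }
    }
}
```

Only U+10400 reports two chars, which is the surrogate pair mentioned above; the UTF-8 widths grow from one to four bytes across the four characters.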

The tests in Example 5-5 show the differences between String and Text when processing a string of the four characters from Table 5-8.

Example 5-5. Tests showing the differences between the String and Text classes

    // Imports assumed for JUnit 4 and Hamcrest; the printed listing omits them.
    import static org.hamcrest.CoreMatchers.is;
    import static org.junit.Assert.assertThat;

    import java.io.UnsupportedEncodingException;
    import org.apache.hadoop.io.Text;
    import org.junit.Test;

    public class StringTextComparisonTest {

      @Test
      public void string() throws UnsupportedEncodingException {
        String s = "\u0041\u00DF\u6771\uD801\uDC00";
        // length() counts char code units; the surrogate pair counts as two
        assertThat(s.length(), is(5));
        assertThat(s.getBytes("UTF-8").length, is(10));

        // indexOf() returns an index in char code units
        assertThat(s.indexOf("\u0041"), is(0));
        assertThat(s.indexOf("\u00DF"), is(1));
        assertThat(s.indexOf("\u6771"), is(2));
        assertThat(s.indexOf("\uD801\uDC00"), is(3));

        // charAt() returns one char, which may be half of a surrogate pair
        assertThat(s.charAt(0), is('\u0041'));
        assertThat(s.charAt(1), is('\u00DF'));
        assertThat(s.charAt(2), is('\u6771'));
        assertThat(s.charAt(3), is('\uD801'));
        assertThat(s.charAt(4), is('\uDC00'));

        // codePointAt() returns a whole Unicode character as an int
        assertThat(s.codePointAt(0), is(0x0041));
        assertThat(s.codePointAt(1), is(0x00DF));
        assertThat(s.codePointAt(2), is(0x6771));
        assertThat(s.codePointAt(3), is(0x10400));
      }

      @Test
      public void text() {
        Text t = new Text("\u0041\u00DF\u6771\uD801\uDC00");
        // getLength() is the number of bytes in the UTF-8 encoding
        assertThat(t.getLength(), is(10));

        // find() returns a byte offset
        assertThat(t.find("\u0041"), is(0));
        assertThat(t.find("\u00DF"), is(1));
        assertThat(t.find("\u6771"), is(3));
        assertThat(t.find("\uD801\uDC00"), is(6));

        // charAt() takes a byte offset and returns a code point
        assertThat(t.charAt(0), is(0x0041));
        assertThat(t.charAt(1), is(0x00DF));
        assertThat(t.charAt(3), is(0x6771));
        assertThat(t.charAt(6), is(0x10400));
      }
    }

[2] This example is based on one from Norbert Lindenberg and Masayoshi Okutsu's "Supplementary Characters in the Java Platform," May 2004.

The test confirms that the length of a String is the number of char code units it contains (five, made up of one from each of the first three characters in the string and a surrogate pair from the last), whereas the length of a Text object is the number of bytes in its UTF-8 encoding (10 = 1+2+3+4).

Similarly, the indexOf() method in String returns an index in char code units, and find() for Text returns a byte offset.

The charAt() method in String returns the char code unit for the given index, which in the case of a surrogate pair will not represent a whole Unicode character. The codePointAt() method, indexed by char code unit, is needed to retrieve a single Unicode character represented as an int. In fact, the charAt() method in Text is more like the codePointAt() method than its namesake in String. The only difference is that it is indexed by byte offset.

Iteration. Iterating over the Unicode characters in Text is complicated by the use of byte offsets for indexing, since you can't just increment the index.
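To see why a byte offset cannot simply be incremented, consider a plain-JDK sketch that walks a UTF-8 byte array one code point at a time. The class Utf8Walker and its methods are hypothetical and are not Hadoop's implementation; the sketch assumes well-formed UTF-8 input and does no validation beyond the lead byte.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Illustrative only: steps through UTF-8 bytes by code point, not by byte.
final class Utf8Walker {

    // Length in bytes of the UTF-8 sequence that starts with this lead byte.
    static int sequenceLength(byte lead) {
        int b = lead & 0xff;
        if (b < 0x80) return 1;             // 0xxxxxxx
        if ((b >> 5) == 0b110) return 2;    // 110xxxxx
        if ((b >> 4) == 0b1110) return 3;   // 1110xxxx
        if ((b >> 3) == 0b11110) return 4;  // 11110xxx
        throw new IllegalArgumentException("not a UTF-8 lead byte: " + b);
    }

    // Decodes the code point whose encoding starts at the given byte offset.
    static int codePointAt(byte[] utf8, int offset) {
        int len = sequenceLength(utf8[offset]);
        if (len == 1) return utf8[offset];
        // Keep only the payload bits of the lead byte.
        int cp = utf8[offset] & (0xff >> (len + 1));
        for (int i = 1; i < len; i++) {
            // Each continuation byte contributes its low six bits.
            cp = (cp << 6) | (utf8[offset + i] & 0x3f);
        }
        return cp;
    }

    // Returns the byte offset at which each code point starts.
    static List<Integer> offsets(byte[] utf8) {
        List<Integer> result = new ArrayList<>();
        for (int i = 0; i < utf8.length; i += sequenceLength(utf8[i])) {
            result.add(i);
        }
        return result;
    }

    public static void main(String[] args) {
        byte[] utf8 = "\u0041\u00DF\u6771\uD801\uDC00"
                .getBytes(StandardCharsets.UTF_8);
        for (int offset : offsets(utf8)) {
            System.out.println(offset + " -> "
                    + Integer.toHexString(codePointAt(utf8, offset)));
        }
    }
}
```

For the four-character string used in this section, the step between consecutive offsets is 1, 2, and then 3 bytes, so the offsets are not consecutive integers.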

The idiom for iteration is a little obscure (see Example 5-6): turn the Text object into a java.nio.ByteBuffer, then repeatedly call the bytesToCodePoint() static method on Text with the buffer. This method extracts the next code point as an int and updates the position in the buffer. The end of the string is detected when bytesToCodePoint() returns -1.

Example 5-6. Iterating over the characters in a Text object

    import java.nio.ByteBuffer;
    import org.apache.hadoop.io.Text;

    public class TextIterator {

      public static void main(String[] args) {
        Text t = new Text("\u0041\u00DF\u6771\uD801\uDC00");

        // Wrap only the valid bytes: getBytes() may return a longer array
        ByteBuffer buf = ByteBuffer.wrap(t.getBytes(), 0, t.getLength());
        int cp;
        // bytesToCodePoint() advances the buffer position past each character
        while (buf.hasRemaining() && (cp = Text.bytesToCodePoint(buf)) != -1) {
          System.out.println(Integer.toHexString(cp));
        }
      }
    }

Running the program prints the code points for the four characters in the string:

    % hadoop TextIterator
    41
    df
    6771
    10400

Mutability.

Another difference from String is that Text is mutable (like all Writable implementations in Hadoop, except NullWritable, which is a singleton). You can reuse a Text instance by calling one of the set() methods on it. For example:

    Text t = new Text("hadoop");
    t.set("pig");
    assertThat(t.getLength(), is(3));
    assertThat(t.getBytes().length, is(3));

In some situations, the byte array returned by the getBytes() method may be longer than the length returned by getLength():

    Text t = new Text("hadoop");
    t.set(new Text("pig"));
    assertThat(t.getLength(), is(3));
    assertThat("Byte length not shortened", t.getBytes().length, is(6));

This shows why it is imperative that you always call getLength() when calling getBytes(), so you know how much of the byte array is valid data.

Resorting to String.

Text doesn't have as rich an API for manipulating strings as java.lang.String, so in many cases, you need to convert the Text object to a String. This is done in the usual way, using the toString() method:

    assertThat(new Text("hadoop").toString(), is("hadoop"));

BytesWritable

BytesWritable is a wrapper for an array of binary data. Its serialized format is a 4-byte integer field that specifies the number of bytes to follow, followed by the bytes themselves. For example, the byte array of length 2 with values 3 and 5 is serialized as a 4-byte integer (00000002) followed by the two bytes from the array (03 and 05):

    BytesWritable b = new BytesWritable(new byte[] { 3, 5 });
    byte[] bytes = serialize(b);
    assertThat(StringUtils.byteToHexString(bytes), is("000000020305"));

BytesWritable is mutable, and its value may be changed by calling its set() method. As with Text, the size of the byte array returned from the getBytes() method for BytesWritable (the capacity) may not reflect the actual size of the data stored in the BytesWritable.
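The serialized layout described above, a 4-byte length followed by the raw bytes, can be mimicked with plain JDK classes. This is only a sketch of the format as described here: LengthPrefixed and its methods are hypothetical names, and the code does not call Hadoop's BytesWritable. It also shows why the valid length has to be tracked separately from the size of the backing array.

```java
import java.nio.ByteBuffer;

// Illustrative only: a length-prefixed encoding of the valid part of a buffer.
final class LengthPrefixed {

    // Encodes the first `length` bytes of `buffer`, preceded by a 4-byte
    // big-endian count. Bytes beyond `length` are spare capacity and ignored.
    static byte[] serialize(byte[] buffer, int length) {
        return ByteBuffer.allocate(4 + length)  // big-endian by default
                .putInt(length)
                .put(buffer, 0, length)
                .array();
    }

    // Lowercase hex with two digits per byte.
    static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] backing = new byte[11];  // capacity of 11 bytes
        backing[0] = 3;
        backing[1] = 5;
        int validLength = 2;            // only two bytes hold data
        System.out.println(toHex(serialize(backing, validLength)));
    }
}
```

Passing backing.length instead of validLength would write nine trailing zero bytes as if they were data, which is the mistake that tracking the length separately avoids.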

You can determine the size of the BytesWritable by calling getLength(). To demonstrate:

    b.setCapacity(11);
    assertThat(b.getLength(), is(2));
    assertThat(b.getBytes().length, is(11));

NullWritable

NullWritable is a special type of Writable, as it has a zero-length serialization. No bytes are written to or read from the stream. It is used as a placeholder; for example, in MapReduce, a key or a value can be declared as a NullWritable when you don't need to use that position, effectively storing a constant empty value. NullWritable can also be useful as a key in a SequenceFile when you want to store a list of values, as opposed to key-value pairs.

It is an immutable singleton, and the instance can be retrieved by calling NullWritable.get().

ObjectWritable and GenericWritable

ObjectWritable is a general-purpose wrapper for the following: Java primitives, String, enum, Writable, null, or arrays of any of these types. It is used in Hadoop RPC to marshal and unmarshal method arguments and return types.

ObjectWritable is useful when a field can be of more than one type.

For example, if the values in a SequenceFile have multiple types, you can declare the value type as an ObjectWritable and wrap each type in an ObjectWritable. Being a general-purpose mechanism, it wastes a fair amount of space because it writes the classname of the wrapped type every time it is serialized.
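The cost of writing a class name with every value can be estimated with a plain-JDK sketch that compares it against a one-byte index into a table of types agreed on in advance. The names TypeTagging, KNOWN_TYPES, and both methods are made up for illustration, and the byte layout (a 2-byte length plus the UTF-8 name) is an assumed simplification rather than the wire format Hadoop uses.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: two ways to tag a serialized value with its type.
final class TypeTagging {

    // Hypothetical table of types, fixed ahead of time.
    static final Class<?>[] KNOWN_TYPES = {Integer.class, Long.class, String.class};

    // Bytes needed to tag a value with its class name, assuming a 2-byte
    // length prefix followed by the UTF-8 encoded name.
    static int classNameTagSize(Class<?> type) {
        return 2 + type.getName().getBytes(StandardCharsets.UTF_8).length;
    }

    // Position of the type in KNOWN_TYPES; this fits in a single byte.
    static byte indexTag(Class<?> type) {
        for (int i = 0; i < KNOWN_TYPES.length; i++) {
            if (KNOWN_TYPES[i] == type) {
                return (byte) i;
            }
        }
        throw new IllegalArgumentException("unregistered type: " + type.getName());
    }

    public static void main(String[] args) {
        for (Class<?> type : KNOWN_TYPES) {
            System.out.println(type.getName()
                    + ": name tag = " + classNameTagSize(type) + " bytes"
                    + ", index tag = 1 byte (value " + indexTag(type) + ")");
        }
    }
}
```

Under these assumptions, tagging a String value by name costs 18 bytes per record, while the index costs one, at the price of requiring every reader and writer to share the same table.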

In cases where the number of types is small and known ahead of time, this can be improved by having a static array of types and using the index into the array as the serialized reference to the type. This is the approach that GenericWritable takes, and you have to subclass it to specify which types to support.

Writable collections

The org.apache.hadoop.io package includes six Writable collection types: ArrayWritable, ArrayPrimitiveWritable, TwoDArrayWritable, MapWritable, SortedMapWritable, and EnumSetWritable.

ArrayWritable and TwoDArrayWritable are Writable implementations for arrays and two-dimensional arrays (array of arrays) of Writable instances. All the elements of an ArrayWritable or a TwoDArrayWritable must be instances of the same class, which is specified at construction as follows:

    ArrayWritable writable = new ArrayWritable(Text.class);

In contexts where the Writable is defined by type, such as in SequenceFile keys or values or as input to MapReduce in general, you need to subclass ArrayWritable (or TwoDArrayWritable, as appropriate) to set the type statically.

For example:

    public class TextArrayWritable extends ArrayWritable {
      public TextArrayWritable() {
        super(Text.class);
      }
    }

ArrayWritable and TwoDArrayWritable both have get() and set() methods, as well as a toArray() method, which creates a shallow copy of the array (or 2D array).

ArrayPrimitiveWritable is a wrapper for arrays of Java primitives.

The component type is detected when you call set(), so there is no need to subclass to set the type.

MapWritable is an implementation of java.util.Map<Writable, Writable>, and SortedMapWritable is an implementation of java.util.SortedMap<WritableComparable, Writable>. The type of each key and value field is a part of the serialization format for that field. The type is stored as a single byte that acts as an index into an array of types. The array is populated with the standard types in the org.apache.hadoop.io package, but custom Writable types are accommodated, too, by writing a header that encodes the type array for nonstandard types. As they are implemented, MapWritable and SortedMapWritable use positive byte values for custom types, so a maximum of 127 distinct nonstandard Writable classes can be used in any particular MapWritable or SortedMapWritable instance.

Here's a demonstration of using a MapWritable with different types for keys and values:

    MapWritable src = new MapWritable();
    src.put(new IntWritable(1), new Text("cat"));
    src.put(new VIntWritable(2), new LongWritable(163));

    MapWritable dest = new MapWritable();
    WritableUtils.cloneInto(dest, src);
    assertThat((Text) dest.get(new IntWritable(1)), is(new Text("cat")));
    assertThat((LongWritable) dest.get(new VIntWritable(2)),
        is(new LongWritable(163)));

Conspicuous by their absence are Writable collection implementations for sets and lists. A general set can be emulated by using a MapWritable (or a SortedMapWritable for a sorted set) with NullWritable values. There is also EnumSetWritable for sets of enum types.
