        k=ij[k][ip[(c+2) % 10][7 & m++]];
    }
    for (j=0;j<=9;j++)              /* Find which appended digit will check properly. */
        if (!ij[k][ip[j][m & 7]]) break;
    *ch=j+48;                       /* Convert to ASCII. */
    return k==0;
}
20.4 Huffman Coding and Compression of Data

A lossless data compression algorithm takes a string of symbols (typically ASCII characters or bytes) and translates it reversibly into another string, one that is on the average of shorter length. The words “on the average” are crucial; it is obvious that no reversible algorithm can make all strings shorter: there just aren't enough short strings to be in one-to-one correspondence with longer strings (of the 2^n bit strings of length n, only 2^n - 1 strings, including the empty string, are shorter). Compression algorithms are possible only when, on the input side, some strings, or some input symbols, are more common than others.
These can then be encoded in fewer bits than rarer input strings or symbols, giving a net average gain.

There exist many, quite different, compression techniques, corresponding to different ways of detecting and using departures from equiprobability in input strings. In this section and the next we shall consider only variable length codes with defined-word inputs. In these, the input is sliced into fixed units, for example ASCII characters, while the corresponding output comes in chunks of variable size. The simplest such method is Huffman coding [1], discussed in this section. Another example, arithmetic compression, is discussed in §20.5.

At the opposite extreme from defined-word, variable length codes are schemes that divide up the input into units of variable length (words or phrases of English text, for example) and then transmit these, often with a fixed-length output code.
The most widely used code of this type is the Ziv-Lempel code [2]. References [3-6] give the flavor of some other compression techniques, with references to the large literature.

The idea behind Huffman coding is simply to use shorter bit patterns for more common characters. We can make this idea quantitative by considering the concept of entropy. Suppose the input alphabet has Nch characters, and that these occur in the input string with respective probabilities pi, i = 1, ..., Nch, so that \sum_i p_i = 1. Then the fundamental theorem of information theory says that strings consisting of independently random sequences of these characters (a conservative, but not always realistic, assumption) require, on the average, at least

    H = -\sum_i p_i \log_2 p_i        (20.4.1)

bits per character. Here H is the entropy of the probability distribution. Moreover, coding schemes exist which approach the bound arbitrarily closely. For the case of equiprobable characters, with all pi = 1/Nch, one easily sees that H = log2 Nch, which is the case of no compression at all. Any other set of pi's gives a smaller entropy, allowing some useful compression.
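As a concrete check of equation (20.4.1), here is a minimal sketch (an illustration of ours, not a Numerical Recipes routine) that evaluates H for the five-character alphabet of the Vowellish example introduced below, using the probabilities of that example:

#include <stdio.h>
#include <math.h>

/* Sketch: evaluate the entropy H = -sum_i p_i log2 p_i of equation (20.4.1)
   for the Vowellish probabilities (A, E, I, O, U) used in the example below. */
int main(void)
{
    static const double p[5]={0.12,0.42,0.09,0.30,0.07};
    double h=0.0;
    int i;

    for (i=0;i<5;i++)
        h -= p[i]*log(p[i])/log(2.0);            /* log2 x = ln x / ln 2 */
    printf("H = %.3f bits per character\n",h);   /* prints H = 1.995 */
    return 0;
}

For comparison, the Huffman code constructed below for these probabilities averages 0.42(1) + 0.30(2) + 0.12(3) + 0.09(4) + 0.07(4) = 2.02 bits per character, quite close to this bound.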
Notice that the bound of (20.4.1) would be achieved if we could encode character i with a code of length Li = -log2 pi bits: Equation (20.4.1) would then be the average \sum_i p_i L_i. The trouble with such a scheme is that -log2 pi is not generally an integer. How can we encode the letter “Q” in 5.32 bits? Huffman coding makes a stab at this by, in effect, approximating all the probabilities pi by integer powers of 1/2, so that all the Li's are integral. If all the pi's are in fact of this form, then a Huffman code does achieve the entropy bound H.

The construction of a Huffman code is best illustrated by example. Imagine a language, Vowellish, with the Nch = 5 character alphabet A, E, I, O, and U, occurring with the respective probabilities 0.12, 0.42, 0.09, 0.30, and 0.07. Then the construction of a Huffman code for Vowellish is accomplished in the following table:

    Node    Stage:   1      2      3      4      5
     1   A:         0.12   0.12
     2   E:         0.42   0.42   0.42   0.42
     3   I:         0.09
     4   O:         0.30   0.30   0.30
     5   U:         0.07
     6   UI:               0.16
     7   AUI:                     0.28
     8   AUIO:                           0.58
     9   EAUIO:                                 1.00

Here is how it works, proceeding in sequence through Nch stages, represented by the columns of the table.
The first stage starts with Nch nodes, one for each letter of the alphabet, containing their respective relative frequencies. At each stage, the two smallest probabilities are found, summed to make a new node, and then dropped from the list of active nodes. (In the table above, the column in which a node's probability last appears is the stage at which it is dropped.) All active nodes (including the new composite) are then carried over to the next stage (column).
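The combination loop itself is only a few lines. Here is a minimal sketch (ours, not the book's hufmak routine, which appears below and uses a heap instead of a linear scan) that traces the stages for the Vowellish example:

#include <stdio.h>

/* Sketch: repeatedly combine the two smallest active probabilities,
   printing each stage of the Vowellish table. */
int main(void)
{
    const char *name[10]={"","A","E","I","O","U","UI","AUI","AUIO","EAUIO"};
    float prob[10]={0,0.12f,0.42f,0.09f,0.30f,0.07f};
    int active[10]={0,1,1,1,1,1}, nactive=5, k=5, i, j1, j2;

    while (nactive > 1) {
        j1=j2=0;                       /* indices of the two smallest active nodes */
        for (i=1;i<=k;i++)
            if (active[i]) {
                if (!j1 || prob[i] < prob[j1]) { j2=j1; j1=i; }
                else if (!j2 || prob[i] < prob[j2]) j2=i;
            }
        ++k;                           /* make a new composite node */
        prob[k]=prob[j1]+prob[j2];
        active[j1]=active[j2]=0;       /* drop the two children from the active list */
        active[k]=1;
        nactive--;
        printf("stage %d: %s + %s -> %s (%.2f)\n",k-5,name[j1],name[j2],name[k],prob[k]);
    }
    return 0;
}

Run, it reports the same four combinations as the table: U+I, then A+UI, then AUI+O, and finally E+AUIO.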
In the table, the names assigned to new nodes (e.g., AUI) are inconsequential. In the example shown, it happens that (after stage 1) the two smallest nodes are always an original node and a composite one; this need not be true in general: The two smallest probabilities might be both original nodes, or both composites, or one of each. At the last stage, all nodes will have been collected into one grand composite of total probability 1.

Now, to see the code, you redraw the data in the above table as a tree (Figure 20.4.1).

                      9  EAUIO 1.00
                     0 /         \ 1
                2 E 0.42       8  AUIO 0.58
                              0 /        \ 1
                         7 AUI 0.28    4 O 0.30
                        0 /      \ 1
                   1 A 0.12    6 UI 0.16
                              0 /     \ 1
                         5 U 0.07   3 I 0.09

Figure 20.4.1. Huffman code for the fictitious language Vowellish, in tree form. A letter (A, E, I, O, or U) is encoded or decoded by traversing the tree from the top down; the code is the sequence of 0's and 1's on the branches. The value to the right of each node is its probability; to the left, its node number in the accompanying table.
As shown, each node of the tree corresponds to a node (row) in the table, indicated by the integer to its left and probability value to its right. Terminal nodes, so called, are shown as circles; these are single alphabetic characters. The branches of the tree are labeled 0 and 1. The code for a character is the sequence of zeros and ones that lead to it, from the top down. For example, E is simply 0, while U is 1010.

Any string of zeros and ones can now be decoded into an alphabetic sequence. Consider, for example, the string 1011111010. Starting at the top of the tree we descend through 1011 to I, the first character. Since we have reached a terminal node, we reset to the top of the tree, next descending through 11 to O. Finally 1010 gives U. The string thus decodes to IOU.
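Mechanically, decoding is just a pointer walk down the tree. Here is a minimal sketch (a hypothetical illustration of ours with the Vowellish tree hardwired, not a general routine) that decodes the string above:

#include <stdio.h>

/* Sketch: decode a bit string by walking the Vowellish tree of Figure 20.4.1.
   Nodes 1..5 are the terminals A,E,I,O,U; nodes 6..9 are the composites
   UI, AUI, AUIO, EAUIO built in the table. */
int main(void)
{
    static const char sym[]=" AEIOU";                    /* terminals, indexed 1..5 */
    static const int left[10] ={0,0,0,0,0,0, 5,1,7,2};   /* child on a 0 branch */
    static const int right[10]={0,0,0,0,0,0, 3,6,4,8};   /* child on a 1 branch */
    const char *bits="1011111010";     /* the string decoded in the text */
    int node=9;                        /* start at the root, EAUIO */

    for (; *bits; bits++) {
        node = (*bits=='1') ? right[node] : left[node];
        if (node <= 5) {               /* reached a terminal: emit it */
            putchar(sym[node]);
            node=9;                    /* reset to the top of the tree */
        }
    }
    putchar('\n');                     /* prints IOU */
    return 0;
}

The left and right arrays here play the same role as the left and right fields that hufmak fills in below: a 0 bit selects the left child and a 1 bit the right.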
These ideas are embodied in the following routines. Input to the first routine hufmak is an integer vector of the frequency of occurrence of the nchin ≡ Nch alphabetic characters, i.e., a set of integers proportional to the pi's. hufmak, along with hufapp, which it calls, performs the construction of the above table, and also the tree of Figure 20.4.1. The routine utilizes a heap structure (see §8.3) for efficiency; for a detailed description, see Sedgewick [7].
#include "nrutil.h"

typedef struct {
    unsigned long *icod,*ncod,*left,*right,nch,nodemax;
} huffcode;

void hufmak(unsigned long nfreq[], unsigned long nchin, unsigned long *ilong,
    unsigned long *nlong, huffcode *hcode)
/*
Given the frequency of occurrence table nfreq[1..nchin] of nchin characters, construct
the Huffman code in the structure hcode. Returned values ilong and nlong are the
character number that produced the longest code symbol, and the length of that symbol.
You should check that nlong is not larger than your machine's word length.
*/
{
    void hufapp(unsigned long index[], unsigned long nprob[], unsigned long n,
        unsigned long i);
    int ibit;
    long node,*up;
    unsigned long j,k,*index,n,nused,*nprob;
    static unsigned long setbit[32]={0x1L,0x2L,0x4L,0x8L,0x10L,0x20L,0x40L,0x80L,
        0x100L,0x200L,0x400L,0x800L,0x1000L,0x2000L,0x4000L,0x8000L,
        0x10000L,0x20000L,0x40000L,0x80000L,0x100000L,0x200000L,0x400000L,0x800000L,
        0x1000000L,0x2000000L,0x4000000L,0x8000000L,0x10000000L,0x20000000L,
        0x40000000L,0x80000000L};

    hcode->nch=nchin;                           /* Initialization. */
    index=lvector(1,(long)(2*hcode->nch-1));
    up=(long *)lvector(1,(long)(2*hcode->nch-1));   /* Vector that will keep track of heap. */
    nprob=lvector(1,(long)(2*hcode->nch-1));
    for (nused=0,j=1;j<=hcode->nch;j++) {
        nprob[j]=nfreq[j];
        hcode->icod[j]=hcode->ncod[j]=0;
        if (nfreq[j]) index[++nused]=j;
    }
    for (j=nused;j>=1;j--) hufapp(index,nprob,nused,j);
                                /* Sort nprob into a heap structure in index. */
    k=hcode->nch;
    while (nused > 1) {         /* Combine heap nodes, remaking the heap at each stage. */
        node=index[1];
        index[1]=index[nused--];
        hufapp(index,nprob,nused,1);
        nprob[++k]=nprob[index[1]]+nprob[node];
        hcode->left[k]=node;    /* Store left and right children of a node. */
        hcode->right[k]=index[1];
        up[index[1]] = -(long)k;    /* Indicate whether a node is a left or right */
        up[node]=index[1]=k;        /* child of its parent. */
        hufapp(index,nprob,nused,1);
    }
    up[hcode->nodemax=k]=0;
    for (j=1;j<=hcode->nch;j++) {   /* Make the Huffman code from the tree. */
        if (nprob[j]) {
            for (n=0,ibit=0,node=up[j];node;node=up[node],ibit++) {
                if (node < 0) {
                    n |= setbit[ibit];
                    node = -node;
                }
            }
            hcode->icod[j]=n;
            hcode->ncod[j]=ibit;
        }
    }
    *nlong=0;
    for (j=1;j<=hcode->nch;j++) {
        if (hcode->ncod[j] > *nlong) {
            *nlong=hcode->ncod[j];
            *ilong=j-1;
        }
    }
    free_lvector(nprob,1,(long)(2*hcode->nch-1));
    free_lvector((unsigned long *)up,1,(long)(2*hcode->nch-1));
    free_lvector(index,1,(long)(2*hcode->nch-1));
}

void hufapp(unsigned long index[], unsigned long nprob[], unsigned long n,
    unsigned long i)
/* Used by hufmak to maintain a heap structure in the array index[1..n]. */
{
    unsigned long j,k;

    k=index[i];
    while (i <= (n>>1)) {
        if ((j = i << 1) < n && nprob[index[j]] > nprob[index[j+1]]) j++;
        if (nprob[k] <= nprob[index[j]]) break;
        index[i]=index[j];
        i=j;
    }
    index[i]=k;
}

Note that the structure hcode must be defined and allocated in your main program with statements like this:

#include "nrutil.h"
#define MC 512                /* Maximum anticipated value of nchin in hufmak. */
#define MQ (2*MC-1)

typedef struct {
    unsigned long *icod,*ncod,*left,*right,nch,nodemax;
} huffcode;
...
huffcode hcode;
...
hcode.icod=(unsigned long *)lvector(1,MQ);    /* Allocate space within hcode. */
hcode.ncod=(unsigned long *)lvector(1,MQ);
hcode.left=(unsigned long *)lvector(1,MQ);
hcode.right=(unsigned long *)lvector(1,MQ);
for (j=1;j<=MQ;j++) hcode.icod[j]=hcode.ncod[j]=0;

Once the code is constructed, one encodes a string of characters by repeated calls to hufenc, which simply does a table lookup of the code and appends it to the output message.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    unsigned long *icod,*ncod,*left,*right,nch,nodemax;
} huffcode;

void hufenc(unsigned long ich, unsigned char **codep, unsigned long *lcode,
    unsigned long *nb, huffcode *hcode)
/*
Huffman encode the single character ich (in the range 0..nch-1) using the code in the
structure hcode, write the result to the character array *codep[1..lcode] starting at bit
nb (whose smallest valid value is zero), and increment nb appropriately.
*/
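Putting the pieces together, a calling program might look like the following sketch. It is hypothetical: it assumes the vector allocators lvector and cvector from the nrutil library, the hcode allocation pattern shown above, and the body of hufenc, which is not reproduced in this excerpt.

#include <stdio.h>
#include <string.h>
#include "nrutil.h"

#define MC 512                /* maximum anticipated nchin, as in the text */
#define MQ (2*MC-1)

typedef struct {
    unsigned long *icod,*ncod,*left,*right,nch,nodemax;
} huffcode;

void hufmak(unsigned long nfreq[], unsigned long nchin, unsigned long *ilong,
    unsigned long *nlong, huffcode *hcode);
void hufenc(unsigned long ich, unsigned char **codep, unsigned long *lcode,
    unsigned long *nb, huffcode *hcode);

int main(void)
{
    const char *msg="abracadabra";      /* hypothetical test message */
    unsigned long nfreq[257],ilong,nlong,lcode=64,nb=0,j;
    unsigned char *code;
    huffcode hcode;

    hcode.icod=(unsigned long *)lvector(1,MQ);    /* allocate space within hcode */
    hcode.ncod=(unsigned long *)lvector(1,MQ);
    hcode.left=(unsigned long *)lvector(1,MQ);
    hcode.right=(unsigned long *)lvector(1,MQ);
    for (j=1;j<=MQ;j++) hcode.icod[j]=hcode.ncod[j]=0;

    for (j=1;j<=256;j++) nfreq[j]=0;              /* tally byte frequencies, 1-indexed */
    for (j=0;j<strlen(msg);j++) nfreq[(unsigned char)msg[j]+1]++;

    hufmak(nfreq,256,&ilong,&nlong,&hcode);       /* construct the code */
    if (nlong > 8*sizeof(unsigned long))          /* advised check from hufmak's description */
        fprintf(stderr,"longest code exceeds word length\n");
    code=cvector(1,lcode);                        /* output buffer, 1-indexed */
    for (j=0;j<strlen(msg);j++)                   /* encode character by character */
        hufenc((unsigned long)(unsigned char)msg[j],&code,&lcode,&nb,&hcode);
    printf("%lu characters -> %lu bits\n",(unsigned long)strlen(msg),nb);
    return 0;                                     /* (cleanup of vectors omitted) */
}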