Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5). Copyright (C) 1988-1992 by Cambridge University Press. Programs Copyright (C) 1988-1992 by Numerical Recipes Software. Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machine-readable files (including this one) to any server computer, is strictly prohibited. To order Numerical Recipes books, diskettes, or CDROMs visit website http://www.nr.com or call 1-800-872-7423 (North America only), or send email to trade@cup.cam.ac.uk (outside North America).

This routine is called repeatedly to encode consecutive characters in a message, but must be preceded by a single initializing call to hufmak, which constructs hcode.

    {
        void nrerror(char error_text[]);
        int l,n;
        unsigned long k,nc;
        static unsigned long setbit[32]={0x1L,0x2L,0x4L,0x8L,0x10L,0x20L,
            0x40L,0x80L,0x100L,0x200L,0x400L,0x800L,0x1000L,0x2000L,
            0x4000L,0x8000L,0x10000L,0x20000L,0x40000L,0x80000L,0x100000L,
            0x200000L,0x400000L,0x800000L,0x1000000L,0x2000000L,0x4000000L,
            0x8000000L,0x10000000L,0x20000000L,0x40000000L,0x80000000L};

        k=ich+1;    /* Convert character range 0..nch-1 to array index range 1..nch. */
        if (k > hcode->nch || k < 1) nrerror("ich out of range in hufenc.");
        for (n=hcode->ncod[k]-1;n>=0;n--,++(*nb)) {    /* Loop over the bits in the stored Huffman code for ich. */
            nc=(*nb >> 3);
            if (++nc >= *lcode) {
                fprintf(stderr,"Reached the end of the 'code' array.\n");
                fprintf(stderr,"Attempting to expand its size.\n");
                *lcode *= 1.5;
                if ((*codep=(unsigned char *)realloc(*codep,
                    (unsigned)(*lcode*sizeof(unsigned char)))) == NULL) {
                    nrerror("Size expansion failed.");
                }
            }
            l=(*nb) & 7;
            if (!l) (*codep)[nc]=0;    /* Set appropriate bits in code. */
            if (hcode->icod[k] & setbit[n]) (*codep)[nc] |= setbit[l];
        }
    }

    void hufapp(unsigned long index[], unsigned long nprob[], unsigned long n,
        unsigned long i)
    /* Used by hufmak to maintain a heap structure in the array index[1..l]. */
    {
        unsigned long j,k;

Decoding a Huffman-encoded message is slightly more complicated. The coding tree must be traversed from the top down, using up a variable number of bits:

    typedef struct {
        unsigned long *icod,*ncod,*left,*right,nch,nodemax;
    } huffcode;

    void hufdec(unsigned long *ich, unsigned char *code, unsigned long lcode,
        unsigned long *nb, huffcode *hcode)
    /* Starting at bit number nb in the character array code[1..lcode], use the Huffman code stored
       in the structure hcode to decode a single character (returned as ich in the range 0..nch-1)
       and increment nb appropriately. Repeated calls, starting with nb = 0, will return successive
       characters in a compressed message. The returned value ich=nch indicates end-of-message.
       The structure hcode must already have been defined and allocated in your main program, and
       also filled by a call to hufmak. */
    {
        long nc,node;
        static unsigned char setbit[8]={0x1,0x2,0x4,0x8,0x10,0x20,0x40,0x80};

        node=hcode->nodemax;
        for (;;) {    /* Set node to the top of the decoding tree, and loop until a valid character is obtained. */
            nc=(*nb >> 3);
            if (++nc > lcode) {    /* Ran out of input; with ich=nch indicating end of message. */
                *ich=hcode->nch;
                return;
            }
            node=(code[nc] & setbit[7 & (*nb)++] ?
                hcode->right[node] : hcode->left[node]);
            /* Branch left or right in tree, depending on its value. */
            if (node <= hcode->nch) {    /* If we reach a terminal node, we have a complete character and can return. */
                *ich=node-1;
                return;
            }
        }
    }

Run-Length Encoding

For the compression of highly correlated bit-streams (for example the black or white values along a facsimile scan line), Huffman compression is often combined with run-length encoding: Instead of sending each bit, the input stream is converted to a series of integers indicating how many consecutive bits have the same value. These integers are then Huffman-compressed.
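The run-length step can be sketched in a few lines of C. This is only an illustration under stated assumptions: the bits are stored one per unsigned char, the value of the first bit is returned separately so the stream can be rebuilt, and the names rle_bits and rle_expand are hypothetical (they are not Numerical Recipes routines). The resulting integers would then be the symbols handed to a Huffman coder.

```c
#include <stddef.h>

/* Run-length encode n bits (each element of bits[] is 0 or 1).
   Writes the run lengths to runs[] (which must have room for n entries),
   stores the value of the first bit in *first_bit, and returns the
   number of runs. Hypothetical helper, for illustration only. */
size_t rle_bits(const unsigned char bits[], size_t n,
                unsigned long runs[], unsigned char *first_bit)
{
    size_t i, nruns = 0;

    if (n == 0) return 0;
    *first_bit = bits[0];
    runs[0] = 1;
    for (i = 1; i < n; i++) {
        if (bits[i] == bits[i-1]) {
            runs[nruns]++;        /* same value: extend the current run */
        } else {
            runs[++nruns] = 1;    /* value flipped: start a new run */
        }
    }
    return nruns + 1;
}

/* Inverse operation: expand the runs back into bits[].
   Returns the number of bits written. */
size_t rle_expand(const unsigned long runs[], size_t nruns,
                  unsigned char first_bit, unsigned char bits[])
{
    size_t i, k = 0;
    unsigned long j;
    unsigned char v = first_bit;

    for (i = 0; i < nruns; i++) {
        for (j = 0; j < runs[i]; j++) bits[k++] = v;
        v = !v;                   /* successive runs alternate between 0 and 1 */
    }
    return k;
}
```

For the ten bits 0,0,0,1,1,0,0,0,0,1 this sketch produces first_bit = 0 and the runs 3, 2, 4, 1.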
The Group 3 CCITT facsimile standard functions in this manner, with a fixed, immutable, Huffman code, optimized for a set of eight standard documents [8,9].

CITED REFERENCES AND FURTHER READING:

Gallager, R.G. 1968, Information Theory and Reliable Communication (New York: Wiley).
Hamming, R.W. 1980, Coding and Information Theory (Englewood Cliffs, NJ: Prentice-Hall).
Storer, J.A. 1988, Data Compression: Methods and Theory (Rockville, MD: Computer Science Press).
Nelson, M. 1991, The Data Compression Book (Redwood City, CA: M&T Books).
Huffman, D.A. 1952, Proceedings of the Institute of Radio Engineers, vol. 40, pp. 1098–1101. [1]
Ziv, J., and Lempel, A. 1978, IEEE Transactions on Information Theory, vol. IT-24, pp. 530–536. [2]
Cleary, J.G., and Witten, I.H. 1984, IEEE Transactions on Communications, vol. COM-32, pp. 396–402. [3]
Welch, T.A. 1984, Computer, vol. 17, no. 6, pp. 8–19. [4]
Bentley, J.L., Sleator, D.D., Tarjan, R.E., and Wei, V.K. 1986, Communications of the ACM, vol. 29, pp. 320–330. [5]
Jones, D.W. 1988, Communications of the ACM, vol. 31, pp. 996–1007. [6]
Sedgewick, R. 1988, Algorithms, 2nd ed. (Reading, MA: Addison-Wesley), Chapter 22. [7]
Hunter, R., and Robinson, A.H. 1980, Proceedings of the IEEE, vol. 68, pp. 854–867. [8]
Marking, M.P. 1990, The C Users' Journal, vol. 8, no. 6, pp. 45–54. [9]

For simplicity, hufdec quits when it runs out of code bytes; if your coded message is not an integral number of bytes, and if $N_{ch}$ is less than 256, hufdec can return a spurious final character or two, decoded from the spurious trailing bits in your last code byte. If you have independent knowledge of the number of characters sent, you can readily discard these. Otherwise, you can fix this behavior by providing a bit, not byte, count, and modifying the routine accordingly. (When $N_{ch}$ is 256 or larger, hufdec will normally run out of code in the middle of a spurious character, and it will be discarded.)

20.5 Arithmetic Coding

We saw in the previous section that a perfect (entropy-bounded) coding scheme would use $L_i = -\log_2 p_i$ bits to encode character $i$ (in the range $1 \le i \le N_{ch}$), if $p_i$ is its probability of occurrence. Huffman coding gives a way of rounding the $L_i$'s to close integer values and constructing a code with those lengths.
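
To see what the rounding can cost, consider a small worked example. The two-character alphabet and its probabilities $p_1 = 0.9$, $p_2 = 0.1$ are assumptions chosen for illustration; they do not come from the text.

```latex
\begin{align*}
L_1 &= -\log_2 0.9 \approx 0.152 \text{ bits}, &
L_2 &= -\log_2 0.1 \approx 3.322 \text{ bits}, \\
\sum_i p_i L_i &\approx 0.9\,(0.152) + 0.1\,(3.322) \approx 0.469 \text{ bits per character}.
\end{align*}
```

A Huffman code for two characters has no choice but to give each of them a 1-bit code, so it spends 1 bit per character on average, roughly twice the entropy bound in this case.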
Arithmetic coding [1], which we now discuss, actually does manage to encode characters using noninteger numbers of bits! It also provides a convenient way to output the result not as a stream of bits, but as a stream of symbols in any desired radix. This latter property is particularly useful if you want, e.g., to convert data from bytes (radix 256) to printable ASCII characters (radix 94), or to case-independent alphanumeric sequences containing only A-Z and 0-9 (radix 36).

In arithmetic coding, an input message of any length is represented as a real number $R$ in the range $0 \le R < 1$.
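
As a rough illustration of how a whole message can correspond to a single number in $[0,1)$, the sketch below uses the usual interval-narrowing idea: each character shrinks the current interval to the slice owned by that character. Everything here is an assumption made for the example: the function names arith_encode_toy and arith_decode_toy are hypothetical, the cumulative-probability table is supplied by the caller, and double precision limits it to short messages. It is not the routine developed in this section.

```c
#include <stddef.h>

/* Toy illustration only. cum[] holds cumulative probabilities with
   cum[0] = 0 < cum[1] < ... < cum[nsym] = 1, so that symbol s owns the
   slice [cum[s], cum[s+1]) of the unit interval. Returns a number
   inside the final interval, which identifies the message. */
double arith_encode_toy(const int msg[], size_t len, const double cum[])
{
    double lo = 0.0, width = 1.0;
    size_t i;

    for (i = 0; i < len; i++) {
        lo += width * cum[msg[i]];                  /* move to the symbol's slice */
        width *= cum[msg[i] + 1] - cum[msg[i]];     /* shrink by its probability  */
    }
    return lo + 0.5 * width;                        /* midpoint of the final interval */
}

/* Recover len symbols from r by locating, at each step, the slice
   that contains r and then rescaling that slice back to [0,1). */
void arith_decode_toy(double r, size_t len, const double cum[],
                      int nsym, int msg[])
{
    size_t i;
    int s;

    for (i = 0; i < len; i++) {
        for (s = 0; s < nsym - 1 && r >= cum[s + 1]; s++)
            ;                                       /* find s with cum[s] <= r < cum[s+1] */
        msg[i] = s;
        r = (r - cum[s]) / (cum[s + 1] - cum[s]);
    }
}
```

With three symbols of probability 1/2, 1/4, 1/4 (cum = {0, 0.5, 0.75, 1}), the message 0, 2, 1, 0 narrows the interval to [0.4375, 0.453125), and the sketch returns its midpoint 0.4453125. A practical coder cannot rely on floating point this way, since the interval width shrinks below machine precision after a modest number of characters.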















