c20-5 (779627)
Текст из файла
910Chapter 20.Less-Numerical Algorithms20.5 Arithmetic CodingSample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5)Copyright (C) 1988-1992 by Cambridge University Press.Programs Copyright (C) 1988-1992 by Numerical Recipes Software.Permission is granted for internet users to make one paper copy for their own personal use. Further reproduction, or any copying of machinereadable files (including this one) to any servercomputer, is strictly prohibited. To order Numerical Recipes books,diskettes, or CDROMsvisit website http://www.nr.com or call 1-800-872-7423 (North America only),or send email to trade@cup.cam.ac.uk (outside North America).We saw in the previous section that a perfect (entropy-bounded) coding schemewould use Li = − log2 pi bits to encode character i (in the range 1 ≤ i ≤ Nch ),if pi is its probability of occurrence.
Huffman coding gives a way of rounding theLi ’s to close integer values and constructing a code with those lengths. Arithmeticcoding [1], which we now discuss, actually does manage to encode characters usingnoninteger numbers of bits! It also provides a convenient way to output the resultnot as a stream of bits, but as a stream of symbols in any desired radix. This latterproperty is particularly useful if you want, e.g., to convert data from bytes (radix256) to printable ASCII characters (radix 94), or to case-independent alphanumericsequences containing only A-Z and 0-9 (radix 36).In arithmetic coding, an input message of any length is represented as a realnumber R in the range 0 ≤ R < 1.
The longer the message, the more precisionrequired of R. This is best illustrated by an example, so let us return to the fictitiouslanguage, Vowellish, of the previous section. Recall that Vowellish has a 5 characteralphabet (A, E, I, O, U), with occurrence probabilities 0.12, 0.42, 0.09, 0.30, and0.07, respectively. Figure 20.5.1 shows how a message beginning “IOU” is encoded:The interval [0, 1) is divided into segments corresponding to the 5 alphabeticalcharacters; the length of a segment is the corresponding pi . We see that the firstmessage character, “I”, narrows the range of R to 0.37 ≤ R < 0.46.
This interval isnow subdivided into five subintervals, again with lengths proportional to the pi ’s. Thesecond message character, “O”, narrows the range of R to 0.3763 ≤ R < 0.4033.The “U” character further narrows the range to 0.37630 ≤ R < 0.37819. Any valueof R in this range can be sent as encoding “IOU”.
In particular, the binary fraction.011000001 is in this range, so “IOU” can be sent in 9 bits. (Huffman coding took10 bits for this example, see §20.4.)Of course there is the problem of knowing when to stop decoding. Thefraction .011000001 represents not simply “IOU,” but “IOU. . . ,” where the ellipsesrepresent an infinite string of successor characters. To resolve this ambiguity,arithmetic coding generally assumes the existence of a special Nch + 1th character,EOM (end of message), which occurs only once at the end of the input. SinceEOM has a low probability of occurrence, it gets allocated only a very tiny pieceof the number line.In the above example, we gave R as a binary fraction. We could just as wellhave output it in any other radix, e.g., base 94 or base 36, whatever is convenientfor the anticipated storage or communication channel.You might wonder how one deals with the seemingly incredible precisionrequired of R for a long message.
The answer is that R is never actually representedall at once. At any give stage we have upper and lower bounds for R representedas a finite number of digits in the output radix. As digits of the upper and lowerbounds become identical, we can left-shift them away and bring in new digits at thelow-significance end. The routines below have a parameter NWK for the number ofworking digits to keep around. This must be large enough to make the chance ofan accidental degeneracy vanishingly small. (The routines signal if a degeneracyever occurs.) Since the process of discarding old digits and bringing in new ones isperformed identically on encoding and decoding, everything stays synchronized.91120.5 Arithmetic Coding0.40330.461.0A0.9A0.37819AA0.37800.450.4000.8E0.43E0.395E0.3776E0.37740.60.420.40.37720.3900.5I0.41I0.400.3850.3O0.20.39IO0.3770I0.3768OO0.37660.3800.380.1U0.00.3764U0.37UU0.37630.37630Figure 20.5.1.Arithmetic coding of the message “IOU...” in the fictitious language Vowellish.Successive characters give successively finer subdivisions of the initial interval between 0 and 1.
Thefinal value can be output as the digits of a fraction in any desired radix. Note how the subinterval allocatedto a character is proportional to its probability of occurrence.The routine arcmak constructs the cumulative frequency distribution table usedto partition the interval at each stage. In the principal routine arcode, when aninterval of size jdif is to be partitioned in the proportions of some n to some ntot,say, then we must compute (n*jdif)/ntot. With integer arithmetic, the numeratoris likely to overflow; and, unfortunately, an expression like jdif/(ntot/n) is notequivalent. In the implementation below, we resort to double precision floatingarithmetic for this calculation.
Not only is this inefficient, but different roundofferrors can (albeit very rarely) make different machines encode differently, though anyone type of machine will decode exactly what it encoded, since identical roundofferrors occur in the two processes. For serious use, one needs to replace this floatingcalculation with an integer computation in a double register (not available to theC programmer).The internally set variable minint, which is the minimum allowed numberof discrete steps between the upper and lower bounds, determines when new lowsignificance digits are added.
minint must be large enough to provide resolution ofall the input characters. That is, we must have pi × minint > 1 for all i. A valueof 100Nch, or 1.1/ min pi , whichever is larger, is generally adequate. However, forsafety, the routine below takes minint to be as large as possible, with the productminint*nradd just smaller than overflow. This results in some time inefficiency,and in a few unnecessary characters being output at the end of a message.
You canSample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5)Copyright (C) 1988-1992 by Cambridge University Press.Programs Copyright (C) 1988-1992 by Numerical Recipes Software.Permission is granted for internet users to make one paper copy for their own personal use.
Further reproduction, or any copying of machinereadable files (including this one) to any servercomputer, is strictly prohibited. To order Numerical Recipes books,diskettes, or CDROMsvisit website http://www.nr.com or call 1-800-872-7423 (North America only),or send email to trade@cup.cam.ac.uk (outside North America).0.70.37780.44912Chapter 20.Less-Numerical Algorithms#include "nrutil.h"#include <limits.h>ANSI header file containing integer ranges.#define MC 512#ifdef ULONG_MAXMaximum value of unsigned long.#define MAXINT (ULONG_MAX >> 1)#else#define MAXINT 2147483647#endifHere MC is the largest anticipated value of nchh; MAXINT is a large positive integer that doesnot overflow.typedef struct {unsigned long *ilob,*iupb,*ncumfq,jdif,nc,minint,nch,ncum,nrad;} arithcode;void arcmak(unsigned long nfreq[], unsigned long nchh, unsigned long nradd,arithcode *acode)Given a table nfreq[1..nchh] of the frequency of occurrence of nchh symbols, and givena desired output radix nradd, initialize the cumulative frequency table and other variables forarithmetic compression in the structure acode.{unsigned long j;if (nchh > MC) nrerror("input radix may not exceed MC in arcmak.");if (nradd > 256) nrerror("output radix may not exceed 256 in arcmak.");acode->minint=MAXINT/nradd;acode->nch=nchh;acode->nrad=nradd;acode->ncumfq[1]=0;for (j=2;j<=acode->nch+1;j++)acode->ncumfq[j]=acode->ncumfq[j-1]+IMAX(nfreq[j-1],1);acode->ncum=acode->ncumfq[acode->nch+2]=acode->ncumfq[acode->nch+1]+1;}The structure acode must be defined and allocated in your main program withstatements like this:#include "nrutil.h"#define MC 512Maximum anticipated value of nchh in arcmak.#define NWK 20Keep this value the same as in arcode, below.typedef struct {unsigned long *ilob,*iupb,*ncumfq,jdif,nc,minint,nch,ncum,nrad;} arithcode;...arithcode acode;...acode.ilob=(unsigned long *)lvector(1,NWK);Allocate space within acode.acode.iupb=(unsigned long *)lvector(1,NWK);acode.ncumfq=(unsigned long *)lvector(1,MC+2);Sample page from NUMERICAL RECIPES IN C: THE ART OF SCIENTIFIC COMPUTING (ISBN 0-521-43108-5)Copyright (C) 1988-1992 by Cambridge University Press.Programs Copyright (C) 1988-1992 by Numerical Recipes Software.Permission is granted for internet users to make one paper copy for their own personal use.
Further reproduction, or any copying of machinereadable files (including this one) to any servercomputer, is strictly prohibited. To order Numerical Recipes books,diskettes, or CDROMsvisit website http://www.nr.com or call 1-800-872-7423 (North America only),or send email to trade@cup.cam.ac.uk (outside North America).decrease minint if you want to live closer to the edge.A final safety feature in arcmak is its refusal to believe zero values in the tablenfreq; a 0 is treated as if it were a 1. If this were not done, the occurrence in amessage of a single character whose nfreq entry is zero would result in scramblingthe entire rest of the message.
If you want to live dangerously, with a very slightlymore efficient coding, you can delete the IMAX( ,1) operation.20.5 Arithmetic Coding913Individual characters in a message are coded or decoded by the routine arcode,which in turn uses the utility arcsum.typedef struct {unsigned long *ilob,*iupb,*ncumfq,jdif,nc,minint,nch,ncum,nrad;} arithcode;void arcode(unsigned long *ich, unsigned char **codep, unsigned long *lcode,unsigned long *lcd, int isign, arithcode *acode)Compress (isign = 1) or decompress (isign = −1) the single character ich into or out ofthe character array *codep[1..lcode], starting with byte *codep[lcd] and (if necessary)incrementing lcd so that, on return, lcd points to the first unused byte in *codep.
Характеристики
Тип файла PDF
PDF-формат наиболее широко используется для просмотра любого типа файлов на любом устройстве. В него можно сохранить документ, таблицы, презентацию, текст, чертежи, вычисления, графики и всё остальное, что можно показать на экране любого устройства. Именно его лучше всего использовать для печати.
Например, если Вам нужно распечатать чертёж из автокада, Вы сохраните чертёж на флешку, но будет ли автокад в пункте печати? А если будет, то нужная версия с нужными библиотеками? Именно для этого и нужен формат PDF - в нём точно будет показано верно вне зависимости от того, в какой программе создали PDF-файл и есть ли нужная программа для его просмотра.















