shannon1949 (776132), страница 9
Текст из файла (страница 9)
e−λ λmp q=m!mwhere λ =Sk.THence∞Xλm. 1HE (K) = e−λm log m.λ2 m!If we replace m by m + 1, we obtain:∞mXλ.log (m + 1).HE (K) = e−λ1 m!This may be used in the region where λ is near unity. For λ1, the onlyimportant term in the series is that for m = 1; omitting the others we have:.HE (K) = e−λ λ log 2.= λ log 2.= 2−N D k log 2.To summarize: HE (K), considered as a function of N , the number ofintercepted letters, starts off at H(K) when N = 0.
It decreases linearly witha slope −D out to the neighborhood of N = H(K). After a short transitionDregion, HE (K) follows an exponential with “half life” distance D1 if D is692measured in bits per letter. This behavior is shown in Fig. 7, together with theapproximating curves.By a similar argument the equivocation of message can be calculated. ItisHE (M ) = R0 N for R0 N HE (K)HE (M ) = HE (K) for R0 N HE (K)HE (M ) = HE (K) − ϕ(N ) for R0 N ∼HE (K)where ϕ(N ) is the function shown in Fig. 7 with N scale reduced by factorof RD0 . Thus, HE (M ) rises linearly with slope R0 , until it nearly intersectsFig.
7. Equivocation for random cipherthe HE (K) line. After a rounded transition it follows the HE (K) curve down.It will be seen from Fig. 7 that the equivocation curves approach zerorather sharply. Thus we may, with but little ambiguity, speak of a point atwhich the solution becomes unique. This number of letters will be called theunicity distance. For the random cipher it is approximately H(K).D15A PPLICATION TO S TANDARD C IPHERSMost of the standard ciphers involve rather complicated enciphering anddeciphering operations. Furthermore, the statistical structure of natural languages is extremely involved.
It is therefore reasonable to assume that theformulas derived for the random cipher may be applied in such cases. It isnecessary, however, to apply certain corrections in some cases. The mainpoints to be observed are the following:6931. We assumed for the random cipher that the possible decipherments of acryptogram are a random selection from the possible messages. While notstrictly true in ordinary systems, this becomes more nearly the case as thecomplexity of the enciphering operations and of the language structureincreases. With a transposition cipher it is clear that letter frequencies arepreserved under decipherment operations. This means that the possibledecipherments are chosen from a more limited group, not the entire message space, and the formula should be changed. In place of R 0 one usesR1 the entropy rate for a language with independent letters but with theregular letter frequencies.
In some other cases a definite tendency towardreturning the decipherments to high probability messages can be seen. Ifthere is no clear tendency of this sort, and the system is fairly complicated,then it is reasonable to use the random cipher analysis.2. In many cases the complete key is not used in enciphering short messages. For example, in a simple substitution, only fairly long messageswill contain all letters of the alphabet and thus involve the complete key.Obviously the random assumption does not hold for small N in such acase, since all the keys which differ only in the letters not yet appearingin the cryptogram lead back to the same message and are not randomlydistributed.
This error is easily corrected to a good approximation by theuse of a “key appearance characteristic”. One uses, at a particular N , theeffective amount of key that may be expected with that length of cryptogram. For most ciphers, this is easily estimated.3.
There are certain “end effects” due to the definite starting of the messagewhich produce a discrepancy from the random characteristics. If we takea random starting point in English text, the first letter (when we do notobserve the preceding letters) has a possibility of being any letter with theordinary letter probabilities. The next letter is more completely specifiedsince we then have digram frequencies. This decrease in choice valuecontinues for some time. The effect of this on the curve is that the straightline part is displaced and approached by a curve depending on how muchthe statistical structure of the language is spread out over adjacent letters.As a first approximation the curve can be corrected by shifting the lineover to the half redundancy point—i.e., the number of letters where thelanguage redundancy is half its final value.If account is taken of these three effects, reasonable estimates of theequivocation characteristic and unicity point can be made.
The calculationcan be done graphically as indicated in Fig. 8. One draws the key appearance characteristic and the total redundancy curve DN (which is usually sufficiently well represented by the line N D∞ ). The difference between theseout to the neighborhood of their intersection is HE (M ). With a simple sub694stitution cipher applied to English, this calculation gave the curves shown inFig. 9. The key appearance characteristic in this case was estimated by counting the number of different letters appearing in typical English passages of Nletters. In so far as experimental data on simple substitution could be found,they agree very well with the curves of Fig.
9, considering the various idealizations and approximations which have been made. For example, the unicitypoint, at about 27 letters, can be shown experimentally to lie between theFig. 8. Graphical calculation of equivocationlimits 20 and 30. With 30 letters there is nearly always a unique solution toa cryptogram of this type and with 20 it is usually easy to find a number ofsolutions.With transposition of period d (random key), H(K) = log d!, or aboutd log de (using a Stirling approximation for d!). If we take .6 decimal digits perletter as the appropriate redundancy, remembering the preservation of letterfrequencies, we obtain about 1.7d log de as the unicity distance.
This alsochecks fairly well experimentally. Note that in this case HE (M ) is definedonly for integral multiples of d.With the Vigenère the unicity point will occur at about 2d letters, and thistoo is about right. The Vigenère characteristic with the same key size as695Fig. 9. Equivocation for simple substitution on English696Fig. 10. Equivocation for Vigenère on English697simple substitution will be approximately as show in Fig. 10. The Vigenère,Playfair and Fractional cases are more likely to follow the theoretical formulas for random ciphers than simple substitution and transposition.
The reasonfor this is that they are more complex and give better mixing characteristicsto the messages on which they operate.The mixed alphabet Vigenère (each of d alphabets mixed independentlyand used sequentially) has a key size,H(K) = d log 26! = 26.3dand its unicity point should be at about 53d letters.These conclusions can also be put to a rough experimental test with theCaesar type cipher. In the particular cryptogram analyzed in Table 1, section11, the function HE (K, N ) has been calculated and is given below, togetherwith the values for a random cipher.N012345H(observed)1.411.24.97.60.280H(calculated)1.411.25.98.54.15.03The agreement is seen to be quite good, especially when we rememberthat the observed H should actually be the average of many different cryptograms, and that H for the larger values of N is only roughly estimated.It appears then that the random cipher analysis can be used to estimateequivocation characteristics and the unicity distance for the ordinary types ofciphers.16VALIDITY OF A C RYPTOGRAM S OLUTIONThe equivocation formulas are relevant to questions which sometimes arisein cryptographic work regarding the validity of an alleged solution to a cryptogram.
In the history of cryptography there have been many cryptograms,or possible cryptograms, where clever analysts have found a “solution”. Itinvolved, however, such a complex process, or the material was so meagerthat the question arose as to whether the cryptanalyst had “read a solution”into the cryptogram. See, for example, the Bacon-Shakespeare ciphers andthe “Roger Bacon” manuscript.11In general we may say that if a proposed system and key solves a cryptogram for a length of material considerably greater than the unicity distancethe solution is trustworthy. If the material is of the same order of shorter thanthe unicity distance the solution is highly suspicious.This effect of redundancy in gradually producing a unique solution to acipher can be thought of in another way which is helpful. The redundancy isessentially a series of conditions on the letters of the message, which insure11See Fletcher Pratt, loc.
cit.698that it be statistically reasonable. These consistency conditions produce corresponding consistency conditions in the cryptogram. The key gives a certainamount of freedom to the cryptogram but, as more and more letters are intercepted, the consistency conditions use up the freedom allowed by the key.Eventually there is only one message and key which satisfies all the conditions and we have a unique solution.
In the random cipher the consistencyconditions are, in a sense “orthogonal” to the “grain of the key” and havetheir full effect in eliminating messages and keys as rapidly as possible. Thisis the usual case. However, by proper design it is possible to “line up” theredundancy of the language with the “grain of the key” in such a way thatthe consistency conditions are automatically satisfied and HE (K) does notapproach zero. These “ideal” systems, which will be considered in the nextsection, are of such a nature that the transformations Ti all induce the sameprobabilities in the E space.17I DEAL S ECRECY S YSTEMSWe have seen that perfect secrecy requires an infinite amount of key if weallow messages of unlimited length.