shannon1949 (776132), страница 2
Текст из файла (страница 2)
How secure is a system against cryptanalysis when the enemy hasunlimited time and manpower available for the analysis of intercepted cryptograms? The problem is closely related to questions of communication inthe presence of noise, and the concepts of entropy and equivocation developed for the communication problem find a direct application in this part ofcryptography.“Perfect Secrecy” is defined by requiring of a system that after a cryptogram is intercepted by the enemy the a posteriori probabilities of this cryptogram representing various messages be identically the same as the a priori probabilities of the same messages before the interception.
It is shownthat perfect secrecy is possible but requires, if the number of messages is finite, the same number of possible keys. If the message is thought of as beingconstantly generated at a given “rate” R (to be defined later), key must begenerated at the same or a greater rate.If a secrecy system with a finite key is used, and N letters of cryptogramintercepted, there will be, for the enemy, a certain set of messages with certain probabilities that this cryptogram could represent. As N increases thefield usually narrows down until eventually there is a unique “solution” tothe cryptogram; one message with probability essentially unity while all others are practically zero.
A quantity H(N ) is defined, called the equivocation,which measures in a statistical way how near the average cryptogram of Nletters is to a unique solution; that is, how uncertain the enemy is of the original message after intercepting a cryptogram of N letters. Various propertiesof the equivocation are deduced—for example, the equivocation of the keynever increases with increasing N . This equivocation is a theoretical secrecy659index—theoretical in that it allows the enemy unlimited time to analyse thecryptogram.The function H(N ) for a certain idealized type of cipher called the random cipher is determined. With certain modifications this function can beapplied to many cases of practical interest.
This gives a way of calculatingapproximately how much intercepted material is required to obtain a solutionto a secrecy system. It appears from this analysis that with ordinary languagesand the usual types of ciphers (not codes) this “unicity distance” is approximately H(K). Here H(K) is a number measuring the “size” of the key space.DIf all keys are a priori equally likely H(K) is the logarithm of the number ofpossible keys. D is the redundancy of the language and measures the amountof “statistical constraint” imposed by the language.
In simple substitutionwith random key H(K) is log 10 26! or about 20 and D (in decimal digits perletter) is about .7 for English. Thus unicity occurs at about 30 letters.It is possible to construct secrecy systems with a finite key for certain“languages” in which the equivocation does not approach zero as N →∞. Inthis case, no matter how much material is intercepted, the enemy still doesnot obtain a unique solution to the cipher but is left with many alternatives, allof reasonable probability. Such systems we call ideal systems.
It is possiblein any language to approximate such behavior—i.e., to make the approachto zero of H(N ) recede out to arbitrarily large N . However, such systemshave a number of drawbacks, such as complexity and sensitivity to errors intransmission of the cryptogram.The third part of the paper is concerned with “practical secrecy”.
Twosystems with the same key size may both be uniquely solvable when N lettershave been intercepted, but differ greatly in the amount of labor required toeffect this solution. An analysis of the basic weaknesses of secrecy systemsis made. This leads to methods for constructing systems which will require alarge amount of work to solve. Finally, a certain incompatibility among thevarious desirable qualities of secrecy systems is discussed.PART IMATHEMATICAL STRUCTURE OF SECRECY SYSTEMS2S ECRECY S YSTEMSAs a first step in the mathematical analysis of cryptography, it is necessary toidealize the situation suitably, and to define in a mathematically acceptableway what we shall mean by a secrecy system.
A “schematic” diagram of ageneral secrecy system is shown in Fig. 1. At the transmitting end there are660two information sources—a message source and a key source. The key sourceproduces a particular key from among those which are possible in the system.This key is transmitted by some means, supposedly not interceptible, for example by messenger, to the receiving end. The message source produces amessage (the “clear”) which is enciphered and the resulting cryptogram sentto the receiving end by a possibly interceptible means, for example radio. Atthe receiving end the cryptogram and key are combined in the decipherer torecover the message.ENEMYCRYPTANALYSTEMESSAGESOURCEMESSAGE ENCIPHERER CRYPTOGRAMTKMEKEYKEDECIPHERER MESSAGET−1MKKEY KKEYSOURCEFig.
1. Schematic of a general secrecy systemEvidently the encipherer performs a functional operation. If M is the message, K the key, and E the enciphered message, or cryptogram, we haveE = f (M, K)that is E is function of M and K. It is preferable to think of this, however, notas a function of two variables but as a (one parameter) family of operationsor transformations, and to write itE = Ti M.The transformation Ti applied to message M produces cryptogram E.
Theindex i corresponds to the particular key being used.We will assume, in general, that there are only a finite number of possiblekeys, and that each has an associated probability pi . Thus the key sourceis represented by a statistical process or device which chooses one fromthe set of transformations T1 , T2 , · · ·, Tm with the respective probabilitiesp1 , p2 , · · ·, pm . Similarly we will generally assume a finite number of possiblemessages M1 , M2 , · · ·, Mn with associate a priori probabilities q1 , q2 , · · ·, qn .The possible messages, for example, might be the possible sequences of English letters all of length N , and the associated probabilities are then therelative frequencies of occurrence of these sequences in normal English text.661At the receiving end it must be possible to recover M , knowing E andK. Thus the transformations Ti in the family must have unique inverses Ti−1such that Ti Ti−1 = I, the identity transformation.
Thus:M = Ti−1 E.At any rate this inverse must exist uniquely for every E which can be obtained from an M with key i. Hence we arrive at the definition: A secrecysystem is a family of uniquely reversible transformations Ti of a set of possible messages into a set of cryptograms, the transformation Ti having anassociated probability pi . Conversely any set of entities of this type will becalled a “secrecy system”. The set of possible messages will be called, forconvenience, the “message space” and the set of possible cryptograms the“cryptogram space”.Two secrecy systems will be the same if they consist of the same set oftransformations Ti , with the same messages and cryptogram space (range anddomain) and the same probabilities for the keys.A secrecy system can be visualized mechanically as a machine with oneor more controls on it.
A sequence of letters, the message, is fed into the input of the machine and a second series emerges at the output. The particularsetting of the controls corresponds to the particular key being used. Some statistical method must be prescribed for choosing the key from all the possibleones.To make the problem mathematically tractable we shall assume that theenemy knows the system being used. That is, he knows the family of transformations Ti , and the probabilities of choosing various keys.
It might be objected that this assumption is unrealistic, in that the cryptanalyst often doesnot know what system was used or the probabilities in question. There aretwo answers to this objection:1. The restriction is much weaker than appears at first, due to our broaddefinition of what constitutes a secrecy system. Suppose a cryptographerintercepts a message and does not know whether a substitution transposition, or Vigenère type cipher was used. He can consider the message asbeing enciphered by a system in which part of the key is the specificationof which of these types was used, the next part being the particular key forthat type. These three different possibilities are assigned probabilities according to his best estimates of the a priori probabilities of the enciphererusing the respective types of cipher.2.