shannon1949 (776132), страница 7
Текст из файла (страница 7)
The key required for perfect secrecydepends then on the total number of possible messages.One would expect that, if the message space has fixed known statistics,so that it has a definite mean rate R of generating information, in the senseof MTC, then the amount of key needed could be reduced on the average injust this ratio RRM , and this is indeed true. In fact the message can be passedthrough a transducer which eliminates the redundancy and reduces the expected length in just this ratio, and then a Vernam system may be applied tothe result. Evidently the amount of key used per letter of message is statistically reduced by a factor RRM and in this case the key source and informationsource are just matched—a bit of key completely conceals a bit of message682information.
It is easily shown also, by the methods used in MTC, that this isthe best that can be done.Perfect secrecy systems have a place in the practical picture—they may beused either where the greatest importance is attached to complete secrecy—e.g., correspondence between the highest levels of command, or in caseswhere the number of possible messages is small. Thus, to take an extremeexample, if only two messages “yes” or “no” were anticipated, a perfect system would be in order, with perhaps the transformation table:KMyesnoA B0 11 0The disadvantage of perfect systems for large correspondence systemsis, of course, the equivalent amount of key that must be sent.
In succeedingsections we consider what can be achieved with smaller key size, in particularwith finite keys.11E QUIVOCATIONLet us suppose that a simple substitution cipher has been used on English textand that we intercept a certain amount, N letters, of the enciphered text. ForN fairly large, more than say 50 letters, there is nearly always a unique solution to the cipher; i.e., a single good English sequence which transforms intothe intercepted material by a simple substitution.
With a smaller N , however,the chance of more than one solution is greater: with N = 15 there will generally be quite a number of possible fragments of text that would fit, whilewith N = 8 a good fraction (of the order of 18 ) of all reasonable Englishsequences of that length are possible, since there is seldom more than onerepeated letter in the 8. With N = 1 any letter is clearly possible and hasthe same a posteriori probability as its a priori probability.
For one letter thesystem is perfect.This happens generally with solvable ciphers. Before any material is intercepted we can imagine the a priori probabilities attached to the variouspossible messages, and also to the various keys. As material is intercepted,the cryptanalyst calculates the a posteriori probabilities; and as N increasesthe probabilities of certain messages increase, and, of most, decrease, untilfinally only one is left, which has a probability nearly one, while the totalprobability of all others is nearly zero.This calculation can actually be carried out for very simple systems.
Table 1 shows the a posteriori probabilities for a Caesar type cipher appliedto English text, with the key chosen at random from the 26 possibilities. Toenable the use of standard letter, digram and trigram frequency tables, the683text has been started at a random point (by opening a book and putting a pencil down at random on the page). The message selected in this way begins“creases to ...” starting inside the word increases. If the message were knownto start a sentence a different set of probabilities must be used correspondingto the frequencies of letters, digrams, etc., at the beginning of sentences.Table 1.
A Posteriori Probabilities for a Caesar Type CryptogramDeciphermentsC R E A SD S F B TE T G C UF U H D VG V I E WH W J F XI X K G YJ Y L H ZK Z M I AL A N J BM B O K CN C P L DO D Q M EP E R N FQ F S O GR G T P HS H U Q IT I V R JU J W S KV K X T LW L Y U MX M Z V NY N A W OZ O B X PA P C Y QB Q D Z RH(decimal digits)N =1.028.038.131.029.020.053.063.001.004.034.025.071.080.020.001.068.061.105.025.009.015.002.020.001.082.0141.2425N =2.0377.0314.0881.0189N =3.1111N =4.3673N =51.0063.0126.1321.2500.0222.1195.0377.0818.4389.0126.0881.2830.0056.1667.6327.0056.0503.9686.6034.2850The Caesar with random key is a pure cipher and the particular key chosendoes not affect the a posteriori probabilities.
To determine these we needmerely list the possible decipherments by all keys and calculate their a prioriprobabilities. The a posteriori probabilities are these divided by their sum.These possible decipherments are found by the standard process of “runningdown the alphabet” from the message and are listed at the left. These formthe residue class for the message.
For one intercepted letter the a posterioriprobabilities are equal to the a priori probabilities for letters10 and are shownin the column headed N = 1. For two intercepted letters the probabilities arethose for digrams adjusted to sum to unity and these are shown in the columnN = 2.10The probabilities for this table were taken from frequency tables given by Fletcher Pratt in a book“Secret and Urgent” published by Blue Ribbon Books, New York, 1939. Although not complete,they are sufficient for present purposes.684Trigram frequencies have also been tabulated and these are shown in thecolumn N = 3.
For four- and five-letter sequences probabilities were obtained by multiplication from trigram frequencies since, roughly,p(ijkl) = p(ijk)pjk (l).Note that at three letters the field has narrowed down to four messages offairly high probability, the others being small in comparison. At four thereare two possibilities and at five just one, the correct decipherment.In principle this could be carried out with any system but, unless the keyis very small, the number of possibilities is so large that the work involvedprohibits the actual calculation.This set of a posteriori probabilities describes how the cryptanalyst’sknowledge of the message and key gradually becomes more precise as enciphered material is obtained.
This description, however, is much too involvedand difficult to obtain for our purposes. What is desired is a simplified description of this approach to uniqueness of the possible solutions.A similar situation arises in communication theory when a transmittedsignal is perturbed by noise.
It is necessary to set up a suitable measure ofthe uncertainty of what was actually transmitted knowing only the perturbedversion given by the received signal. In MTC it was shown that a naturalmathematical measure of this uncertainty is the conditional entropy of thetransmitted signal when the received signal is known. This conditional entropy was called, for convenience, the equivocation.From the point of view of the cryptanalyst, a secrecy system is almostidentical with a noisy communication system. The message (transmitted signal) is operated on by a statistical element, the enciphering system, with itsstatistically chosen key.
The result of this operation is the cryptogram (analogous to the perturbed signal) which is available for analysis. The chief differences in the two cases are: first, that the operation of the enciphering transformation is generally of a more complex nature than the perturbing noise ina channel; and, second, the key for a secrecy system is usually chosen from afinite set of possibilities while the noise in a channel is more often continuallyintroduced, in effect chosen from an infinite set.With these considerations in mind it is natural to use the equivocationas a theoretical secrecy index. It may be noted that there are two significantequivocations, that of the key and that of the message.
These will be denotedby HE (K) and HE (M ) respectively. They are given by:HE (K) =XP (E, K) log PE (K)XP (E, M ) log PE (K)E,KHE (M ) =E,M685in which E, M and K are the cryptogram, message and key andP (E, K) is the probability of key K and cryptogram EPE (K) is the a posteriori probability of key K if cryptogram Eis intercepted.P (E, M ) and PE (M ) are the similar probabilities for message instead of key.The summation in HE (K) is over all possible cryptograms of a certain length(say N letters) and over all keys.
For HE (M ) the summation is over all messages and cryptograms of length N . Thus HE (K) and HE (M ) are both functions of N , the number of intercepted letters. This will sometimes be indicated explicitly by writing HE (K, N ) and HE (M, N ). Note that these are“total” equivocations; i.e., we do not divide by N to obtain the equivocationrate which was used in MTC.The same general arguments used to justify the equivocation as a measure of uncertainty in communication theory apply here as well. We note thatzero equivocation requires that one message (or key) have unit probability, allother zero, corresponding to complete knowledge.
Considered as a functionof N , the gradual decrease of equivocation corresponds to increasing knowledge of the original key or message. The two equivocation curves plotted asfunctions of N , will be called the equivocation characteristics of the secrecysystem in question.The values of HE (K, N ) and HE (M, N ) for the Caesar type cryptogramconsidered above have been calculated and are given in the last row of Table 1. HE (K, N ) and HE (M, N ) are equal in this case and are given in decimal digits (i.e., the logarithmic base 10 is used in the calculation). It shouldbe noted that the equivocation here is for a particular cryptogram, the summation being only over M (or K), not over E. In general the summationwould be over all possible intercepted cryptograms of length N and wouldgive the average uncertainty.