shannon1949 (776132), страница 8
Текст из файла (страница 8)
The computational difficulties are prohibitivefor this general calculation.12P ROPERTIES OF E QUIVOCATIONEquivocation may be shown to have a number of interesting properties, mostof which fit into our intuitive picture of how such a quantity should behave.We will first show that the equivocation of key or of a fixed part of a messagedecreases when more enciphered material is intercepted.Theorem 7.
The equivocation of key HE (K, N ) is a non-increasing functionof N . The equivocation of the first A letters of the message is a non-increasingfunction of the number N which have been intercepted. If N letters have beenintercepted, the equivocation of the first N letters of message is less than orequal to that of the key. These may be written:686HE (K, S) ≤ HE (K, N )HE (M, S) ≤ HE (M, N )HE (M, N ) ≤ HE (K, N )S≥NS≥N (H for first A letters of text)The qualification regarding A letters in the second result of the theoremis so that the equivocation will be calculated with respect to the amount ofmessage that has been intercepted.
If it is, the message equivocation may (andusually does) increase for a time, due merely to the fact that more letters standfor a larger possible range of messages. The results of the theorem are whatwe might hope from a good secrecy index, since we would hardly expect tobe worse off on the average after intercepting additional material than before.The fact that they can be proved gives further justification to our use of theequivocation measure.The results of this theorem are a consequence of certain properties of conditional entropy proved in MTC. Thus, to show the first or second statementsof Theorem 7, we have for any chance events A and BH(B)≥HA (B).If we identify B with the key (knowing the first S letters of cryptogram)and A with the remaining N − S letters we obtain the first result.
Similarlyidentifying B with the message gives the second result. The last result followsfromHE (M )≤HE (K, M ) = HE (K) + HE,K (M )and the fact that HE,K (M ) = 0 since K and E uniquely determine M .Since the message and key are chosen independently we have:H(M, K) = H(M ) + H(K).Furthermore,H(M, K) = H(E, K) = H(E) + HE (K),the first equality resulting from the fact that knowledge of M and K or ofE and K is equivalent to knowledge of all three. Combining these two weobtain a formula for the equivocation of key:HE (K) = H(M ) + H(K) − H(E).In particular, if H(M ) = H(E) then the equivocation of key, HE (K), isequal to the a priori uncertainty of key, H(K). This occurs in the perfectsystems described above.A formula for the equivocation of message can be found by similar means.We haveH(M, E) = H(E) + HE (M ) = H(M ) + HM (E)HE (M ) = H(M ) + HM (E) − H(E).687If we have a product system S = T R, it is to be expected that the secondenciphering process will be decrease the equivocation of message.
That thisis actually true can be shown as follows: Let M, E1 , E2 be the message andthe first and second encipherments, respectively. ThenPE1 E2 (M ) = PE1 (M ).ConsequentlyHE1 E2 (M ) = HE1 (M ).Since, for any chance variables, x, y, z, Hxy (z)≤Hy (z), we have the desiredresult, HE2 (M )≥HE1 (M ).Theorem 8. The equivocation in message of a product system S = T R isnot less than that when only R is used.Suppose now we have a system T which can be written as a weightedsum of several systems R, S, · · · , UT = p1 R + p 2 S + · · · + p m UXpi = 1and that systems R, S, · · · , U have equivocations H1 , H2 , H3 , · · · , Hm .Theorem 9.
The equivocation H of a weighted sum of systems is boundedby the inequalitiesXp i Hi ≤ H ≤Xp i Hi −Xpi log pi .These are best limits possible. The H’s may be equivocations either of key ormessage.The upper limit is achieved, for example, in strongly ideal systems (to bedescribed later) where the decomposition is into the simple transformationsof the system. The lower limit is achieved if all the systems R, S, · · · , U go tocompletely different cryptogram spaces. This theorem is also proved by thegeneral inequalities governing equivocation,HA (B) ≤ H(B) ≤ H(A) + HA (B).We identify A with the particular system being used and B with the key ormessage.There is a similar theorem for weighted sums of languages.
For this weidentify A with the particular language.Theorem 10. Suppose a system can be applied to languages L 1 , L2 , · · · , Lmand has equivocation characteristics H1 , H2 , · · · , Hm . When applied to thePweighted sum pi Li , the equivocation H is bounded byXp i Hi ≤ H ≤Xp i Hi −688Xpi log pi .These limits are the best possible and the equivocations in question can beeither for key or message.The total redundancy DN for N letters of message is defined byDN = log G − H(M )where G is the total number of messages of length N and H(M ) is the uncertainty in choosing one of these.
In a secrecy system where the total number ofpossible cryptograms is equal to the number of possible messages of lengthN, H(E)≤ log G. Consequently,HE (K) = H(K) + H(M ) − H(E)≥ H(K) − [ log G − H(M )].HenceH(K) − HE (K) ≤ DN .This shows that, in a closed system, for example, the decrease in equivocationof key after N letters have been intercepted is not greater than the redundancyof N letters of the language. In such systems, which comprise the majority ofciphers, it is only the existence of redundancy in the original messages thatmakes a solution possible.Now suppose we have a pure system. Let the different residue classes ofmessages be C1 , C2 , C3 , · · · , Cr , and the corresponding set of residue classesof cryptograms be C10 , C20 , C30 , · · · , Cr0 . The probability of each E in C10 is thesame:P (E) =P (Ci )ϕiE a member of Ciwhere ϕi is the number of different messages in Ci .
Thus we haveP (Ci )P (Ci )logϕiϕiiXP (Ci )P (Ci ) log=−ϕiiH(E) = −XϕiSubstituting in our equation for HE (K) we obtain:Theorem 11. For a pure cipherHE (K) = H(K) + H(M ) +XiP (Ci ) logP (Ci ).ϕiThis result can be used to compute HE (K) in certain cases of interest.68913E QUIVOCATION FOR S IMPLE S UBSTITUTION ON AT WO L ETTER L ANGUAGEWe will now calculate the equivocation in key or message when simple substitution is applied to a two letter language, with probabilities p and q for 0and 1, and successive letters chosen independently. We haveHE (M ) = HE (K) = −XP (E)PE (K) log PE (K)The probability that E contains exactly s 0’s in a particular permutation is:1 s N −s(p q+ q s pN −s )2Fig. 6.
Equivocation for simple substitution on two-letter languageand the a posteriori probabilities of the identity and inverting substitutions(the only two in the system) are respectively:PE (0) =There are NspN −s q sps q N −sP(1)=E(ps q N −s + q s pN −s )(ps q N −s + q s pN −s )terms for each s and henceHE (K, N ) = −Xs!N s N −sps q N −spqlog s N −s.s(p q+ q s pN −s )690For p = 31 , q = 23 , and for p = 18 , q = 78 , HE (K, N ) has been calculated andis shown in Fig. 6.14T HE E QUIVOCATION C HARACTERISTIC FOR A“R ANDOM ” C IPHERIn the preceding section we have calculated the equivocation characteristicfor a simple substitution applied to a two-letter language. This is about thesimplest type of cipher and the simplest language structure possible, yet already the formulas are so involved as to be nearly useless.
What are we todo with cases of practical interest, say the involved transformations of a fractional transposition system applied to English with its extremely complexstatistical structure? This complexity itself suggests a method of approach.Sufficiently complicated problems can frequently be solved statistically.
Tofacilitate this we define the notion of a “random” cipher.We make the following assumptions:1. The number of possible messages of length N is T = 2R0 N , thus R0 =log 2 G, where G is the number of letters in the alphabet. The number ofpossible cryptograms of length N is also assumed to be T .2. The possible messages of length N can be divided into two groups: onegroup of high and fairly uniform a priori probability, the second group ofnegligibly small total probability. The high probability group will contain), that is, R is the entropy of theS = 2RN messages, where R = H(MNmessage source per letter.3.
The deciphering operation can be thought of as a series of lines, as inFigs. 2 and 4, leading back from each E to various M ’s. We assume kdifferent equiprobable keys so there will be k lines leading back fromeach E. For the random cipher we suppose that the lines from each Ego back to a random selection of the possible messages.
Actually, then, arandom cipher is a whole ensemble of ciphers and the equivocation is theaverage equivocation for this ensemble.The equivocation of key is defined byHE (K) =XP (E)PE (K) log PE (K).The probability that exactly m lines go back from a particular E to the highprobability group of messages iskm!STm 1−STk−mIf a cryptogram with m such lines is intercepted the equivocation is log m.The probability of such a cryptogram is mT, since it can be produced by mSK691keys from high probability messages each with probabilityequivocation is:kT XkHE (K) =Sk m=1 m!STm S1−Tk−mT.SHence them log mWe wish to find a simple approximation to this when k is large.
If theexpected value of m, namely m = SK, i1, the variation of log m overTthe range where the binomial distribution assumes large values will be small,and we can replace log m by log m. This can now be factored out of thesummation, which then reduces to m. Hence, in this condition,Sk.= log S − log T + log kHE (K) = logT.HE (K) = H(K) − DN,where D is the redundancy per letter of the original language (D = DNN ).If m is small compared to the large k, the binomial distribution can beapproximated by a Poisson distribution:!k m k−m .