Aliens and xor

# Bitstream exercises in class

Background assumptions

Our messages in bitstreams will be a succession of words. Each "word" in this conversation is one of three types (the given percentage frequencies are approximate):

• 50% of the words are 1010101010. This is 10 bits long, alternating 1's and 0's. Call it A.
• 25% of the words are 1111111. This is 7 bits long, all 1's. Call it B.
• 25% of the words are 000000. This is 6 bits long, all 0's. Call it C.

We'll call bitstreams composed of these words "sentences". The sentence CBAC is:

```00000011111111010101010000000
```
Bitstreams will be divided into groups of five bits each to make it easier to refer to parts of them. The sentence above becomes:
```00000 01111 11110 10101 01000 0000
```
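The encoding and five-bit grouping described above can be sketched in Python. The names `WORDS`, `encode`, and `group` are illustrative choices and are not part of the original exercise.

```python
# Word table from the background assumptions (names are illustrative).
WORDS = {"A": "1010101010", "B": "1111111", "C": "000000"}

def encode(sentence: str) -> str:
    """Concatenate the bit pattern of each word in the sentence."""
    return "".join(WORDS[w] for w in sentence)

def group(bits: str, size: int = 5) -> str:
    """Insert a space after every `size` bits for readability."""
    return " ".join(bits[i:i + size] for i in range(0, len(bits), size))

print(group(encode("CBAC")))  # 00000 01111 11110 10101 01000 0000
```

The last group is shorter than five bits whenever the sentence length is not a multiple of five, as it is here (29 bits).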

First problem

Write this sentence in terms of A, B, and C.

```10101 01010 11111 11101 01010 10000 00010 10101 01011 11111 10101 01010
```

Second problem

Here is a sentence which has been xored with a pseudorandom bitstream with approximately 10% 1's. Write the original sentence in terms of A, B, and C.

```11011 11101 01010 10101 00010 10001 01010 00010 00010 10101 01011 11011
```

Third problem

A pseudorandom bitstream with approximately 50% 1's and 50% 0's has been created. Two different sentences have been xored with it. Write the original sentences in terms of A, B, and C.

```11110 00111 00101 10010 10010 10011 11101 11110 00100 10111 01110 11011 100
```
```11110 00111 10000 10010 11000 00110 10111 10100 10101 11101 00100 01001 00
```

# Alien messages

Please hand in Problem 1 and either Problem 2 or Problem 3 on Friday, November 5, 1999.

Background assumptions

We imagine an alien language which has only three words:

• 40% of the words are 1111111. Abbreviate this word with A. It is seven bits long.
• 30% of the words are 00000. Abbreviate this word with B. It is five bits long.
• 30% of the words are 10101010. Abbreviate this word with C. It is eight bits long.

The aliens fit their words together to make binary strings whose length is about 100 bits. These messages are disguised by xoring them with pseudorandom bit strings.
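As a rough sketch of how such a disguised message could be produced, the Python below draws words by their stated frequencies until the stream reaches about 100 bits, then xors it with a biased pseudorandom mask. The function names, the seed, and the use of Python's `random` module are assumptions made for illustration; the handout does not say how the aliens generate their bit strings.

```python
import random

# Alien word table and approximate frequencies from the assumptions above.
WORDS = {"A": "1111111", "B": "00000", "C": "10101010"}
WEIGHTS = {"A": 0.4, "B": 0.3, "C": 0.3}

def random_sentence(rng: random.Random, min_bits: int = 100) -> str:
    """Draw words by frequency until the bitstream reaches min_bits."""
    sentence, length = [], 0
    while length < min_bits:
        w = rng.choices(list(WEIGHTS), weights=list(WEIGHTS.values()))[0]
        sentence.append(w)
        length += len(WORDS[w])
    return "".join(sentence)

def encode(sentence: str) -> str:
    """Concatenate the bit pattern of each word in the sentence."""
    return "".join(WORDS[w] for w in sentence)

def disguise(bits: str, p_one: float, rng: random.Random) -> tuple[str, str]:
    """Xor bits with a mask whose bits are 1 with probability p_one."""
    mask = "".join("1" if rng.random() < p_one else "0" for _ in bits)
    sent = "".join(str(int(x) ^ int(m)) for x, m in zip(bits, mask))
    return sent, mask

rng = random.Random(1999)  # arbitrary seed, chosen only for repeatability
plain = encode(random_sentence(rng))
sent, mask = disguise(plain, 0.05, rng)
print(plain)
print(mask)
print(sent)
```

With `p_one` near 0.05 or 0.10 most of the word patterns survive, which is the situation in Problem 1; with `p_one = 0.5` the disguised stream carries no visible pattern on its own.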

Problem 1

Decipher the following messages. That is, describe the sequence of alien words which produced them. For example, if you believe the message is 00000 1111111 10101010 10101010 you would report that the aliens were signalling BACC.

a) The aliens have xored a message with a pseudorandom bit string with about 5% 1's.

```10111 11111 11111 01010 10111 11110 01001 01010 10000 00101 01010 11111 11101 01110 00000 10111 11101 01010 11101 11000 00
```

b) The aliens have xored a message with a pseudorandom bit string with about 10% 1's.

```10101 00001 10011 01111 00000 11111 11000 10110 10111 01010 10111 11111 11011 10000 11000 11010 10100 00001 11111 11111 010
```

Problem 2

The aliens created a pseudorandom bitstream with 50% 1's and 50% 0's, but they made a major error. They used the same stream to conceal three different messages. The concealed messages are displayed below. Decipher as much as you can of all three messages.

First message

```00100 11011 10110 10110 10101 11110 11101 11111 11110 00000 10010 01110 10110 00011 01111 01011 00000 10110 11000 10010 00101 1
```

Second message

```00100 11001 00011 01001 11010 01110 11100 10101 01110 00011 01111 01110 11100 10110 00000 10100 11111 00110 11011 01111 01111 0
```

Third message

```01110 01110 11100 00110 11010 01110 00011 11111 11011 01010 00111 01110 00011 00110 00101 11110 01010 01100 01110 00101 110
```

Hint

Xor of first and second messages

```00000 00010 10101 11111 01111 10000 00001 01010 10000 00011 11101 00000 01010 10101 01111 11111 11111 10000 00011 11101 01010 1
```

Xor of first and third messages

```01010 10101 01010 10000 01111 10000 11110 00000 00101 01010 10101 00000 10101 00101 01010 10101 01010 11010 10110 10111 111
```

Xor of second and third messages

```01010 10111 11111 01111 00000 00000 11111 01010 10101 01001 01000 00000 11111 10000 00101 01010 10101 01010 10101 01010 101
```

Comment

These xor bits record whether the pairs of messages agree or disagree.
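The hint streams can be reproduced from the concealed messages themselves. The following Python sketch recomputes the xor of the first and second messages; the helper name `xor_streams` is an illustrative choice, and truncating to the shorter stream is an assumption about how streams of unequal length were handled.

```python
def xor_streams(x: str, y: str) -> str:
    """Bitwise xor of two grouped bit strings, truncated to the shorter one."""
    xs, ys = x.replace(" ", ""), y.replace(" ", "")
    out = "".join("1" if a != b else "0" for a, b in zip(xs, ys))
    return " ".join(out[i:i + 5] for i in range(0, len(out), 5))

first = ("00100 11011 10110 10110 10101 11110 11101 11111 11110 00000 10010 "
         "01110 10110 00011 01111 01011 00000 10110 11000 10010 00101 1")
second = ("00100 11001 00011 01001 11010 01110 11100 10101 01110 00011 01111 "
          "01110 11100 10110 00000 10100 11111 00110 11011 01111 01111 0")

# A 1 marks a position where the two concealed messages disagree.
print(xor_streams(first, second))
```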

Problem 3

The aliens now try to xor two messages together and send the following bitstream:

```11111 11111 00000 10111 11101 01011 01001 01000 00101 00101 00000 01010 00010 10101 11100 11111 11110 10101 00000 00111 11
```

Reconstruct both messages from this information.

Of course, this illustrates what might happen if one used a book to create pseudorandom strings. All known languages have many statistical irregularities, and these irregularities tend to weaken encryption schemes which rely on the supposedly "random" nature of text in that language.

# Answers to the bitstream exercises in class

First problem

This is a direct translation, and the answer is ABACABA.

Second problem

This is the pseudorandom bitstream used in the problem:
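A direct translation of an undisguised stream can be sketched as a left-to-right scan. In this dictionary A begins with 10, B with 11, and C with 0, so at most one word can match at any position and no backtracking is needed. The function name `decode` is an illustrative choice.

```python
# Word table from the in-class exercises.
WORDS = {"A": "1010101010", "B": "1111111", "C": "000000"}

def decode(bits: str) -> str:
    """Translate an undisguised bitstream into its sequence of words."""
    sentence, pos = [], 0
    while pos < len(bits):
        for name, pattern in WORDS.items():
            if bits.startswith(pattern, pos):
                sentence.append(name)
                pos += len(pattern)
                break
        else:
            # No word fits here, so the stream is not a clean sentence.
            raise ValueError(f"no word matches at bit {pos}")
    return "".join(sentence)

stream = "10101 01010 11111 11101 01010 10000 00010 10101 01011 11111 10101 01010"
print(decode(stream.replace(" ", "")))  # ABACABA
```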

```00100 00000 00000 00000 01000 00100 00000 10010 00000 00000 00000 00100
```
and here's the original bitstream:
```11111 11101 01010 10101 01010 10101 01010 10000 00010 10101 01011 11111
```
so that the sentence is BAAACAB. One can guess this (and, really, only guess it!) by looking at the patterns. The first 7 bits of the bitstream in the problem statement are 11011 11, and since every bit there has a 90% chance of being correct, it is rather unlikely that the original sentence started with either A or C.

For example, if it had started with C, we'd need to change 00000 0 to 11011 1, a total of 5 "bitflips" out of 6 bits; there's only about one chance in 100,000 (that's 10^5) of that happening. If it had started with A, we'd need to change 10101 01 to 11011 11, which takes 4 bitflips. The chance of that occurring is about one in 10,000. Compare that with starting with B, where only one bitflip (one chance in 10) is necessary.

Third problem

This is the pseudorandom bitstream used in the problem:
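The bitflip comparison for the first word can be sketched in Python. Each candidate word is compared with the start of the received stream, and the chance of that exact flip pattern is computed under the assumption that every bit is flipped independently with probability 0.10. The names `flips` and `P_FLIP` are illustrative; the factor for the unflipped bits is included here, so the numbers come out slightly smaller than the rounded "one in 10,000" style figures in the text.

```python
# Word table from the in-class exercises.
WORDS = {"A": "1010101010", "B": "1111111", "C": "000000"}
P_FLIP = 0.10            # assumed chance that any single bit was flipped
received = "1101111"     # first seven bits of the second-problem stream

def flips(word: str, bits: str) -> int:
    """Count disagreements over the overlap of the word and the received bits."""
    return sum(w != b for w, b in zip(word, bits))

for name, pattern in WORDS.items():
    n = min(len(pattern), len(received))   # number of bits compared
    k = flips(pattern, received)
    chance = P_FLIP ** k * (1 - P_FLIP) ** (n - k)
    print(f"{name}: {k} flips in {n} bits, chance of this pattern {chance:.6f}")
```

Because the candidates are compared over slightly different numbers of bits, the chances are only a rough guide, but the gap between one flip for B and four or five flips for A and C is large enough to make B the natural first guess.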

```01011 01101 10000 11000 01101 01100 00010 10100 10000 10111 10001 00011 100
```
and here are the two original sentences (in order):
```10101 01010 10101 01010 11111 11111 11111 01010 10100 00000 11111 11000 000
```
This sentence is AABBACBC.
```10101 01010 00000 01010 10101 01010 10101 00000 00101 01010 10101 01010 10
```
This sentence is ACAACAA.

The xor of the two sentences is

```00000 00000 10101 00000 01010 10101 01010 01010 10001 01010 01010 10010 10
```
which must be the same as the xor of the two encrypted sentences. Why?

Digression on xor

xor is addition mod 2. So the entire definition of xor is given by these equations:

 0 xor 0 = 0; 1 xor 0 = 1; 0 xor 1 = 1; 1 xor 1 = 0

Briefly, agreement of bits is signalled by 0 and disagreement of bits is signalled by 1. xor is used extensively in cryptography because a xor b xor b = a for any bits a and b, so we can do the following:

| One original message bit | Encryption using b | Message transmission | Decryption using b | Original message bit received |
|---|---|---|---|---|
| a --> | a xor b = m | m -----> m | m xor b = a xor b xor b = a | --> a |

End of digression on xor
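The encrypt, transmit, decrypt round trip from the digression can be checked on a whole bitstream. This is a minimal sketch: the helper name `xor_bits`, the seed, and the use of Python's `random` module to make the key are all assumptions for illustration.

```python
import random

def xor_bits(x: str, y: str) -> str:
    """Bitwise xor of two equal-length bit strings (addition mod 2 per bit)."""
    return "".join("1" if a != b else "0" for a, b in zip(x, y))

message = "00000011111111010101010000000"   # the sentence CBAC from the exercises
rng = random.Random(0)                      # arbitrary seed for repeatability
key = "".join(rng.choice("01") for _ in message)

sent = xor_bits(message, key)       # encryption: m = a xor b
received = xor_bits(sent, key)      # decryption: m xor b = a xor b xor b = a
assert received == message
print(sent)
print(received)
```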

If a is a bit from the first sentence and A is the corresponding bit (in the same position) from the second sentence (here A denotes a bit, not the word A), and b is the corresponding bit from the bitstream which encrypts them both, then a xor b and A xor b are the bits of the encrypted streams. If we xor these bits we get:

```(a xor b) xor (A xor b) = a xor A xor b xor b = a xor A
```
so b has no influence on what's left. We can just use ideas about a and A to try to guess about them. In the case of the messages given, we are lucky (not very, given the statistics of this language!) that both of the messages begin with the word A = 1010101010. The next pattern in the xored bitstream is 10101, which suggests alternating agreement and disagreement between the two bitstreams. Since 1 means disagreement of bitstreams, we examine our dictionary:

 A = 1010101010; B = 1111111; C = 000000;

and see that one of the sentences has the word A and one must have C, since they disagree on the first bit of the third group. Now we continue, really guessing which of the sentences has A and which has C and following the consequences. Sometimes the wrong guess will be made, and "backtracking" will need to be done: that is, consideration of alternative possibilities.
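Both steps of this argument can be checked in Python: first that the key really does drop out when the two encrypted sentences are xored, and then what agreement and disagreement pattern each pair of words produces when two words start at the same position. The helper name `xor_bits` is an illustrative choice, and each pattern is computed only over the shorter word's length.

```python
from itertools import combinations

# Word table from the in-class exercises.
WORDS = {"A": "1010101010", "B": "1111111", "C": "000000"}

def xor_bits(x: str, y: str) -> str:
    """Bitwise xor, ignoring spaces and truncating to the shorter string."""
    xs, ys = x.replace(" ", ""), y.replace(" ", "")
    return "".join("1" if a != b else "0" for a, b in zip(xs, ys))

# Encrypted and original sentences from the third in-class problem.
enc1 = "11110 00111 00101 10010 10010 10011 11101 11110 00100 10111 01110 11011 100"
enc2 = "11110 00111 10000 10010 11000 00110 10111 10100 10101 11101 00100 01001 00"
plain1 = "10101 01010 10101 01010 11111 11111 11111 01010 10100 00000 11111 11000 000"
plain2 = "10101 01010 00000 01010 10101 01010 10101 00000 00101 01010 10101 01010 10"

# The shared key cancels: both xors give the same stream.
print(xor_bits(enc1, enc2) == xor_bits(plain1, plain2))

# Disagreement pattern for each pair of distinct words aligned at their start.
for (n1, p1), (n2, p2) in combinations(WORDS.items(), 2):
    print(n1, "xor", n2, "=", xor_bits(p1, p2))
```

A pattern starting 10101 matches A against C, while A against B starts 01010 and B against C is all 1's, which is why the third group points to A and C.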

# Other aliens?

Please hand in a solution to this problem on Friday, November 12, 1999.

Assumptions here

• 40% of the words are 111111. Abbreviate this word with Q. It is six bits long.
• 30% of the words are 11001100. Abbreviate this word with S. It is eight bits long.
• 20% of the words are 00000. Abbreviate this word with R. It is five bits long.
• 10% of the words are 0101010. Abbreviate this word with T. It is seven bits long.

Each message will be about 100 bits long. Below is the xor of two messages. Reconstruct both messages from this information.

The xor of two bits: 0 indicates agreement; 1 indicates disagreement.

Ciphertext

`00000 01111 10001 10011 00110 10101 10000 01001 10010 01100 01010 00010 10011 00111 11110 01100 10110 01001 01011 11001 1`