Diary for Newer Math, fall 2003

Date and what happened (outline)
Thursday,
December 11
Mr. Burton discussed Euclid's algorithm and the binary algorithm for the GCD. Mr. Wilson added further information, and related the Fibonacci numbers to the GCD computation. Mr. Jaslar discussed the solution of the NvJ problem and related matters.
Monday,
December 8
Mr. Tully discussed randomness, giving some of the history and comparing various "definitions" of random.

Mr. Siegal discussed various aspects of coding theory, and computed certain numbers expressing the efficiency of a code for repetition codes, parity codes, two Hamming binary codes (the (7,4) and (8,4) codes), and a Hamming (4,2) ternary code.

Thursday,
December 4
Ms. Jesiolowski discussed the history and some of the mathematics of the Four Color Theorem. She included some information on the controversy about computer-aided proofs. An invited guest, Professor Zeilberger, who has strong views about these matters, generously joined us for half the period.
Monday,
December 1
Mr. Palumbo discussed P vs. NP, NP-completeness, satisfiability of Boolean circuits as an NP-complete problem, and minesweeper (!) as another example of an NP-complete problem.
Tuesday (following a Thursday schedule),
November 25
The lecturer's concluding remarks about the course.
Monday,
November 24
Mr. Ryslik discussed the knapsack encryption system.
Monday,
November 17
I urged students to work on their presentations and papers.

I briefly discussed Bell Labs, where much of what I will speak about in the next few meetings was developed. I talked a bit about Shannon, whose two papers published in the Bell System Technical Journal, entitled A Mathematical Theory of Communication, founded the fields which are today called information theory and coding theory. These papers cite a coding method of Richard Hamming, who was concerned about error-correction in the early years of digital computers. I hope to discuss that specific method on Thursday. By the way, here is a transcript of a wonderful talk Hamming gave on "You and Your Research". He was quite outspoken. And here is Hamming's original article about error detecting and error correcting codes.

Coding theory deals with the reliable, economical storage and transmission of information. Under "storage" are such topics as compression of files. Problems concerned with "transmission" include how to ensure that the receiver gets the message the sender transmits down a noisy channel: a channel where the bits may flip. I mentioned ISBN numbers (on each book) as an example of an error-detection scheme.

We discussed repeating bits as a way of protecting the message. We got started on inequalities which must be satisfied for accurate transmission of messages with error-correction built in. I'll do more Thursday.

There are many links to error correcting codes on the web. Here's a short one and here's a whole book!

Thursday,
November 13
Professor Kahn spoke about correlation inequalities.
Monday,
November 10
I announced that another faculty member from the Mathematics Department would come and speak. He is Professor Jeff Kahn, and he will be here at the next class.

Perhaps Professor Kahn's best known work is on the Borsuk Conjecture, which states that every subset of R^d (d-dimensional space) which has finite diameter can always be partitioned into (d+1) sets with smaller diameter. The conjecture was made in 1933, and sixty years later Professor Kahn helped to show that it was false in high dimensions, in fact, spectacularly false: when d is large, one may need at least (1.1)^sqrt(d) pieces. Specific examples were provided for d=9162, showing that we don't know much about high dimensions. The lowest dimension currently known for a counterexample is 321, work which was done in 2002. Borsuk's Conjecture and related issues turn out to be rather important in many applications. Professor Kahn wants to discuss some more current work, and asked me to present several topics to you.

Bipartite graphs
These are graphs whose vertex set can be split into two non-empty and non-overlapping parts, say V1 and V2, so that edges go only between the two groups, never within a single group.

One nonexample is K3, which is not bipartite, because if we divide the 3 vertices into two groups, one group will have two vertices and there will be an edge between them. An example of a bipartite graph is K3,3, which consists of 3 homes (Fred, Jane, Alice) and 3 utility stations (water, gas, electric), and the edges connect the utility stations and each of the homes.

Mr. Burton successfully identified a way to recognize bipartite graphs. Consider a cycle: that is, a path in a graph which returns to itself. A graph is bipartite exactly when it has no cycle of odd length. In fact, this allows us to "construct" the bipartition: pick one node. Make V1 the collection of nodes which can be connected to the initially picked node by a path with an even number of edges, and V2 those which can be connected with an odd number of edges (for simplicity, we are assuming that all nodes can be connected to each other, or else we apply this to the separate pieces of the graph). The absence of odd cycles then guarantees that no edge joins two nodes in the same part, so the graph is bipartite.
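
Mr. Burton's construction can be written out as a short program. Here is a minimal sketch in Python (the function name and the adjacency-list encoding of the graph are my own, not anything from class):

    from collections import deque

    def bipartition(adj):
        """Try to split the vertices of a connected graph into V1 and V2.
        adj: a dict sending each vertex to the list of its neighbors.
        Returns (V1, V2) if the graph is bipartite, or None if an odd
        cycle forces two adjacent vertices onto the same side."""
        start = next(iter(adj))
        side = {start: 0}                 # 0 = even distance, 1 = odd distance
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in side:
                    side[w] = 1 - side[v]
                    queue.append(w)
                elif side[w] == side[v]:  # an odd cycle has been found
                    return None
        V1 = {v for v in side if side[v] == 0}
        V2 = {v for v in side if side[v] == 1}
        return (V1, V2)

    # K3 (a triangle) is not bipartite; the homes-and-utilities graph K3,3 is.
    triangle = {1: [2, 3], 2: [1, 3], 3: [1, 2]}
    print(bipartition(triangle))          # None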

I also took the chance to quote a theorem of Kuratowski from the 1930's, that a graph can be drawn in the plane with no edges crossing ("a planar graph") exactly when neither K5 nor K3,3 can be realized as a subgraph. (There are some technicalities I am omitting in this, but it is essentially correct.)

Conditional Probability
If A and B are events, then the conditional probability of A given B, denoted p(A|B), is defined to be p(A and B happen)/p(B). So we change the "universe" of outcomes by restricting only to outcomes which lie in B. Of course, this number can be rather different from p(A). I discussed this with some examples.

The most interesting simple result about conditional probability is called Bayes' Theorem (dating back to 1753, apparently!). One version of Bayes' Theorem is the following: p(A|B)=[p(B|A)p(A)]/[p(B|A)p(A)+p(B|Ac)p(Ac)]. Here Ac is the complementary event (everything outside of A). We analyze the right-hand side of Bayes' Theorem first. The bottom, p(B|A)p(A)+p(B|Ac)p(Ac), is, by definition of conditional probability, p(B and A)+p(B and Ac), and since A and Ac together cover all outcomes, this is p(B). The top, p(B|A)p(A), is [p(B and A)/p(A)]·p(A) which is p(B and A). So the resulting quotient is p(B and A)/p(B), which is what the left-hand side of Bayes' Theorem states.

I gave some silly application of Bayes' Theorem (male/female and smoker/non-smoker facts). Bayes' Theorem and conditional probability are basic to much of modern statistics. There are many references on the web to this material.
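
In the same spirit, here is a toy Bayes' Theorem computation (the percentages below are invented for illustration; they are not the figures used in class):

    # Made-up numbers: suppose 40% of males and 30% of females smoke,
    # and the population is half male, half female.  Given that a
    # randomly chosen person smokes, how likely is it that the person
    # is male?  Here A = "male", Ac = "female", B = "smoker".
    p_A, p_Ac = 0.5, 0.5
    p_B_given_A = 0.4
    p_B_given_Ac = 0.3

    p_B = p_B_given_A * p_A + p_B_given_Ac * p_Ac   # the bottom of Bayes' Theorem
    p_A_given_B = p_B_given_A * p_A / p_B           # Bayes' Theorem
    print(p_A_given_B)                              # 0.571..., that is, 4/7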

Thursday,
November 6
Professor Jozsef Beck spoke about "games" with complete information, with many games having Ramsey theory as a framework. He spoke about avoidance games and achievement games, and maker/breaker games. He stressed the idea that there could exist winning strategies without any description of them!
Monday,
November 3
I discussed the probabilistic method which allows certain underestimates of Ramsey numbers.
Thursday,
October 30
We got an overestimate for Ramsey numbers, basically as in the notes, with a greedy approach. I discussed some of the known values of Ramsey numbers, and told the story of Erdös, the aliens, and Ramsey numbers. I will try to demonstrate an underestimate on Monday, using magic.
Monday,
October 27
We discussed graphs and found some real-world graphs. We discussed coloring graphs, and proved some facts about coloring complete graphs. Much of this is in the Governor's School notes, but not all of it (come to class!). We will continue the discussion on Thursday.
Thursday,
October 23
I began consideration of the idea that large enough samples will always have some small-scale order. This is done in the Governor's School notes on the pages on and after page 57. In particular, I discussed the pigeonhole principle. I remarked that its results can sometimes be difficult to accept: they can be non-constructive, so no solution is exhibited. One just gets knowledge that a solution exists.

After what looked like mere amusement (see the notes referenced above) I proved a theorem (although as usual I fouled it up a bit and Mr. Jaslar helped me). The illustrated t-shirt is from the Australian Mathematics Trust, and it shows pigeons in pigeonholes, and the lettering (not too readable here) names Dirichlet, who first really identified and used the pigeonhole principle.

The same source states:

It was Dirichlet who formulated the Pigeonhole Principle, often known as Dirichlet's Principle, which states

If there are p pigeons placed in h holes and p>h then there must be at least one pigeonhole containing at least 2 pigeons.

Here is a further discussion of problems relating to the pigeonhole principle, including a clear proof of the "approximation to irrationals" statement which I tried to state and verify. Please look at this page.

Monday,
October 20
We discussed the topics. I then discussed birthdays and tried to contrast probabilistic reasoning with the certainty available in large enough samples.

I then began discussing parties, and Mr. Burton tried to 2-color the edges of the complete graph on 5 vertices so that there was no monochromatic triangle. Ms. Jesiolowski found a nice solution.

We will return to these problems on Monday.

Thursday,
October 16
I finished the discussion of the Broadcasting Model, including one proof in detail using mathematical induction.

Professor Paul Leath of the Physics Department discussed percolation and the ingenious experiment documented in his paper, Conductivity in the two-dimensional-site percolation problem, in Phys. Rev. B 9, 4893-4896 (1974). (The link works on browsers from Rutgers-sponsored computers.)

Monday we will discuss the topics for papers.

Monday,
October 13
I began the discussion of the Broadcasting Model: pp.48-50 of the notes. This included simulation of the binary tree network, and analysis of what the limiting probabilities could be, if they were shown to exist.
Thursday,
October 9
Discussion of the transmission model (see pages 46 and 47 of the notes).

Discussion of the binomial theorem. "Proof" of the fundamental relationship among binomial coefficients (Pascal's Triangle), following a suggestion of Mr. Jaslar: look at n+1 elements, distinguish a special element, and then count the k-element subsets which do and do not contain the special element (so the (n+1,k) binomial coefficient will be the sum of the (n,k) and (n,k-1) binomial coefficients).
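
The identity is easy to check by machine; a quick sketch (using Python's built-in binomial coefficients):

    from math import comb

    # Pascal's Triangle relation from class: among the k-element subsets of
    # n+1 elements, those avoiding the special element number C(n,k) and
    # those containing it number C(n,k-1).
    for n in range(1, 8):
        for k in range(1, n + 1):
            assert comb(n + 1, k) == comb(n, k) + comb(n, k - 1)
    print("identity checked for small n and k")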

More than you want to know about binomial coefficients

And a few things about Fibonacci numbers, too

Monday,
October 6
I discussed expectation. I essentially discussed pages 41-45 of these notes.

We discussed (as in the reference above) the coin flipping protocol apparently due to John von Neumann. I requested comment on a converse problem, which I named NvJ as a joke.

In the NvJ problem it seems easy to get a protocol which works if the probability p is rational, a quotient of integers. So it seemed appropriate to give Euclid's other well-known proof, that sqrt(2) is irrational. I then remarked that a similar proof would show that sqrt(3) is irrational, and asked what would go wrong if we wanted to prove that sqrt(4) was irrational using the same technique!
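
Returning to the rational case of the NvJ problem: one possible protocol (my own sketch, not necessarily the one suggested in class) uses fair flips as binary digits to pick a number uniformly from 0, 1, ..., b-1, retrying when the digits name something too large, and reports heads exactly when that number is below a. For p=a/b this gives the right bias.

    import random

    def fair_flip():
        """One flip of a fair coin: 0 or 1, each with probability 1/2."""
        return random.randint(0, 1)

    def biased_from_fair(a, b):
        """Simulate a coin whose probability of 'H' is the rational number a/b,
        using only fair flips.  Enough flips are read as the binary digits of a
        number from 0 to 2^k - 1; if the number is below b it is uniform on
        0..b-1 and we report 'H' when it is below a, otherwise we try again."""
        k = max(1, (b - 1).bit_length())
        while True:
            n = 0
            for _ in range(k):
                n = 2 * n + fair_flip()
            if n < b:
                return "H" if n < a else "T"

    flips = [biased_from_fair(1, 3) for _ in range(10000)]
    print(flips.count("H") / 10000)       # should be close to 1/3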

Thursday,
October 2
I explained that the list of topics would be steadily updated and augmented.

It's not a bug, it's a feature!

I tried to discuss problems with the mathematical model called probability. I reviewed what we did Monday, and then discussed three examples in detail, also along the way "reviewing" certain mathematical results. One further definition:
Two events E and F are independent if p(an outcome in both E and F occurs)=p(E) p(F): the probabilities of the intersection multiply. Of course, one can't argue with definitions, but in this case we are supposed to see that somehow two separate "games" are being run, and the outcome in one doesn't influence the outcome in the other. For example, pick a card from a standard deck. The events E={the card is an ace} and F={the card is a spade} are independent. If G={the card is the club ace} then neither E and G nor F and G are independent.

Example 1
The fair coin, one flip. Here the whole sample space is {H,T}, and there are a total of 4 events. If a sample space has n elements, then the number of subsets will be 2^n. In this case, n=2. The probabilities assigned to the events are these:

p(Empty Set)=0; p({H})=1/2; p({T})=1/2;p({H,T})=1.

It isn't too hard to see that this assignment of probabilities satisfies all the requirements of the rules. In fact, we really won't run into conceptual difficulties with any finite sample space, S. I do admit that one can have computational difficulties (for example, offhand I don't know the probability that a 5-card poker hand is a straight!). But no "philosophical" problems seem to occur.

Example 2
Flip a fair coin until a tail occurs. Assume that the flips are independent. Then the elements of the sample space can be listed:

{T, HT, HHT, HHHT, ..., H^nT, ..., H^infinity}

Here we need to do some explaining. Each element of the sample space represents a sequence of tosses, read from left to right. For example, HHHT represents the outcome of three heads followed by a concluding tail. The notation H^nT, with n here a non-negative integer (so n is 0 or 1 or 2 or ...), represents n heads followed by a concluding tail. I'll use H^infinity to represent the outcome of a sequence of H's, tossed endlessly. (Please see the first scene in Rosencrantz and Guildenstern are dead by Tom Stoppard to put this game in a proper dramatic setting.) What should the probabilities for each outcome be?

If the coin is fair, then p({T})=1/2. If the tosses are independent, then this suggests that p({HT})=1/4, p({HHT})=1/8, etc., and p({H^nT})=1/2^(n+1) since we are in effect prescribing n+1 independent fair flips. Now we need a fact about geometric series. Recall, please:

Geometric series
If |r|<1, then a/(1-r)=a+ar+ar^2+ar^3+...
There are many good references for this fact.
Here is one of them.

Now back to the principal discussion. Consider the event which consists of all of the H^nT tosses. What should its probability be? It should be the sum of the geometric series with a=1/2 and r=1/2, and this is (1/2)/(1-1/2)=1. But then since the probability of the whole sample space must be 1, the probability of H^infinity has got to be 0. Or, another way, for those who know and like limits, this is a prescribed infinite sequence of independent fair flips, so its probability is the limit as n goes to infinity of 1/2^n, which is 0.

In any case we seem to have a possible or a conceivable event (namely, {H^infinity}) which has probability 0. What does this mean? I guess it means it almost never happens, or that the other outcomes almost always happen? The phrase "almost always" is used in probability theory for an event which has probability 1, and this is an example of such an event. This is part of the phenomenon of infinity, and things can get even worse.
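
Before moving on: Example 2 is easy to simulate. Here is a small sketch (mine, not something from class) which flips a virtual fair coin until a tail appears and compares the observed frequencies with 1/2^(n+1):

    import random
    from collections import Counter

    # Flip a fair coin until a tail appears and record how many heads came
    # first.  The outcome H^nT should appear with frequency near 1/2^(n+1);
    # the endless outcome H^infinity never shows up in practice.
    counts = Counter()
    trials = 100000
    for _ in range(trials):
        n = 0
        while random.random() < 0.5:      # heads with probability 1/2
            n += 1
        counts[n] += 1

    for n in range(5):
        print(n, counts[n] / trials, 1 / 2 ** (n + 1))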

Example 3
Here the "game" will consist of picking a number uniformly at random (??!!) from the unit interval. So the outcomes will all consist of the numbers in [0,1]. Let's think of some events. Well, one event could be that the number picked is in the left half of the unit interval. What should the probability of that be? (I did not allow discussion about whether the outcome 1/2 was in or out of the event [very arbitrary], and from later perspective you will see this doesn't matter at all!) the probability of this event is 1/2. Such considerations lead to the following: if an event is, say, the chance that the number picked lies in an interval of [0,1], then p(the number picked is in the interval)=the length of the interval. Indeed, this is what people mean when they say "uniformly at random". Now there are some concepts which I find difficult to understand. I asked students to pick numbers in [0,1] uniformly at random. I analyzed the outcomes.

The first "random" number was something like 1/7. Well, golly: the event of picking 1/17 (just call it {1/17}) is certainly inside the event of picking a number between 1/17-1/100 and 1/17+1/100. But the latter event is an interval with length 2/100=1/50. So p({1/17})<=1/50. In fact, we can show that p({1/17}) is <= any positive number. The only value we can assign is that p({1/17})=0. So here is another event of probability 0 which can actually happen (!?): or can it? In fact, the probability of picking any one single number uniformly at random from [0,1] is 0.

I then examined the next three or four "random" numbers contributed by students. It seemed that the numbers picked were all rational: quotients of integers. This exhibited another phenomenon, a phenomenal result which is attributed to Cantor in 1873.

Sizes of some sets
The positive integers, N, and the rationals, Q, have the same number of elements.
These two sets can be placed into a one-to-one correspondence with each other, so all elements of both are "used up" in the correspondence.
I gave a discussion of this and other "basic" facts about sizes of sets last year in Math 311.

The equal "size" or cardinality (to give it the technical name) of these sets is not at all supposed to be obvious, and in fact Cantor's demonstration of this fact and other analogous results were very startling to the mathematical community of his day.

So now we know that there is some kind of correspondence: if n is a positive integer, then there is a rational qn corresponding to n. The correspondence can be described in such a way that the collection of all of the qn's is all of the rationals in [0,1]. I will now try to convince you that the event W which is described by "picking a rational number uniformly at random from [0,1]" has small probability. Look at qn. Imagine it inside the interval with endpoints qn-1/100^n and qn+1/100^n. Now the length of this interval is 2/100^n. The event W is contained in the union of these intervals, so the probability is at most the sum of the probabilities of the outcome being in each interval. But that's the sum of 2/100^n with n=1, 2, 3, ..., and this is again a geometric series with a=2/100 and r=1/100. Then the sum is a/(1-r), which is 2/99, fairly small. Therefore p(W)<=2/99. You can see that by adjusting the lengths of the covering intervals to be even smaller, you can make the probability of W as small as you want. The only value p(W) can have is 0. Wow.

Please note that I applied this logic to W just to show you that the probability of picking a rational number uniformly at random from the unit interval is 0. In fact, much more broadly I can assert (with the same supporting logic) the probability of picking one element of any specific sequence of numbers from [0,1] is 0. This is wonderful and strange.

Please be on your guard and try to understand what's happening when we apply probability theory, a mathematical model, to situations which have infinitely many alternatives.

Notes
Here are some links about two topics discussed during the class.
The infinity symbol
Near the bottom of this page is a discussion about the history of the infinity symbol. I quote from this page: "The infinity symbol was introduced by John Wallis (1616-1703) in 1655 ..."
A BIG hotel
Here is a discussion of the hotel which was mentioned in class. The hotel is sometimes called Hilbert's hotel. And here is a cartoon about the hotel.

Monday,
September 29
I gave out a list of topics students could do presentations/papers about.

I completed an analysis of the Russian peasant/repeated squaring method of computing A^B mod C. Write B in binary form. The 0 and 1 numerals will take up #2(B) digits. Now repeatedly square A mod C, #2(B) times. Then "assemble" A^B mod C by multiplying together the appropriate squares -- this will also need at most #2(B) multiplications. Since the largest any of these numbers can be is C, the work for each multiplication will be at most 6(#10(C))^2. We also know that #2(B)<=4#10(B). Therefore we need at most 8#10(B) multiplications and each of those needs 6(#10(C))^2 operations. So this is a polynomial algorithm (degree 3). There are ways to speed up multiplication, and there are also even better ways under some circumstances than repeated squaring to do exponentiation. This is interesting in the "real world" because many crypto protocols use exponentiation of big numbers with big exponents.
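
Written out as a program, the repeated-squaring idea takes only a few lines. A sketch (Python's built-in pow(A, B, C) performs the same computation):

    def power_mod(A, B, C):
        """Compute A^B mod C by repeated squaring.  B is scanned in binary:
        we keep squaring A mod C, and the squares corresponding to the 1 bits
        of B are multiplied into the answer.  The number of multiplications is
        proportional to the number of binary digits of B, not to B itself."""
        result = 1
        square = A % C
        while B > 0:
            if B % 2 == 1:                # this binary digit of B is 1
                result = (result * square) % C
            square = (square * square) % C
            B = B // 2
        return result

    print(power_mod(7, 560, 561), pow(7, 560, 561))   # the two answers agree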

I discussed the history of probability beginning with some gambling games about 300 years ago. The current codification of probability "axioms" was done by Kolmogorov in the 1930's, so probability as an abstract mathematical field is relatively recent. Today was primarily devoted to explaining vocabulary words, with some useful simple examples.
outcome
sample space
event
relative frequency
probability
expectation
independence
The sample space consists of all possible outcomes of the "game", which we visualize as being played repeatedly. A collection of outcomes is called an event. As the game is played, the outcomes are counted, and the number of plays whose outcome lies in a specific event divided by the total number of times the game is played is called the relative frequency of the event. It is a "fraction", a number in [0,1] (the real numbers which are both greater than or equal to 0 and less than or equal to 1). It is our hope and our intuition (!) that as the number of plays of the game increases, this relative frequency will somehow stabilize, will somehow "approach" a limiting number, which will be called the probability of the event. With all this as background, we can now discuss the abstract mathematical model of the situation.

We start with a sample space of outcomes, and assign a number in [0,1] to events, collections of outcomes. If E is an event, this number is called the probability, p(E), of the event. Then the empty event with no outcomes, has probability 0. The event of all outcomes, the whole sample space, has probability 1. Also, if A and B are disjoint events (sometimes called exclusive), with no outcomes in common, we require that the probability of an outcome being either in A or in B is the sum of the probability of A and the probability of B. In math language, if A intersect B = the empty set, then p(A union B)=p(A)+p(B). I drew some pictures (close to Venn diagrams, I guess). I also deduced that if A is a subset of B (so every outcome in A occurs in B) then the probability of A is less than or equal to the probability of B. I remarked that when the events A and B are not disjoint, then p(A union B) is less than or equal to p(A)+p(B). The "less than" part occurs because we have no information about p(A intersect B). More accurate equations can be written using what is called the "inclusion-exclusion principle" but we likely will not need them.

We also analyzed John von Neumann's problem: given an unfair coin, simulate a fair one. That is, you are given a coin whose probability of heads is p where 0<p<1 and tails is q=1-p. By tossing this coin can you call out H and T in a fair way? We discussed the merits and possible defects of von Neumann's solution (there are other solutions). I did not make a formal definition of independence which is needed here. I will return to JvN.
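
For reference, here is a simulation of the usual description of von Neumann's solution (a sketch with my own function names): flip the unfair coin in pairs, report the first flip of a pair when the two flips disagree, and throw away pairs that agree.

    import random

    def unfair_flip(p):
        """One flip of a coin whose probability of heads is p."""
        return "H" if random.random() < p else "T"

    def fair_from_unfair(p):
        """Von Neumann's trick: HT and TH are equally likely (each has
        probability p(1-p)), so reporting the first flip of a disagreeing
        pair is fair no matter what p is."""
        while True:
            a, b = unfair_flip(p), unfair_flip(p)
            if a != b:
                return a

    calls = [fair_from_unfair(0.8) for _ in range(10000)]
    print(calls.count("H") / 10000)       # close to 0.5 even though p = 0.8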

Here is a whole book (over 500 pages long!), quite good, on probability, available for free on the web. We won't need much from this book, though. Here is a brief more informal introduction to probability, more in the spirit of the lectures.

Thursday,
September 25
  • How many molecules of H2O are there in all the oceans of the world?
    I used reasonable estimates of the size of the Earth and the number of molecules of water (I was advised about moles and grams and such). We got some apparently fantastically large number using a back-of-the-envelope calculation.

    What I did was a modern version of Archimedes' reputed counting of the number of grains of sand which would be needed to fill the universe. An informal translation of this work, called The Sand Reckoner, is here, while another translation, possibly more accurate and with much more scholarly background, is here.

    My purpose in all this was somewhat allied with that of Archimedes: I wanted to acquaint people with large numbers. It also was related to the concerns of the course. I tried to reanalyze Greenfield's fantastic factoring algorithm (GFFA). I assumed that the time of factoring using this method was proportional to N^2·6(#(N))^2. I also assumed that the time required to "treat" a 100 digit number was 1 second. Then we tried to estimate the number of seconds (and then number of years!) required to factor a 200 digit number. And if every molecule of water were magically turned into a computer and then aimed at our factoring problem, how many years would GFFA take? (A back-of-the-envelope version of this arithmetic appears in the sketch after this list.) I wanted to illustrate the fact that exponential growth is huge, and to distinguish between exponential and polynomial growth.

  • P vs. NP
    I revisited this because I again wanted to explore my "feelings" about whether the conjecture was true. I discussed how a video game decision procedure seemed relevant and mentioned Minesweeper (an analysis of Minesweeper is linked to the Clay Math Institute web pages on this problem). Also here are the results of a recent (2002) poll of professionals: their opinions of P/NP and when the problem might be solved (and how!).
  • Diffie-Hellman protocol
    I remarked that the difficulty of RSA seemed linked to the difficulty of factoring. Many attempts have been made to link other difficult math problems to protocols for secure communication. One method which is widely used is the Diffie-Hellman key exchange system. It relies on the (supposed!) difficulty of the discrete logarithm problem.

    I described the setup, with every conversation of Alice and Bob being overheard by Eve, who has much greater computational capacity. I first tried to describe a stupid analogue of Diffie-Hellman, then a slightly more complicated one, and finally, the real version. I also hinted that in the "real world" things aren't quite that easy, and instead of arithmetic mod P, a sort of arithmetic with elliptic curves is used.

    I just got a copy of the original paper by Diffie and Hellman. I think you can get this also if you log in through a Rutgers computer system.

    Here is the original paper about RSA, by the way.

  • Efficient exponentiating
    How should one compute A^B mod C? The most straightforward implementation seems to take exponentially many steps (not unlike GFFA!). I hinted at a more clever way to do exponentiation, which I will explain in full next time.
  • A historical note
    Here is an outline of the "secret" history of RSA and Diffie-Hellman inside the British analogue of the National Security Agency.
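
Referring back to the factoring estimate in the first item above, here is the back-of-the-envelope arithmetic as a few lines of code (my own sketch; the ocean-molecule count used below is only a rough order-of-magnitude figure, not the number computed in class):

    # Assumptions from the entry above: GFFA's time is proportional to
    # N^2 * 6*(#(N))^2, and a 100-digit number takes 1 second to treat.
    ratio = (10.0 ** 100) ** 2 * (200 / 100) ** 2  # 200-digit time / 100-digit time
    seconds = ratio * 1.0                          # the 100-digit time is 1 second
    years = seconds / 3.15e7                       # about 3.15e7 seconds in a year
    print(years)                                   # roughly 1.3e193 years

    molecules = 5e46          # rough guess at the number of H2O molecules in the oceans
    print(years / molecules)  # still roughly 2.5e146 years per "molecular computer"
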
Monday,
September 22
I decided to try to analyze what easy and hard could mean when these words are applied to computational tasks. I wanted my definition to be valid no matter how numbers were represented. So we discussed representing numbers in the binary system, instead of the decimal system. (I wanted to go for Roman numerals, but people told me a base system was better.) I counted as 5 volunteers performed the binary dance (they stood up and sat down corresponding to the binary values of the counts).

Then I asked: if a number written in standard decimal form is 5 feet long, estimate how long the base 2 (binary) representation will be. This was more difficult. We estimated that a decimal integer with a 5 foot long representation would have 600 digits, so it would be about 10^600. About how many binary digits (bits) would such a number need? We discussed this. Since 2^3<10<2^4, by taking 600th powers, we know that the number will be between 2^1800 and 2^2400: if we assume the digits are similarly spaced, that means the binary representation will take between 15 and 20 feet. Thus big numbers in one base will also be big numbers in any other base.

We then analyzed the complexity of some elementary operations of arithmetic: addition and multiplication. Please see pages 20 and 21 of the Gov's School notes. These operations depend on the length of the inputs. So I defined #(n) to be the number of decimal digits of n. Thus #(3076) is 4. If both A and B have #(A)=#(B)=k, then A+B can be computed in 2k unit operations (1 digit addition) and A·B can be computed in 6k^2 operations. These are both polynomial time algorithms. An algorithm (defined in the notes) is polynomial time if it takes at most a polynomial number of "operations" in terms of the size of the input to compute its answer. Addition and multiplication are examples. Multiplication seems to be harder because the standard algorithm is quadratic in terms of the input length, while addition is first degree or linear.

A computational problem is easy if it has a polynomial time algorithm. A computational problem is hard if it does not have a polynomial time algorithm. We decided that if an arithmetic algorithm was polynomial time in one base, then it had to be polynomial time in any base since (by the logic above) roughly 3#10(n)<#2(n)<4#10(n), so polynomial time in #10(n) leads to polynomial estimates using #2(n), and conversely.

We analyzed Greenfield's factoring algorithm. Here the input is a positive integer n with size #(n) (back to base 10) and the output is either "n is prime" or "n=i·j" where i and j are integers strictly between 1 and n.
Here is my description of my algorithm. For each i and j strictly between 1 and n, compute i·j and compare this result with n. If equal, output "n=i·j". If none are equal, output "n is prime".
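
Taken literally, the algorithm is only a few lines (a sketch; it reports the first factorization it stumbles on):

    def gffa(n):
        """Greenfield's fantastic factoring algorithm, taken literally: try
        every pair i, j strictly between 1 and n and compare i*j with n.
        Exponential in the number of digits of n, so only usable for tiny n."""
        for i in range(2, n):
            for j in range(2, n):
                if i * j == n:
                    return "%d = %d * %d" % (n, i, j)
        return "%d is prime" % n

    print(gffa(91))   # 91 = 7 * 13
    print(gffa(97))   # 97 is prime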

The "running time" for this is n26#(n)2. Note that this is not a polynomial in #(n). Indeed, the running time in terms of #(n) is 102#(n)6#(n)2. This algorithm is exponential in terms of running time. There is no known algorthm for factoring which is not exponential.

However, if a suggested factorization ("n=i·j") is given, then we can check it in polynomial time (in fact, just 6#(n)^2 steps). You can win a million dollars by finding out whether every problem whose solutions can be checked in polynomial time actually has a polynomial time algorithm for solving it. This is the P vs. NP problem.

Thursday,
September 18
I stated Euler's improvement (where the modulus is a product of two distinct primes) to Fermat's Little Theorem. I tried to be more careful with the hypotheses. In particular, I needed to define the phrase relatively prime. For this I first defined the Greatest Common Divisor (GCD) of a pair of positive integers, a and b. So a positive integer c is the GCD of a and b (written c=GCD(a,b)) if c divides both a and b and c is the largest integer which divides both of them. Please notice that 1 divides both a and b, so the collection of common divisors always has at least one element! We computed examples, such as 1=GCD(5,7) and 4=GCD(4,8) and 1=GCD(8,27). Now a and b are relatively prime if 1=GCD(a,b). So a and b do not share any common factors except 1. Euler asserted that if a is relatively prime to the distinct prime numbers p and q, then a^((p-1)(q-1))=1 mod pq.
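
The GCDs above were found by inspection; Euclid's algorithm (which came up again in the December 11 class) computes them systematically. A minimal sketch:

    def gcd(a, b):
        """Euclid's algorithm: repeatedly replace the pair (a, b) by
        (b, a mod b) until the remainder is 0; the last nonzero value
        is the GCD."""
        while b != 0:
            a, b = b, a % b
        return a

    print(gcd(5, 7), gcd(4, 8), gcd(8, 27))   # 1 4 1, the examples from class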

We tested Euler's theorem for 6=2·3. Here p=2 and q=3 so (p-1)·(q-1) is just 2. The Maple instruction seq(j^2 mod 6,j=1..5); asks that a sequence of values be computed as j runs from 1 to 5. The result is

                            1, 4, 3, 4, 1
As I remarked in class, this is somewhat deceptive, since there are only two numbers in the range from 1 to 5 which are relatively prime to 6. But if we try seq(j^24 mod 35,j=1..34); where p=5 and q=7 we get
  1, 1, 1, 1, 15, 1, 21, 1, 1, 15, 1, 1, 1, 21, 15, 1, 1, 1, 1, 15,
 21, 1, 1, 1, 15, 1, 1, 21, 1, 15, 1, 1, 1, 1
with lots of 1's!

I asked: if p and q are both primes which have about 100 decimal digits, what are the chances that an integer between 1 and pq is relatively prime to both p and q? The multiples of p and the multiples of q are the integers not relatively prime to p or q. So the list of "good" integers is the list of all integers from 1 to pq with the bad ones thrown away: 10^200 (that's about pq) minus about 2·10^100. So overwhelmingly most integers will satisfy the hypotheses of Euler's result.

I then covered an introduction to RSA. Most of the material is on pages 16--18 of my Gov's School Notes. Please look there.

I discussed various attacks on RSA. The best known is factoring. If one is given N=pq, a product of two primes, and N can be factored so that p and q are then known, it turns out that RSA can be broken easily. There are other attacks, informed by the experience of having used the system for several decades. One reference is Boneh's paper (1999) surveying such attacks. Susan Landau has an article on the data encryption standard (DES) (2000) which may also be interesting.

I will try to write some descriptions of projects this weekend which people can look at. I also will try to write an RSA homework assignment. And catch up on my sleep.

Monday, September 15
I briefly discussed how secure communication (diplomatic, military, business), usually mediated by computer, is accomplished "today".

The messages to be passed between Alice and Bob are usually converted in some well-known way to a string of bits, 0's or 1's. These messages may represent text or music/sound or images or ... and typically they tend to be big chunks of bits. Alice and Bob want to exchange lots of information securely.

  1. There is the exchange of keys between Alice and Bob. These keys may be three or four hundred bits long. I will describe the ideas behind the most used methods (RSA and Diffie-Hellman) soon.
  2. The keys are used to create pseudorandom bitstreams by some relatively fast program (say, RC4 or whatever). Then Alice adds the bitstream, bitwise (mod 2), to her message bitstream.
  3. Alice transmits the sum of the two bitstreams to Bob.
  4. Bob uses the fact that (a+b)+b=a+(b+b)=a+0=a by adding his own copy of the pseudorandom bitstream to what he receives from Alice to recover the message.
  5. If the pseudorandom bitstream has good statistical properties (close to random) then it is very unlikely that Eve can recover the message. To do this she would in essence have to predict the pseudorandom bitstream, and by definition, "random" streams can't be predicted, and our pseudorandom stream is close to random (of course, there are several words here, such as "random" and "close" which must be further specified).
We then went through some bitstream exercises in class. The handout also had some information about the statistics of English. I told students a bit about the Venona case, a post-World War II spy case where the use of depths was quite important. I gave out a homework assignment about "alien" communication.
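
The mod 2 bookkeeping in steps 2 through 4 fits in a few lines. In the sketch below Python's seeded random generator stands in for a real keystream generator such as RC4, and the message bits are invented:

    import random

    def bitstream(key, length):
        """A stand-in pseudorandom bitstream: Python's generator seeded with
        the shared key.  (A real system would use RC4 or something similar.)"""
        rng = random.Random(key)
        return [rng.randint(0, 1) for _ in range(length)]

    message = [1, 0, 1, 1, 0, 0, 1, 0]                    # Alice's plaintext bits
    pad = bitstream(key=12345, length=len(message))       # step 2: Alice's keystream
    sent = [(m + k) % 2 for m, k in zip(message, pad)]    # step 3: add mod 2, transmit

    bob_pad = bitstream(key=12345, length=len(sent))      # Bob rebuilds the same stream
    recovered = [(s + k) % 2 for s, k in zip(sent, bob_pad)]  # step 4: add it again
    print(recovered == message)                           # True, since (a+b)+b = a mod 2
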
Thursday,
September 11
I gave out some secret sharing homework.

Topic 1 300 years ago ...
We derived Fermat's Little Theorem as on pages 11--12 of the Gov's School Notes.

Topic 2 Classical cryptography
We discussed classical cryptography, giving as an example a "book code" where Alice and Bob both possess copies of a book and the code consists of "adding" mod 26 the letters of a message to the text of a specific page of the book. I introduced these vocabulary words: cryptosystem (the whole scheme of encryption and decryption); key (in this case the page of the book that Alice and Bob would use); plaintext (the original message); and ciphertext (the encrypted message). The "actors" in this drama were Alice and Bob, and Eve, the eavesdropper, who would overhear their communications. Depending on the scenario, Eve could even know details of the cryptosystem, and could even be presumed to have vastly greater computational capabilities than Alice and Bob. I also mentioned that such book codes turn out in real life to be very easy to decrypt due to the biases of language (lots of e's in English, for example, and a good many th's) and to the redundancies of language (designed so that "noise" in the communication channel won't inhibit the message too much -- you can still understand a 'phone call even if there is a good amount of interference). So "book codes" aren't good, but maybe more intricate devices are (e.g., the now standard example is the Enigma device of World War II, an electromechanical cryptosystem). A practical difficulty turns out to be distribution of the "keys", the initial settings, for these machines, analogous to Alice and Bob knowing the page on which to begin their encrypted communication.
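
A toy version of the book code might look like this (my own sketch, with invented key text standing in for the agreed-on page of the book):

    def to_numbers(text):
        """Letters A..Z as the numbers 0..25, ignoring everything else."""
        return [ord(c) - ord('A') for c in text.upper() if c.isalpha()]

    def book_code(message, key_text, decrypt=False):
        """Add (or, to decrypt, subtract) the key letters to the message
        letters mod 26.  The key text is taken from the agreed-on page."""
        sign = -1 if decrypt else 1
        m, k = to_numbers(message), to_numbers(key_text)
        shifted = [(a + sign * b) % 26 for a, b in zip(m, k)]
        return "".join(chr(c + ord('A')) for c in shifted)

    page = "IT WAS A DARK AND STORMY NIGHT"   # invented key material
    cipher = book_code("MEET AT NOON", page)
    print(cipher)                                       # the ciphertext
    print(book_code(cipher, page, decrypt=True))        # MEETATNOON again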

Topic 3 Key distribution
How can one distribute keys, initial settings, easily, economically, securely? This has been a problem for a long time, and was handled by couriers delivering first books, then tapes, then CD's, etc. Secure key distribution has allowed internet commerce, because its many, many participants would find it difficult to communicate securely without it.

Topic 4 Crypto dreams ...
To somehow tie secure key distribution to a mathematical problem, so Alice and Bob could "solve" the problem easily but Eve could not. I tried to imagine a way to use Fermat's Little Theorem, but became frustrated. We will need a slightly stronger result, an extension of the theorem (done by Euler) for a product of two distinct primes.

Monday,
September 8
I basically discussed what was on pages 7-10 of the Gov's School notes.
Here is the Maple information. I gave out shares in a secret mod P.

Sharing secrets, mod P

I have chosen a large prime number: P=1005005005001. All arithmetic in this problem will be done mod P.
I also carefully created a polynomial, G(x), of degree 3. So G(x) "looks like" Ax^3+Bx^2+Cx+D where the constant term D is THE SECRET. I will give every student a "personal" and distinct share of the secret. Since G(x) has degree 3, at least four students will need to combine the information they have to find THE SECRET.

The secret, although a number, also represents an English word. I have made a word using a simple process which the following table may help explain:

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
01 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26

The English word SMILE becomes 19 13 09 12 05 which is more usually written 1913091205. This isn't a very convenient way of writing words, but it will be o.k. for what we do. Notice that you should always pair the digits from right to left, so that the number 116161205 will become 01 16 16 12 05 which is APPLE. The "interior" 0's will be clear but we need to look for a possible initial 0.

How to win this contest and get a valuable prize
Tell me the secret word. Also identify all members of the group who have contributed to your solution of this problem. I would prefer e-mail because the earliest solution wins. Please give me a short description of the process you used.
Comment I would use Maple of course, and have Maple create and remember certain polynomials for me and do all of the arithmetic.
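
For the curious, here is roughly how four shares pin down the secret (a sketch with a made-up polynomial, not the contest polynomial): Lagrange interpolation mod P, evaluated at x=0, recovers the constant term D.

    P = 1005005005001     # the prime from the handout

    def eval_poly(coeffs, x):
        """Evaluate A*x^3 + B*x^2 + C*x + D mod P, with coeffs = [A, B, C, D]."""
        value = 0
        for c in coeffs:
            value = (value * x + c) % P
        return value

    def reconstruct(shares):
        """Lagrange interpolation at x = 0, mod P: four shares (x_i, G(x_i))
        of a degree 3 polynomial determine G, and G(0) is THE SECRET."""
        secret = 0
        for i, (xi, yi) in enumerate(shares):
            num, den = 1, 1
            for j, (xj, _) in enumerate(shares):
                if j != i:
                    num = (num * (-xj)) % P
                    den = (den * (xi - xj)) % P
            secret = (secret + yi * num * pow(den, P - 2, P)) % P   # P is prime
        return secret

    # A made-up polynomial, NOT the contest polynomial; the secret encodes SMILE.
    coeffs = [123, 456, 789, 1913091205]
    shares = [(x, eval_poly(coeffs, x)) for x in (1, 2, 3, 4)]
    print(reconstruct(shares))     # prints 1913091205, i.e. 19 13 09 12 05 = SMILE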

Thursday,
September 4
Introduced myself, and discussed secret sharing (the beginning of the Gov's School notes). I also sent e-mail to the whole class. It contained references to Adi Shamir's original secret sharing paper and to some lecture notes on secret sharing from a computer science course.

I gave out an information sheet.


Maintained by greenfie@math.rutgers.edu and last modified 9/8/2003.