# Math 640: EXPERIMENTAL MATHEMATICS Spring 2006 (Rutgers University) Webpage

http://sites.math.rutgers.edu/~zeilberg/math640_06.html

Last Update: May 13, 2006.

• Teacher: Dr. Doron ZEILBERGER ("Dr. Z")
• Classroom: Allison Road Classroom Building [Busch Campus], Room 119 [Inside computer lab]
• Time: Mondays and Thursdays , period 3 (12:00noon-1:20pm)
• Textbook: "An Introduction to Bioinformatics Algorithms" by Neil C. Jones and Pavel A. Pevzner, MIT Press (ISBN 0-262-10106-8); handouts about Maple.
• Dr. Zeilberger's Office: Hill Center 704 ( Phone: (732) 445-1326)
• Dr. Zeilberger's E-mail: zeilberg at math dot rutgers dot edu
• Dr. Zeilberger's Office Hours: MTh 10:30-11:00.

## Outline

Experimental Mathematics used to be considered an oxymoron, but the future of mathematics is in this direction. In addition to learning the philosophy and methodology of this new field, students will become computer-algebra wizards, and that should be very helpful in whatever mathematical specialty they'll decide to do research in.

This semester the focus will be on bioinformatics, using the Jones/Pevzner excellent and very readable book. So in addition to becoming Maple whizes, students will learn the fundamentals of bioinformatics, in the best possible way, by programming its algorithms! (recall Knuth's dictum that the best way to understand something is to program it). In addition to its intrinsic interest and beauty, knowing the basics of both Maple programming and bioinformatics is an excellent backup plan for budding academicians.

# Diary and Homework

### Homework for Jan. 19, due Jan. 26:

1. Modify Procedure L(n) of PennyGame to treat the more general game where one has one pile of pennies but in each turn one can remove 1, 2, ..., a pennies from the pile, and the last player wins. Call this procedure L(n,a)
2. Conjecture a prove a criterion for the winning positions of the above more general game, phrased in terms of a.
3. Modify procedure Nim2 in file Nim to write a procedure Nim3(a,b,c) that handles 3-pile Nim with the piles having a, b, and c pennies.
4. 5 dollars: Find a "quick" way to decide whether a position (a,b,c) is losing or not (if you know the answer, or looked it up, you don't qualify).
5. 5 dollars: Find a "quick" way to decide whether (a,b) is a losing position in Whytoff's game that is programmed in Procedure Whytoff(a,b) in Nim (if you know the answer, or looked it up, you don't qualify).
6. Modify procedure Whytoff in file Nim to write a procedure Whytoff3(a,b,c) that handles 3-pile Whytoff, where a legal move consists of removing any number of pennies from ONE of the piles or the SAME amount of pennies from ALL piles.
7. Instant Ph.D.: Find a "quick" way to decide whether (a,b,c) is a losing position in a three-pile Whytoff's game that is programmed in Procedure Whytoff3(a,b,c) that you just wrote.
8. Try to read sections 4.1,4.2,4.3 from the textbook "Bioinformatics Algorithms"

### Programs done in Jan. 23, 2006, class

• PDP, the Partial Digest Problem by brute force.

### Homework for Jan. 23, due Jan. 30:

Last update: Jan. 29, 2006: thanks to Andrew Baxter for correcting errors in the formulation of the problems below in a previous posting.

1. A sequence w[i] (i=1..N) of 0 and 1's is k-bad if you can find k equally-spaced locations: j, j+b, j+2b, ... , j+(k-1)b that have the same letter, i.e. w[j]=w[j+b]=w[j+2b]= ... =w[j+(k-1)b]. A sequence w[i] (i=1..N) is k-good if it is not k-bad. Write a brute-force Maple procedure that inputs N and outputs the set of k-good {0,1}-sequences of length N, lets's call it vdw(N,k).
2. The van der Waerden number W(k) is the smallest N such that vdw(N,k) is empty. Find W(3), and if possible W(4).
3. (\$1000): Prove that W(k) is less than 2**(k**2).

### Programs done in Jan. 26, 2006, class

• PDPold, the Partial Digest Program so far.

### Homework for Jan. 26, due Jan. 30:

1. Finish the program by writing a procedure Descendants(XL,width) that inputs a pair XL=[X,L] eand a number width, and ouputs all the descendants of the form [X1,[]]. Use it to write an efficient version of AntiDelta1,

### Programs done on Jan. 30, 2006, class

• PDP, the Partial Digest Program: final version.
• PROFILEold3, the very beginning of the Profile algorithm.

### Homework for Jan. 30, due Feb. 3:

1. Finish the Cook procedure.
2. Read Sections 4.5, 4.6, 4.7.

## Sudoku contest (due Feb. 13 [extended deadline])

Write a Maple program that solves Sudoku problems on an n**2 time n**2 grid (not just n=3). \$10 price for the fastest running pogram.

Input: a list of lists of size n**2 each of whose entry is a row of length n**2. Each entry is eitehr an integer from 1 to 9 and 0 denots blanks.
Output: The SET of solutions (that should be a singleton for a good puzzle). Each such solution should have no 0s.

Note: Andrew Baxter pointed my attention to the Sudoku site, where you can create puzzles that you test your program with.

Optional Suggested Outline:

1. Write a procedure Options(M,i,j) that inputs a Soduko puzzle (given as a list of length n**2 of the rows, where each row is a list of length n**2, and blanks are denoted by 0), and a location [i,j] (occupied by 0) and outputs the subset of {1,2,3,4,5,6,7,8,9} that can be put there without violating the row- column- and section- condition.
2. Write a procdeure Champion(M) that locates a location [i,j] such that M[i][j]=0 and the size of Option(M,i,j) is non-empty and as small as possible. If it can't find it return FAIL.
3. Write a procedure Children(M) that finds all the matrices obtained by replacing the 0 at location [i,j] (the outout of Champion(M)) by one of the members of Options(M,i,j), where [i,j] is the Champion. Of course Children(M) may be the empty set (if one of the Option(M,i,j) sets is empty)
4. Write a procedure Descendants(M) that inputs a Soduko matrix M and outputs the set of all possible solutions (possibly the empty set). Define it recursively using Children(M), and if M is legal and has no zeros (blanks) left it returns {M}.

Added Feb. 2, 2006: Phillip Matchett Wood kindly found the following Sudoku Challenge, by which we will compare the contesting programs. Follow the instructions in that file to generate the output file with the running time.

Added Feb, 4, 2006: The real test is with four extra problems, taken from the Yedioth newspaper of Feb. 3, 2006.

To participate in the progarm run the TestFile. Your output shuold look like this OutTestFile, but of course with different timings.

Added Feb. 5, 2006: Arvind Ayyer kindly found Three more Sudoku puzzles, for you to test your programs.

Added Feb. 6, 2006: Please put your Sudoku Maple program in the public_html directory of your eden account, as well as the output file.

ADDED Feb. 13, 2006: CONGRATULATIONS TO Bobby Griffin for winning TEN DOLLARS from Dr. Z. for writing the fastest Sudoku program .

### Programs done on Jan. 30, 2006, class

PROFILEold2 [so far]

### Homework for Feb. 2, due Feb. 6:

1. Implement in Maple Procedures NextLeaf, NextVertex, and Bypass in Section 4.7
2. Read Sections 4.7, 4.8, and 4.9

### Homework for Feb. 6, due Feb. 9:

Finish up PROFILE by writing a Maple procedure FindMotif(M,L), that inputs M (a t-list of n-lists in some alphabet), and a positive integer L (L larger than 2), and outputs the following three items:
• The highest scoring motif (a list of length L)
• The t-list that indicates the locations
• The score (a number between .25 and 1)
Mandatory Homework due Feb. 14

HW2_14 by writing a Maple procedure

### Program done on Feb. 9, 2006, class

• PROFILE [the Bad part is still buggy]

### Homework for Feb. 9, due Feb. 16:

2. Using Bypass, write a procedure that skips hopeless parts of the trees by adding to the partial score L*(t-nops(W))
4. [optional] Implemenet the Median String Approach (section 4.6) that searches for all possible words of in {1,2,3,4} of length L and finds the one that best "fits" and the locations.
5. [optional] Do an improved version with Bypass.

### Programs done on Feb. 13, 2006, class

• FLIP [Flipping pancakes (prefix reversal)]
• FLIPa [Sorting by Adjacent Transpositions]
• FLIPb [Sorting by Reversals]

### Homework for Feb. 13, due Feb. 16:

• Write a Maple procedure that inputs a permutation and outputs a sequence of pairs [i,j] that sorts it using the FLIPb flippings. Use the naive approach of the book (looking for the first i that is not equal to pi[i] and bringing it to the i-place)
• Inplement the "clever greedy" algorithm of the book that finds the [i,j] that minimizes the number of breakpoints, if possible, or reverses an increasing block.

### Programs done on Feb. 16, 2006, class

• ReversalGreedy, The Stupid Greedy Reversal sorting algorithm.
• ReversalClever, The "Clever" Greedy Reversal sorting algorithm [not provably an algorithm.]

### Homework for Feb. 16, due Feb. 23:

• Cute Problem from the internet (communicated to me by the great biophysicist Terrell Hill)
• Write another file, ReversalVeryClever, by adapting ReversalClever by adding the condition that whenever all segments between consecutive breakpoints are inceasing, it reverses the first such segment.
• Compare the performance of ReversalGreedy, ReversalClever, and your ReversalVeryClever, by running them on random permutation of length 100 (use randperm of the package combinat).

### Program done on Feb. 20, 2006, class

• NYCold, Find the walk in Manhattan with the largest number of attractions, using complete brute force. [to be continued]

### Homework for Feb. 20, due Feb. 27:

1. Complete BPb(HorizA,VertA,a,b), in NYC, by testing each and every walk of W(a,b) and picking the one that has the maximal weight. The program should return the best walk followed by its weight.
2. Use Dynamical Programming to write a more efficient program. Call it BP(HorizA,VertA,a,b),

### Programs done on Feb. 23, 2006, class

• NYC, Find the walk in Manhattan with the largest number of attractions, using complete brute force (BPb) and Dynamical Programming.
• Feb 23 version of KING, Finds the best walk for Forward-Moving King using both brute force and Dynamical Programming. [still incomplete]

### Homework for Feb. 23, due Feb. 27:

1. Finish up KING, to work for the King's walk problem (right now we only adapted W(a,b) and Wt, do the rest)

### Programs done on Feb. 27, 2006, class

• KING, Finds the best walk for Forward-Moving King using both brute force and Dynamical Programming.
• LCS, Finds the longest common subsequence of two inputted words using KING.

### Homework for Feb. 27, due March 2

1. Write a program GlobalSequenceAllignmen, by adapting LCS, to write a program that inputs an alphabet of n letters and an (n+1) times (n+1) matrix (given as a list of lists) with Blank the (n+1)th letter, signifying the cost (positive or negative), and a pair of words, and finds the best Global allignment.
2. Adapt your GlobalSequenceAllignment above to write a program LocalSequenceAllignment (see section 6.8)
4. Write Maple code that inputs a constant like e, pi, or sqrt(2), and a positive integer n, and finds the smallest prime number in n consecutive digits of that constant.
(a) That runs as fast as possible (b) That has the shortest possible code [5 dollars prizes for the best submission in each category]
5. Try to go to Christos Papadimitriou's fascinating talk on Thurs., March 2, 2006, 11:00pm, Core Auditurim.

### Program done on March, 2, 2006, class

• CHRISTOS, Wasteless prefix-reversal sorting.

### Homework for March 2, due March 9

1. Write a program that inputs a tree (in the awkward data-structre of a list of list of list of ...) in CHRISTOS, write a procedure that inputs such a tree and outputs the set of leaves.
2. Write a program that inputs a tree (in the awkward data-structre of a list of list of list of ...) in CHRISTOS, write a procedure that inputs such a tree and a leaf and outputs the path to the root, as well as the sequence of flip-places from the root to the leaf.
3. Look up Groebner bases, so that you will be ready to Professor Diane Maclagan's fascinating guest lecture.

### Homework for March 6, due March 20

• Do all the homework in Professor Maclagan's handout.

### Program done on March, 9, 2006, class

• EXONb, Exon-chaining by brute-force.
• EXONc, Exon-chaining by Dynamical Programming.

### Homework for March 9, due March 20

1. Read Section 6.14 carefully and write a Maple program for the Spliced Allignment Problem.

### Programs done on March, 20, 2006, class

We took a break from the book, studying some random walk problems.
• ORW, Oriented Random Walk in one dimension.
• ORW2, Oriented Random Walk in two dimensions.

### Homework for March 20, due March 27

1. Generalize ORW2 to k dimensions, where k is arbitrary. The procedure whould be GFk(k,x,t,p) where the variables are x, ..., x[k], variable t takes care of the length, and p is the probability of doing the same kind of step as yesterday, and all other options are equiprobable (with probability (1-p)/(2k-1)).
2. Implement in Maple (using the plot package) Tupper's self-referential formula. Tupper's self-referential formula. [Thanks to Arvind Ayyer for suggesting this].
4. Try to think about a Final project.

### Programs done on March, 23, 2006, class

• Merge Sort (illustrating Divide and Conquer)
• KingWalks, the number of ways for a forward-moving King to walks from the origin to a designated point (m,n), using O(max(m,n)) memory.

### Homework for March 23, due March 30

1. Adapt KingWalks to have a memory-efficient program to compute the length of the longest paths in the edit graph where the input are two words in the alphabet and a table of weights for w[a,b] for each pair of letters a and b and w[a,BLANK], w[BLANK,a] for each letter a.

### Programs done on March 27, 2006, class

• EDIT [Best editing sequence with linear memory usage] [still incomplete]
• MEMORY, "Memory wheel" (a.k.a. deBruijn sequences). [to be continued...]

### Homework for March 27, due March 30

1. Finish up EDIT by implementing the pseudo-code PATH(source,sink) on the top of p. 234.
2. Write a brute-force Maple program that inputs a directed graph [V,E] and outputs a Hamiltonian path if one exists or FAIL. Apply this to G({0,1},3) obtained from MEMORY.

### Programs done on March 30, 2006, class

• MEMORY1 "Memory wheel" (a.k.a. deBruijn sequences), second version, [to be continued...]
• HAMILTON, Hamiltionian paths by brute force.
• EULER, Eulerian paths and cycles [just started]

### Homework for March 30, due April 6

1. Finish up EULER by finishing procedure Fleury (that just got started) that inputs a directed graph and outputs a Eulerian cycle if one exists or returns FAIL. Feel free to introduce as many other procedures as you wish.
NOTE: Proposals for Final Projects due April 15, 2006. The Projects themselves are due May 10, 2006.

### Programs done on April 3, 2006, class

• MEMORY1 : Final Version (includes DeBEfast, uses Fleury's algorithm).
• EULER, Eulerian paths and cycles [final version, bug corrected].

### Homework for April 3, due April 10

1. Modify Fleury(G,u) in EULER, that inputs a directed graph G=[V,E] and a vertex u and outputs a Eulerian cycle that starts and ends at u, to write a procedure FleuryP(G,u,v) that finds a Eulerian path from u to v or returns FAIL.
2. Write a more efficient program for Fleury's algorithm than the one currently in EULER. [Five dollars prize for the most efficient program]
3. Implement Dijkstra's algorithm (described, for example, in Robin Wilson's book "Graph Theory", 4th edition, pp. 38-39) for finding the shortest path from vertex a to vertex b in a weighted directed graph given as [V,E] where each member of E is a triplet [u,v,weight] meaning there is an edge from u to v of weight weight.
4. Modify it to find the shortest "alternating path" which alternatively goes with the arrow and against the arrow.
5. Use it to solve the following classical riddles:
• How can Cabbage, Wolf, Sheep and Man cross a river with a small rowboat that can only take one or two passengers and only the man knows how to row.
• Ditto for 3 missionaries and 3 cannibals (version 1: they all know how to row, version 2: only one cannibal does)
• How can four people who can walk through a dark narrow tunnel (that has room for at most two people at a time) in 1,2,4,5 minutes, do it if the battery of the flashlight will only last 12 minutes?

### Programs done on April 6, 2006, class

• GREP : Searching for a pattern in a text.
• RandTreeBaxter, Andrew Baxter's program for generating a general random tree.

### Homework for April 6, due April 10

1. Write a program that inputs a set of keywords and outputs a keyword tree.

### Homework for April 10, due April 17

1. Write StupidViterbi, and if possible Viterbi (look at Ch. 11 of the book).

### Program done on April 17, 2006, class

• HMM, Hidden Markov Model and Viterbi's Algorithm.

### Homework for April 17, due April 20

1. Write Backwards1 (as analog of Forward1) and a program to find the probability distribution of the states at any time i, given the vector of outcomes x.

### Programs done on April 20, 2006, class

It was a beautiful day so we did the algorithms for Hierarchical Clusters with pencil and paper sitting under a real tree.

### Homework for April 20, due April 27

1. Implement in Maple HierarchicalClustering on p. 345 of the book

### Homework for April 24, due May 4

1. Finish CLIQUE, by writing a program that inputs a simple graph and outputs a graph that is a disjoint union of cliques and is as "close" as possible to the original graph.

### Program done on April 27, 2006, class

• CLIQUE, The Brute Force Corrupted Clique [finished]
• PCC, The PCC algorithm Jones-Pevzner, p. 352 [still buggy and just started. to be continued].

### Homework for April 27, due May 1, 2006

1. SetP of PCC, does not run. Please write a good program.

### Program done on May 1, 2006, class

It was again a nice day, so we used our natural computers (a.k.a. human brains) to study the AdditivePhylogeny algorithm described on p. 364.