Tweet

Cryptanalysis of the NFL Schedule

Stephen D. Miller

Rutgers University

 

January 12, 2016

 

This is a companion piece to my interview with John Urschel in the February 2016 Notices of the American Mathematical Society.  It aims to explain in everyday terms how some principles of codebreaking (or “cryptanalysis” as it is called by those in the community) can be applied to see a bias in a familiar scheme: the NFL schedule.  We’ll see that in certain years (including 2015), the NFL schedule has an unexpected quirk that might explain why the New England Patriots and Carolina Panthers had such strong seasons.  I’ll also explain how to use mathematics to fix the problem, and touch upon some exciting current mathematical topics such as the “Graph Isomorphism Problem”.

 

Codebreakers often find themselves face-to-face with a complicated, impenetrable-looking cryptosystem that seems to offer no hint as to how to attack it.  Usually they fail, but occasionally careful scrutiny will reveal a small design flaw that causes a pattern to repeat itself more often than it probably should.  Alternatively, there might be an unintended, very small lack of randomness that biases the encryption a particular way.  Once that small crack in the ice is taken advantage of, a fortunate attacker can sometimes leverage it to defeat the entire cryptosystem.  A classic example is the “Cryptogram” found in newspaper comics sections, in which each letter of the alphabet is encoded as a different letter (these are also known as substitution ciphers). The English language uses letters like “e” and “i” more frequently than “j” and “x”, and this bias makes it possible to crack such codes even as a recreational hobby.

I’ll explain a toy example of the cryptanalysis methodology applied to the NFL schedule.  Of course the NFL schedule is not cryptic: everyone knows what it is in advance, and so the allusion to cryptography is merely an analogy.  We will see that in certain years ­-- including 2015 -- the schedule has very different features.  (For a peek, see Figures 1 and 6 below.)

The NFL Schedule rotation:  The 32 NFL teams are organized into two conferences (AFC and NFC), each of which has 4 divisions of 4 teams apiece.  Each team plays 16 games a season.  14 of those games are against rivals that are determined far in advance by a schedule rotation.  (It is this rotation which has the flaw I alluded to above.)  Here’s a schematic indicating who those 14 games are against:

6 games within the division (each of the 3
 divisional rivals is played twice)

All 4 teams in a different division within the same conference

All 4 teams in a division of the other conference

2 equally ranked teams in the same conference

 

Every team in a given division has a common list of opponents in 14 games.  They play every other team their own division, every team in another division in the conference, and every team in another division in the other conference.  The key point is that a team plays only 2 other games (marked in blue) aside from these set divisional matchups.  Those 2 games, which are played against other conference rivals, are against teams which finished the previous season in the same place in the standings.  Thus a team that finishes in 3rd place in its division will play all other 3rd place finishers in its conference the following year (one of these is already included in the green grouping). 

In 2015 the NFL is arranged into geographic regions more than it is into conferences.

 

In order to allow teams to play each other on a consistent basis, the NFL rotates the green and red parts around so that the divisions a team plays will vary from year to year.  Being that there are 4 divisions in each conference, the red part (the division from the other conference) changes from year-to-year on a 4-year cycle; the green part (the opposing division in the same conference) varies on a 3-year cycle, since of course there are only 3 other divisions in the same conference.  Combining, the NFL schedule rotation has a 4x3=12 year cycle, meaning that the list of 14 common opponents repeats itself every 12 years.

What’s the problem?  In setting up the rotation (which you can find neatly described here), the NFL was not careful about setting up the 4-year cycle of divisions from the other conference (marked in red).  They also apparently failed to take into account different possibilities which would have worked better (I’ll suggest improvements below, e.g., in Figure 3).  It could be there was an excellent reason behind this, but nobody I’ve spoken to is aware of one.  I am also unaware of any other league where the schedule structure varies so drastically in different years.  This causes an unexpected and unintended pattern which I’ll now describe.

As we just saw, 14 out of 16 of the opponents a team plays are determined by the schedule rotation, which is purely dependent on their division.  Here’s a schematic indicating the divisions that play each other in the 2015 season:

 

The North-West clusterThe South-West cluster

Figure 1: The North-West and South-East clusters in the 2015 NFL schedule.

Here I’ve connected two divisions with an arc if they play each other.  There is a very striking, unexpected feature here: there are 2 different clusters in this picture.  This means that, aside from the 2 games in blue against teams that finished equally ranked in the 2014 standings, there are no games between teams on the left and teams on the right.  In particular, 87.5% of all games are internal to these separate clusters: in 2015 teams in the East and South play very few games against teams in the North and East, which is not at all the case for 2016 when the schematic will look like this:

The 2016 schedule has just a single cluster

Figure 2: The 2016 schedule has just 1 cluster (unlike 2015's).

This 2016 diagram doesn’t isolate divisions into separate clusters.  In fact, in the full 12 year cycle, only 2 years (2012 and 2015, for example) have two disconnected clusters, whereas the other 10 years have diagrams like 2016’s which form one big circle (see Figure 3).

Put differently, it is well-known that AFC and NFC teams do not play each other often: only 64 of the 256 games each year are interconference games.  However, if we reorganize the NFL into the two different 16-team clusters in each of the two circles (i.e., the 16 teams which are in the South and East divisions of either conference, and the 16 teams which are in the North and West divisions of either conference), there is a staggeringly-low total of only 32 games played between them!  Thus in 2015 the NFL is arranged into geographic regions more than it is into conferences.

In most other years, this common concentration of weak opponents would not happen.

 

What’s the practical difference?  It furthermore happens to be that the North-West cluster is considered much stronger than the South-East cluster.  For example, all wild-card playoff teams from 2014 and 2015 are in the North-West cluster.  The North-West cluster went 21-11 against the South-East cluster in 2015 (and so far, has gone 2-0 in the playoffs).  Even as late as week 15 in the season, the South-East cluster had only 3 teams with a winning record.  One of those is the New England Patriots, who began the season 10-0, and another is the Carolina Panthers (who began 14-0).  The Panthers get to not only play in a division with no other winning team, they play 4 games against the NFC East and 4 games against the AFC South, neither of which had a winning team until late in the year.  Because of the schedule quirk, the Patriots also play these same relatively-weak teams in the NFC East and AFC South.  In most other years, this common concentration of weak opponents would not happen. 

Can it be fixed?  Yes, though of course any changes would mean temporarily interrupting the current schedule rotation.  There are many ways to do it, all of which are mathematically similar.  Here’s a particular solution which involves only a slight modification of the current rotation:

My proposed schedule rotation

Figure 3: My proposed schedule rotation, which eliminates the possibility of clusters that appear in the 2015 schedule (as in Figure 1).

The rotation repeats every 12 years, i.e., 2029 is like 2017, 2030 is like 2018, etc..  I’ve kept the previous coloring scheme with green referring to divisional pairings within a conference, and red for those with the other conference.  The changes here are minimal: the green part (within the conference) was not changed at all, and the only changes in matchups between different conferences involve a few switches involving the NFC North and South.

If this switch is adopted, then (due to the transition) teams in the AFC South would play NFC North teams in 2019, only 3 years after playing them in 2016 (see Figure 2); they would not play NFC South teams until 2020, 5 years after having played them in 2015.  This a slight deviation from the usual 4 year cycle, but only a temporary one: the 4 year cycle would resume again after this.  Aside from similar changes for AFC West teams against the NFC North and South, this is the only change I am suggesting.  However, it’s adequate to completely eliminate the clustering present in the 2015 schedule (Figure 1).

A Sudoku-style method of finding a schedule without clusters.  Having just given a schedule fix, let’s now see a fun way to understand why it’s possible to find one.  We’ll convert the problem into one similar to filling out a Sudoku grid.  In order to be flexible and not identify any particular divisions, let’s number the divisions AFC 1, AFC 2, AFC 3, AFC 4, and NFC 1, NFC 2, NFC 3, NFC 4.  For the sake of argument, let’s start this new system in 2017.  The following charts show

2017 divisional matchups2018 divisional matchups2019 divisional matchups

Figure 4: The proposal for matchups between divisions in each conference

the matchups between different divisions in the same conference.  The pattern repeats every 3 years, so that 2020 has the same diagram as 2017, etc..  Since I have not specified which divisions are which, these diagrams in fact account for all possibilities.  (The NFL actually uses such a scheme at present, though this is not important; it’s more convenient to assign the divisions numbers than names in this explanation.) 

We now turn to the 4-year rotation of divisions in different conferences, which I’ll explain using a 4x4 grid

 

Year 1 (2017)

Year 2 (2018)

Year 3 (2019)

Year 4 (2020)

AFC 1 plays NFC _

 

 

 

 

AFC 2 plays NFC _

 

 

 

 

AFC 3 plays NFC _

 

 

 

 

AFC 4 plays NFC _

 

 

 

 

 

that I’ll have to fill in with numbers 1,2,3,4 to tell you which NFC divisions the AFC divisions will play.  I need to make sure that each column uses the numbers 1,2,3,4 exactly once apiece (otherwise, two AFC divisions plays the same NFC division, while some NFC division plays no AFC division).  Also, the NFL requires all divisions to play each other within a 4 year cycle; this means that each row uses the numbers 1,2,3,4 exactly once apiece. 

Filling out the grid is now like filling out a Sudoku puzzle.  However, I am going to add a further stipulation that is not covered by the previous paragraph: I’ll insist the AFC 1 plays NFC 1 in year 1, AFC 2 plays NFC 2 in year 2, etc.:

 

Year 1 (2017)

Year 2 (2018)

Year 3 (2019)

Year 4 (2020)

AFC 1 plays NFC _

1

 

 

 

AFC 2 plays NFC _

 

2

 

 

AFC 3 plays NFC _

 

 

3

 

AFC 4 plays NFC _

 

 

 

4

 

This will be the key to explaining why the eventual solution does not produce clusters in the schedule rotation.  For now, note that because each number will occur exactly once in each row, it will never happen that (aside from these diagonal entries I just marked) AFC division 1 plays NFC division 1, AFC division 2 plays NFC division 2, AFC division 3 plays NFC division 3, or AFC division 4 plays NFC division 4.

Our task is to fill in the blank spots with numbers 1,2,3, or 4, so that each row contains each number exactly once, and each column contains each number exactly once.  To do this, let’s look at the bottom entry in the 2019 column (which I’ve highlighted in the following grid).  It can’t be a 3, since there’s already a 3 in that column.  It can’t be a 4, since there’s already a 4 in that row.  It must be a 1 or a 2 -- either will work, so let’s make it a 1:

 

Year 1 (2017)

Year 2 (2018)

Year 3 (2019)

Year 4 (2020)

AFC 1 plays NFC _

1

 

 

 

AFC 2 plays NFC _

 

2

 

 

AFC 3 plays NFC _

 

 

3

 

AFC 4 plays NFC _

 

 

1

4

 

Let’s now consider the second entry in the 2019 column, which cannot be a 2 (since there’s already a 2 in row 2), nor a 1 or a 3 (which are already in the column).  It therefore must be 4, which in turn forces the first entry in the column to be 2 (the only unused number):

 

Year 1 (2017)

Year 2 (2018)

Year 3 (2019)

Year 4 (2020)

AFC 1 plays NFC _

1

 

2

 

AFC 2 plays NFC _

 

2

4

 

AFC 3 plays NFC _

 

 

3

 

AFC 4 plays NFC _

 

 

1

4

 

The first entry in the 2nd row must be 3 (since 1 is already in its column, while 2 & 4 are already in its row).  Therefore, the last entry in that row must be 1:

 

Year 1 (2017)

Year 2 (2018)

Year 3 (2019)

Year 4 (2020)

AFC 1 plays NFC _

1

 

2

 

AFC 2 plays NFC _

3

2

4

1

AFC 3 plays NFC _

 

 

3

 

AFC 4 plays NFC _

 

 

1

4

 

Now most of this grid is filled out.  The other entries are easy to deduce in a similar fashion, and the final answer is as follows:

 

Year 1 (2017)

Year 2 (2018)

Year 3 (2019)

Year 4 (2020)

AFC 1 plays NFC _

1

4

2

3

AFC 2 plays NFC _

3

2

4

1

AFC 3 plays NFC _

4

1

3

2

AFC 4 plays NFC _

2

3

1

4

 

We will use this grid not just for years 1,2,3, and 4, but furthermore extend it in perpetuity by repeating it every 4 years.  Incidentally, had we used the initial choice of 2 (instead of 1) when we started filling out the bottom entry of the 2019 column, we would have instead obtained the transpose of this grid (meaning what you get if you reflect it by a mirror on its diagonal).  These are the only two solutions.

Why does this remove clusters?  The grid is set up so that in a given year, there is precisely one number between 1 and 4, call it n, for which AFC division n plays NFC division n.  Because the AFC and NFC sides of each diagram in Figure 4 are identical, there is a different number, m, such that AFC division n plays AFC division m, and NFC division n plays NFC division m.  (That is, the division they play in their own conference has a common number.)  We constructed the grid so that AFC division m does not play NFC division m.  This means that whatever cluster AFC and NFC divisions n and m are in, there must be at least two other divisions involved, and so this cluster must have at least 6 of the 8 NFL divisions in it.  Could there be a division that’s not in this cluster?  Such a division must play 2 other divisions (one in its conference, one in the other conference), and so it would be part of another cluster having at least 3 divisions.  That would leave us with both our first cluster of at least 6 divisions, and this second one of at least 3 divisions -- none of which overlap.  That makes for at least 9 divisions, whereas the NFL has only 8 divisions.  We conclude from this contradiction that there can only be one cluster of teams (so that the diagram looks like Figure 2, though with different divisions labeling the boxes)

Some extra math – spectral graph theory: The way I noticed the 2015 NFL schedule quirk was in making an illustration for my interview with John Urschel in the February 2016 Notices of the AMS.  This picture

Graph from Notices article illustrating the Urschel-Zikatanov Theorem

Figure 5: Graph from Notices article illustrating the Urschel-Zikatanov Theorem.

represents each of the 32 teams by a circle, and connects them by lines if the teams play each other in 2015.  Here’s how it looks in other years, with the circles shrunk to dots:

NFL schedule graphs under current system

Figure 6: Schematic of the NFL schedule graphs from 2010-2021.  The diagrams repeat every 12 years, e.g., the picture for 2022 is identical to that for 2010.

Mathematicians call a network of lines connecting dots such as this a graph. The coloring in Figure 4 is in terms of the lowest nontrivial Laplace eigenfunction for the graph (see the sidebar in the article for more technical information).  It naturally selects clusters with few edges between them; in this case, it found the South-East and North-West clusters described above.  “Spectral partitioning of graphs”, as this is called, is a fundamental tool in analyzing data sets.  It finds patterns that might not be readily seen from other points of view (for more information, click here).

Figure 5 has 12 different looking pictures, but actually not all of them are really different mathematically.  For example, it’s not difficult to see that the 2012 and 2015 entries are extremely similar to each other: it is possible to reposition the dots (dragging their attached lines with them) to make them look exactly like each other.  In mathematical parlance, these graphs are called “isomorphic”, meaning they have the exact same shape (they only appear different if the dots are put on paper in different places).

What’s less obvious is that the other 10 diagrams are isomorphic to each other (but not to the 2012 or 2015 diagrams).  The task of determining whether or not two graphs are isomorphic is the famous “graph isomorphism problem” in theoretical computer science.  It can always be solved by brute force, but there are clever ways to vastly speed it up.  Recently Laszlo Babai of the University of Chicago announced an important breakthrough on this problem.

Another important mathematical feature is mixing time, which roughly speaking measures how long one has to walk around randomly along the lines between the dots before they’ve lost any trace of where they started.  Imagine that an army of ants emerges from a particular dot, and starts diffusing by walking along the lines connecting the dots to each other, making random decisions as to where to go next.  In the 2015 graph in Figure 5, the odds are not good that many ants will quickly make it across the relatively small number of lines bridging the two sides of the diagram, meaning that the mixing time is high.  However, in the 2016 graph the ants would diffuse faster.  This is measured by a statistic called the lowest graph eigenvalue, which is usually 6 for the graphs in Figure 6, but dropped to 4 in 2012 and 2015.  The higher this statistic, the more random looking the graph is. 

A mathematically related concept is that the 2012 and 2015 diagrams are easier to disconnect into 2 pieces (by cutting a small number of lines) than the other diagrams.  This is why New England and Carolina were shielded from having to play so many strong teams on the righthand side of the picture in Figure 5, and why the South-East cluster as whole had a higher winning percentage than it would have had if it had more exposure to the strong teams in the North and West.  The relationship between mixing time, eigenvalues, and disconnection appears prominently in the related mathematical subject of expander graphs.

Finally, let’s conclude with cryptanalysis.  Low mixing time is desirable in a cryptosystem, because it generally means an attacker has less information about the state of a system.  In terms of the ants, the faster they can spread around, the less information we have about where they first emerged from.  This is an important design feature in making stream ciphers (which are fast, secure random generators), and spectral graph theory is an important tool in analyzing cryptographic constructions because of its ability to find hidden patterns. 

Thanks: to John Urschel, Peter Sarnak, Keith Frankston, Michael Fruchter, and Nathan Keller for their helpful comments. 

 

Stephen D. Miller is professor and vice-chair of Mathematics at Rutgers University.  He can be reached at miller@math.rutgers.edu.  His research focuses on number theory (including the underlying mathematics of cryptography) and is supported by National Science Foundation grant CNS-1526333.