Automatic Proofs of Asymptotic ABNORMALITY (and much more!) of Natural Statistics Defined on Catalan-Counted Combinatorial Families

By Shalosh B. Ekhad and Doron Zeilberger

.pdf   .ps   .tex

(Exclusively published in the Personal Journal of Shalosh B. Ekhad and Doron Zeilberger and arxiv.org)

Written: March 21, 2014

Last Update: March 31, 2014 [Thanks to Cheyne Homberger]

The famous Catalan Numbers count zillions of combinatorial families (see Richard Stanley's compendium), and humans have fun trying to find `nice' bijections between family A and family B. While this may be fun for a while, sooner or later this game gets old, especially, since the real reason Catalan numbers are so ubiquitous is their simplicity (see DZ's 49-th opinion), and that humans can only grasp simple things.

According to that opinion, the reason for the ubiquity of the sequence of Catalan numbers, let's call them c(n), is that their generating function

C(z):=∑n ≥ 0 c(n)zn

satisfies the SIMPLEST possible (non-linear) algebraic equation

C(z)=1+zC(z)2   ,

that is equivalent to the quadratic recurrence

c(n)=∑1 ≤ k ≤ n c(k-1)c(n-k)   ,   c(0)=1   .

Often, the members of the combinatorial family in question posses natural `statistics', for example for Dyck paths, the number of `inversions' (U (not necessarily immediately) ahead of D), or for 132-avoiding permutations, the number of occurrences of a given pattern, then it may happen that two different statistics amazingly have the same average! Wow! Let's find a bijection! (see e.g. this humanly-generated paper that appeared in the very prestigious and very selective Electronic journal of Combinatorics, that only accepts "the best papers".)

Humans can, with some effort, find closed-form expressions for the average (aka expectation, aka first moment), and if they try really hard, may be able to find the variance (aka second moment), but beyond that they should enlist their much superior silicon brethrern, and develop algorithms for discovering (and proving!) closed-form expressions for as many as possible moments. In addition to its intrinsic interest, it would also indicate whether the combinatorial statistic (`random variable') in question seems to be `asymptotically normal' (if the standardized moments, starting with the third, converge, as n goes to ∞, to 0,3,0,15,0,105,.. the famous moments of the normal distribution), or whether it is DEFINITELY not normal (if the expression for the skewness (aka as standardized 3rd moment) does not tend to zero, we are done!)

In this collaboration between a human (DZ) and computer (SBE) we do just that. The human wrote a Maple package, that once and for all can handle zillions of possible statistics defined on Catalan-families, and surprise-surprise, prove (rigorously!) asymptotic ABNORMALITY, by deriving (and proving!) closed-form expressions for the skewness, that turn out NOT to tend to zero, and at the same time derive both exact (and hence asymptotic) expressions for many higher moments.

It is true that, with great effort, by very smart humans, like Svante Janson, (e.g. this nice human-generated article) can do it by entirely human means, but they can only do the leading asymptotics! Not even Svante Janson can find, just by hand, e.g., an EXACT closed-form expression, in n, for the sixth moment of the random variable `number of occurrences of the pattern 213' in the set of 132-avoiding permutations.

But do we really care about the 6th moment of some stupid statistic defined on some stupid family of sets? Of course not! The Medium is the Message! This article is but a case-study in human-computer collaboration. The human teaching the computer how to do every conceivable problem in a wide class of combinatorial problems, by the human designing algorithms, then implementing them, and then letting the computer execute them. Once we get better and better at this kind of collaboration, we would be ready for the big time! Stand by (in 100 years or less) for a computer-generated proof of RH and NP ≠ P.

Added March 31, 2014: Cheyne Homberger just alerted me that the \$100 conjecture (due to Kate Rudolph) that I made in the paper has already been solved (published a few days before this article) by Lynn Chua and Krishanu Roy Sankar, see their nice paper (that unfortunately was not in arxiv.org, or else we would have probably seen it). I have just made the donation, see here.

# MAPLE PACKAGES

• AlgFunEq, a Maple package that automatically handles algebraic functional equations for generating generating functions for weight-enumerators for many Catalan-counted combinatorial families, according to many types of statistics, and automatically derives from them rigorously-proved explicit expressions for the average (first moment), of course, but also of many higher moments, in particular enabling the rigorous proof of asymptotic abnormality. It is particulary focused on 132-avoiding permutations, but the procedures that process algebraic functional equations have a much larger scope.

• Cheyne, to handle 123-avoiding permutations, significantly extending the nice work of Cheyne Homberger.

# Sample input and output for the Maple package AlgFunEq

• If you want to see how Shalosh B. Ekhad found rigorously-derived closed-form expressions for the "total number of occurrences of a pattern p amongst all 132-avoiding permutations of length n" for ALL patterns p of length up to 8, (BTW, the human Miklos Bona was only able to do patterns of length ≤ 3 in this paper, that was published in the very prestigious and very selective Electronic journal of Combinatorics).
the input   yields the output

• If you want to see how Shalosh B. Ekhad found rigorously-derived closed-form expressions for the "total number of occurrences of pattern p amongst all 132-avoiding permutations of length n" for ALL patterns of length 9,
the input   yields the output

• If you want to see how Shalosh B. Ekhad found rigorously-derived closed-form expressions for the "total number of occurrences of pattern p amongst all 132-avoiding permutations of length n" for ALL patterns of length 10,
the input   yields the output

• If you want to see how Shalosh B. Ekhad found rigorously-derived closed-form expressions for the first six moments (NOT about the mean) of the random variables

"total number of occurrences of a pattern p amongst all 132-avoiding permutations of length n"

for ALL patterns p of length 2 and 3 (BTW, the human Miklos Bona was only able to do the first moment in this paper, that was published in the very prestigious and very selective Electronic journal of Combinatorics).
the input   yields the output

• If you want to see how Shalosh B. Ekhad found asymptotic expressions (in n, or rather sqrt(n) that it abbreviates m), for the standardized moments through the sixth, for the random variable "number of occurrences of a given pattern amongst 132-avoiding permutations of length n" for ALL patterns of lengths 2 and 3
the input   yields the output

• If you want to see how Shalosh B. Ekhad found explicit generating functions for the sequence of factorial moments of the above-mentioned random variables
the input   yields the output

# Sample input and output for the Maple package Cheyne

• If you want to see how Shalosh B. Ekhad found the sequence of weight-enumerators of 123-avoiding permutations according to the weight
tNumberOfOccurrencesOfThePattern213
on the set of 123-avoiding permutations of length n from n=0 to n=35,
the input   yields the output

• If you want to see how Shalosh B. Ekhad found entirely empirically, all the `Bona classes' (i.e. all sequences of "total number of occurrences of a pattern") amongst 123-permutations of length up to n=9, for all patterns of length from 1 to 6,
the input   yields the output

• If you want to see how Shalosh B. Ekhad found empirically-yet-rigorizably, for the random variable

"number of occurrences of the pattern 213 in the set of permutations of length n avoiding the pattern 123",

the first moment (times the Catalan numbers C(n)) [reproducing much faster one of the main results in Cheyne's Homberger's article], the second moment (times the Catalan number C(n)), and a linear recurrence equation of order 4 with polynomial coefficients of degree 4 for the third moment (times C(n))
the input   yields the output

• If you want to see how Shalosh B. Ekhad empirically found the average, variance, and the standardized moments up to the 6th for for the random variable

"number of occurrences of the pattern 213 in the set of permutations of length n avoiding the pattern 123",

for n from 0 to 35,
the input   yields the output

indicating that it is asymptotically ABNORMAL (since the skewness does not seem to tend to 0, neither the kurtosis to 3. BTW, with some effort, this can be proved rigorously, but we have better things to do!)

Personal Journal of Shalosh B. Ekhad and Doron Zeilberger