A Quick Empirical Reproof of the Asymptotic Normality of the Hirsch Citation Index (First proved by Canfield, Corteel, and Savage)

By
Shalosh B. Ekhad and Doron Zeilberger

.pdf   .ps   .tex

First Written: Oct. 31, 2014

[Exclusively published in the Personal Journal of Shalosh B. Ekhad and Doron Zeilberger and arxiv.org]

Once upon a time there was an esoteric and specialized notion, called "size of the Durfee square", of interest to at most 100 specialists in the whole world. Then it was kissed by a prince called Jorge Hirsch, and became the famous (and to quite a few people, infamous) h-index, of interest to every scientist and scholar, since it tells you how productive a scientist (or scholar) you are! When Rodney Canfield, Sylvie Corteel, and Carla Savage wrote their beautiful 1998 article proving, rigorously, by a very deep and intricate analysis, the asymptotic normality of the random variable "size of Durfee square" defined on integer-partitions of n (as n goes to infinity), with precise asymptotics for the mean and variance, they did not dream that one day their result would be of interest to everyone who has ever published a paper.

However, Canfield et al. had to work really hard to prove their deep result. Here we take an "empirical" shortcut that proves the same thing much faster (modulo routine number- and symbol-crunching). More importantly, the empirical methodology should be useful in many other cases where rigorous proofs are either too hard, or not worth the effort!

Note: Our "empirical" approach is still based on purely mathematical data. Alexander Yong's wonderful critique (which inspired the present note) is a masterpiece of (sociological) "meta-mathematics", and uses actual, real-world data! It is strongly recommended.

Added Nov. 13, 2014: watch the lecture (produced by Matthew Russell):
• vimeo version: part 1   part 2 (uploaded by Matthew Russell)
• youtube version: part 1   part 2 (uploaded by Edinah Gnang)

Maple Package

• HIRSCH: a Maple package to generate the combinatorial (and hence probability) generating functions for the random variable "h-index" defined over the set of all citation profiles with a given number of citations. It also carries out statistical analysis, and empirically proves asymptotic normality and concentration of measure.
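To make "combinatorial generating function" concrete, here is a minimal Python sketch. It is not the HIRSCH package, and the function names `durfee_counts` and `h_index_pmf` are hypothetical, chosen for illustration. It treats a citation profile with n citations as an integer partition of n, and uses the classical identity that the partitions of n with Durfee square of side k are counted by the coefficient of q^(n-k^2) in 1/((1-q)(1-q^2)...(1-q^k))^2.

```python
from fractions import Fraction
from math import isqrt


def durfee_counts(n):
    """Return a list c with c[k] = number of partitions of n whose
    Durfee square has side k (i.e. citation profiles with n citations
    and h-index k). The entries are the coefficients of t^k in the
    combinatorial generating function."""
    if n == 0:
        return [1]  # the empty partition has Durfee square of side 0
    kmax = isqrt(n)  # a k-by-k square needs at least k^2 cells
    # poly[j] = coefficient of q^j in 1 / prod_{i<=k} (1 - q^i)^2,
    # updated in place as k grows.
    poly = [0] * (n + 1)
    poly[0] = 1
    counts = [0] * (kmax + 1)
    for k in range(1, kmax + 1):
        # Multiply by 1/(1 - q^k) twice.
        for _ in range(2):
            for j in range(k, n + 1):
                poly[j] += poly[j - k]
        counts[k] = poly[n - k * k]
    return counts


def h_index_pmf(n):
    """Probability generating function coefficients: divide each count
    by the value at t=1, which is the number of partitions of n."""
    counts = durfee_counts(n)
    total = sum(counts)
    return [Fraction(c, total) for c in counts]
```

For example, `durfee_counts(4)` accounts for the five partitions of 4: four of them have h-index 1 and one (2+2) has h-index 2.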

Some Input and Output files for the Maple package HIRSCH

• If you want to see a list of length 6400, whose i-th item is the PROBABILITY generating function for the random variable 'h-index' (alias size of Durfee square) with i citations

the input   yields the output

Note: we called this list Data80, so that anyone can play with it. It is given in Maple input notation.

• If you want to see a list of length 100, whose i-th item is the COMBINATORIAL generating function for the random variable 'h-index' (alias size of Durfee square) with i^2 citations for i from 1 to 100,

the input   yields the output

Notes: 1. We called this list H100, so that anyone can play with it. It is given in Maple input notation. 2. To get the probability generating function, divide by its value at t=1, i.e. by the number of integer-partitions of i^2.

• If you want to see an empirical (BUT VERY CONVINCING) proof of the asymptotic normality, and Canfield's concentration phenomenon (that Alexander Yong erroneously called `conjecture')

the input   yields the output
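One plausible way to look at asymptotic normality empirically is sketched below in Python. This is an assumption about what such a check could look like, not a description of what HIRSCH actually computes, and the function names are hypothetical. For each n it builds the exact distribution of the h-index over all partitions of n and returns the mean, the variance, and the standardized third and fourth moments. For a normal distribution the latter two equal 0 and 3, so one can watch how they behave as n grows.

```python
from math import isqrt


def durfee_counts(n):
    """c[k] = number of partitions of n with Durfee square of side k,
    read off from the coefficient of q^(n - k^2) in
    1 / prod_{i<=k} (1 - q^i)^2. Assumes n >= 1."""
    kmax = isqrt(n)
    poly = [0] * (n + 1)
    poly[0] = 1
    counts = [0] * (kmax + 1)
    for k in range(1, kmax + 1):
        for _ in range(2):  # multiply by 1/(1 - q^k) twice
            for j in range(k, n + 1):
                poly[j] += poly[j - k]
        counts[k] = poly[n - k * k]
    return counts


def standardized_moments(n):
    """Return (mean, variance, skewness, kurtosis) of the h-index over
    all partitions of n, each partition being equally likely.
    Requires n >= 4; for smaller n every partition has h-index 1 and
    the variance is zero."""
    counts = durfee_counts(n)
    total = sum(counts)
    probs = [c / total for c in counts]
    mean = sum(k * p for k, p in enumerate(probs))

    def central(r):
        # r-th moment about the mean
        return sum((k - mean) ** r * p for k, p in enumerate(probs))

    var = central(2)
    if var == 0:
        raise ValueError("variance is zero; use n >= 4")
    skewness = central(3) / var ** 1.5
    kurtosis = central(4) / var ** 2  # equals 3 for a normal distribution
    return mean, var, skewness, kurtosis


if __name__ == "__main__":
    # Perfect squares, in the spirit of the H100 list.
    for i in (10, 20, 40):
        print(i * i, standardized_moments(i * i))
```

Higher standardized moments can be compared with those of the normal distribution in the same way by calling `central(r)` for larger r.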
