A Quick Empirical Reproof of the Asymptotic Normality of the Hirsch Citation Index (First proved by Canfield, Corteel, and Savage)
By
Shalosh B. Ekhad and Doron Zeilberger
First Written: Oct. 31, 2014
[Exclusively published in the Personal Journal of Shalosh B. Ekhad and Doron Zeilberger and arxiv.org]
Once upon a time there was an esoteric and specialized notion, called
"size of the Durfee square",
of interest to at most 100 specialists in the whole world. Then it was
kissed by a prince called Jorge
Hirsch, and became the famous (and to quite a few people, infamous) h-index, of interest to
every scientist and scholar, since it tells you how productive a scientist (or scholar) you are!
When Rodney Canfield, Sylvie Corteel, and Carla Savage wrote their beautiful 1998 article
proving, rigorously, by a very deep and intricate analysis, the asymptotic normality of the random
variable "size of Durfee square" defined on integer-partitions of n (as n goes to infinity), with precise
asymptotics for the mean and variance, they did not dream that one day their result would be of interest
to everyone who has ever published a paper.
However, Canfield et al. had to work really hard to prove their deep result. Here we take
an "empirical" shortcut that proves the same thing much faster (modulo routine number- and symbol-crunching).
More importantly, the empirical methodology should be useful in many other cases where
rigorous proofs are either too hard, or not worth the effort!
Note:
Our "empirical" approach is still based on purely mathematical data.
Alexander Yong's wonderful
critique
(that inspired the present note) is a masterpiece of (sociological) "meta-mathematics", and uses actual, real-world data! It
is strongly recommended.
Added Nov. 13, 2014: watch the lecture (produced by Matthew Russell):
Maple Package
-
HIRSCH:
a Maple package to generate the combinatorial (and hence probability) generating
functions for the random variable "h-index" defined over the set of all citation profiles
with a given number of citations. It also carries out statistical analysis, and empirically
proves asymptotic normality and concentration of measure.
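To make the object concrete, here is a minimal Python sketch (not the HIRSCH package itself, whose procedures are not reproduced here) that counts the citation profiles with n citations by h-index. It assumes the standard fact that the partitions of n with Durfee square of side k are counted by the coefficient of q^n in q^(k^2)/((1-q)(1-q^2)...(1-q^k))^2. The function names are ours and purely illustrative.

```python
def partitions_with_parts_at_most(k, m):
    """Return c with c[j] = number of partitions of j into parts of size <= k, for j = 0..m."""
    c = [1] + [0] * m
    for part in range(1, k + 1):
        for j in range(part, m + 1):
            c[j] += c[j - part]
    return c


def durfee_counts(n):
    """Return {k: number of partitions of n whose Durfee square has side k}.

    A partition with Durfee square of side k consists of the k-by-k square
    plus two partitions with parts of size <= k (one to the right of the
    square and, after conjugation, one below it) whose sizes add up to n - k^2.
    """
    counts = {}
    k = 1
    while k * k <= n:
        m = n - k * k
        c = partitions_with_parts_at_most(k, m)
        counts[k] = sum(c[a] * c[m - a] for a in range(m + 1))
        k += 1
    return counts
```

For example, durfee_counts(4) gives {1: 4, 2: 1}: of the five partitions of 4, only 2+2 has h-index 2. In the notation of this page, that is the combinatorial generating function 4t + t^2.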
Some Input and Output files for the Maple package HIRSCH
-
If you want to see a list of length 6400, whose i-th item is the PROBABILITY generating
function for the random variable 'h-index' (alias size of Durfee square) with i citations,
the input yields
the output
Note: we called this list Data80, so that anyone can play with it. It is given in Maple input notation.
-
If you want to see a list of length 100, whose i-th item is the COMBINATORIAL generating
function for the random variable 'h-index' (alias size of Durfee square) with i^2 citations
for i from 1 to 100,
the input yields
the output
Notes: 1. We called this list H100, so that anyone can play with it. It is given in Maple input notation.
2. To get the probability generating function, divide by its value at t=1, i.e.
by the number of integer-partitions of i^2.
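The normalization in Note 2 can be sketched in a few lines of Python. This is only an illustration under our own conventions: a generating function is represented as a dictionary {h: coefficient of t^h}, and the function names are hypothetical, not taken from the Maple package. Exact rational arithmetic keeps the mean and variance free of rounding error.

```python
from fractions import Fraction


def pgf_from_counts(counts):
    """Divide a combinatorial generating function by its value at t=1.

    counts maps each h to the coefficient of t^h; the sum of the values is
    the value at t=1, i.e. the total number of partitions.
    """
    total = sum(counts.values())
    return {h: Fraction(c, total) for h, c in counts.items()}


def mean_and_variance(pgf):
    """Return (mean, variance) of the distribution encoded by pgf."""
    mean = sum(h * p for h, p in pgf.items())
    second_moment = sum(h * h * p for h, p in pgf.items())
    return mean, second_moment - mean * mean
```

With 4 citations the combinatorial generating function is 4t + t^2 (worked out by hand from the five partitions of 4), so pgf_from_counts({1: 4, 2: 1}) is (4/5)t + (1/5)t^2, with mean 6/5 and variance 4/25.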
-
If you want to see an empirical (BUT VERY CONVINCING) proof of the asymptotic normality,
and Canfield's concentration phenomenon (that Alexander Yong erroneously called `conjecture'),
the input yields
the output
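One common way to test normality empirically is to compare standardized moments with those of the normal distribution (third moment 0, fourth moment 3) as the number of citations grows. The sketch below assumes that approach; it is not a transcription of what the Maple package does, and the function name is hypothetical. It accepts any dictionary {h: number of citation profiles with that h-index}.

```python
def standardized_moment(counts, r):
    """Return E[(X - mean)^r] / variance^(r/2) for the distribution given by counts.

    counts maps each value h to its (unnormalized) weight. The variance must
    be nonzero, otherwise the division below raises ZeroDivisionError.
    """
    total = sum(counts.values())
    mean = sum(h * c for h, c in counts.items()) / total

    def central(s):
        # s-th moment about the mean
        return sum(c * (h - mean) ** s for h, c in counts.items()) / total

    return central(r) / central(2) ** (r / 2)
```

As a sanity check on a toy distribution (not h-index data), the counts {0: 1, 1: 2, 2: 1} of two fair coin flips give a third standardized moment of 0 and a fourth of 2. For a sequence that is asymptotically normal, the fourth standardized moment should drift toward 3.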