Department of Mathematics, Temple University, Philadelphia, PA 19122, USA.
Exclusive to the Personal Journal of Zeilberger and Ekhad.
Written: Feb. 16, 1999.
``All human beings carry about a set of words which they employ to justify their actions, their beliefs, and their lives. ... I shall call these words a person's `final vocabulary'''---Richard Rorty, [``Contingency, irony, and solidarity'', p. 73]
Abstract: We describe a Maple package for analyzing and comparing texts, and especially for finding `Final Vocabularies' of texts, by picking those words that appear much more frequently than normally. Using it, we determine the Final Vocabularies of the Unabomber Manifesto, the Starr Report, and, much more importantly, the favorite words of each of the five books of the Torah.
The versatile Maple system is not normally used to analyze texts. It is more common, and more efficient, to use low-level programming languages such as C. However, it is sometimes more convenient to use Maple, especially if one is more fluent in it than in other programming languages, and if one wants to modify the programs and have greater control over them. Also, one may conceive of symbolic analysis of natural-language texts, in which case it is useful to have everything in Maple.
The Maple package RORTY (named in honor of the great post-modern American philosopher Richard Rorty) takes text-profiles and analyzes them in various ways. A text-profile is a list of pairs [word,i], where i is the number of times that word occurs in the original text.
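To make the definition concrete, here is a minimal Python sketch that builds such a list of [word,i] pairs from a string. It is not part of RORTY and is not the toProfile script; the function name text_profile and the tokenization rule (a word is a maximal run of ASCII letters, lowercased) are assumptions made for illustration.

```python
import re
from collections import Counter


def text_profile(text):
    """Return a list of [word, count] pairs, most frequent word first."""
    # Assumption: a "word" is a maximal run of ASCII letters, lowercased.
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    # Sort by descending count, breaking ties alphabetically so the
    # result is reproducible.
    ordered = sorted(counts.items(), key=lambda pair: (-pair[1], pair[0]))
    return [[word, count] for word, count in ordered]


sample = "The President said that the jury asked the President."
print(text_profile(sample))
```

With this tokenization, punctuation and digits act as separators, so a possessive such as "president's" would contribute the two words "president" and "s".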
In order to obtain the text profile, one must first download
a text from the web (there are plenty available!), and run
the Unix Shell program
toProfile. This short program is adapted from the code on
p. 107 of Brian Kernighan and Rob Pike's classic text `The Unix
Programming Environment' (Prentice Hall, 1984). Once you have downloaded
this program and saved it as, say, toProfile, and you want to
convert a text, say STARR, into a text-profile, simply type:
toProfile STARR > STARRP
Then use an editor to add the line `STARR:= ` at the very beginning of STARRP.
You can repeat this process with as many texts as you want,
and at the end append them all into one file called TEXTP.
For example, our TEXTP file consists of the text-profiles of the Unabomber Manifesto, which we call UNABOMBER, the Starr Report, which we call STARR, and the five books of the Torah (in English).
The next step is to download our Maple package RORTY. Save it as RORTY. Then go into Maple by typing maple (or xmaple, or whatever is applicable). To get a list of the available procedures, type: `ezra();'. To get help with any particular procedure, type `ezra(Procedure_Name);'. For example, for help with procedure FavesWords, type `ezra(FavesWords);'.
Because FavesWords is our star procedure, let's explain it here,
even though there is on-line help. The syntax is
FavesWords(textp1,textp2,Ratio,L);
The inputs are two text-profiles, textp1 and textp2 (both must exist
in the data file TEXTP), a number Ratio, and an integer L.
The output is the set of words in textp1, amongst the L most
frequent ones, whose frequency in textp1 is at least Ratio times
their frequency in textp2.
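The selection rule just described can be sketched in a few lines of Python. This is an illustration only, not the Maple code of FavesWords: the name faves_words is hypothetical, "frequency" is assumed to mean a word's count divided by the total word count of its text, and a word absent from the reference text is assumed to have frequency 0 there (so it always qualifies). The toy profiles use made-up counts.

```python
def faves_words(profile1, profile2, ratio, top_l):
    """Words among the top_l most frequent in profile1 whose relative
    frequency there is at least `ratio` times that in profile2."""
    total1 = sum(count for _, count in profile1)
    total2 = sum(count for _, count in profile2)
    counts2 = {word: count for word, count in profile2}
    # The top_l most frequent words of the first text (ties broken
    # alphabetically, an arbitrary choice).
    top = sorted(profile1, key=lambda pair: (-pair[1], pair[0]))[:top_l]
    # count/total1 >= ratio * count2/total2, cross-multiplied to avoid
    # division and floating-point rounding.
    return {
        word
        for word, count in top
        if count * total2 >= ratio * counts2.get(word, 0) * total1
    }


# Toy profiles with made-up counts (each totals 100 words).
text_a = [["the", 50], ["president", 30], ["jury", 20]]
text_b = [["the", 60], ["system", 30], ["power", 9], ["president", 1]]

print(sorted(faves_words(text_a, text_b, 10, 3)))
print(sorted(faves_words(text_b, text_a, 10, 4)))
```

In the toy data, "the" is common in both texts and is therefore rejected in both directions, which is the point of comparing against a reference text.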
Examples: The set of favorite words of the Starr Report (using the Unabomber Manifesto as a reference text), i.e. those words, among the top-100 of the Starr Report, whose frequency is at least ten times their corresponding frequency in the Unabomber Manifesto, is obtained by typing:
FavesWords(STARR,UNABOMBER,10,100);
The output was:
{int, v, t, day, december, m, id, me, according, relationship, her, i, she, gj, ms, at, dc, testified, testimony, deposition, president, lewinsky, mr, s, tripp, gifts, monica, oval, jury, grand, january, asked, depo, jones, office, told, white, house, sexual, said, jordan, clinton, currie}
On the other hand, to get the favorite words of the Unabomber Manifesto, type:
FavesWords(UNABOMBER,STARR,10,100);
The output was:
{system, example, goals, industrial, control, human, small, behavior, man, social, its, need, paragraph, will, problems, needs, our, psychological, technological, power, leftism, leftist, freedom, modern, society, revolution, leftists, technology}
Using procedure FavesWords, but with Ratio=3, we found that the favorite words of each of the five books of Moses are (compared to the whole Torah):
Genesis: {joseph, jacob, abraham, isaac, esau};
Exodus: {gold, egyptians, sockets};
Leviticus: {sin, clean, blood, offer, holy, fat, atonement, unclean, skin, plague, priest};
Numbers: {family, tribe, families, those, thousand, levites, numbered};
Deuteronomy: {command, possess, commandments, thine}.
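The text does not say how the reference profile for the whole Torah was assembled. One natural way, assumed here, is to add up the counts of the five individual profiles. The Python sketch below shows that merge; the name merge_profiles is hypothetical, and the two toy profiles use made-up counts.

```python
from collections import Counter


def merge_profiles(*profiles):
    """Combine several [word, count] profiles by summing the counts."""
    total = Counter()
    for profile in profiles:
        for word, count in profile:
            total[word] += count
    # Most frequent word first, ties broken alphabetically.
    ordered = sorted(total.items(), key=lambda pair: (-pair[1], pair[0]))
    return [[word, count] for word, count in ordered]


# Toy profiles with made-up counts.
book_one = [["and", 10], ["joseph", 4]]
book_two = [["and", 8], ["gold", 3]]

print(merge_profiles(book_one, book_two))
```

Because each book is part of the merged reference, a word's frequency in the whole can never be zero when it occurs in the book, which may be why a smaller Ratio such as 3 is the useful setting for this comparison.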