Most database management systems maintain statistics on the underlying relation. One of the important statistics is that of the "hot items" in the relation: those that appear many times (most frequently, or more than some threshold). For example, end-biased histograms keep the hot items as part of the histogram and are used in selectivity estimation. Hot items are used as simple outliers in data mining, and in anomaly detection in networking applications.
I will describe some novel solutions to this problem based on viewing this
as a group testing problem. We give approaches using both adaptive and
non-adaptive group testing. These algorithms maintain a small space data
structure that monitors the transactions on the relation, and when
required, quickly output all hot items, without rescanning the relation
in the database. With user-specified probability, it is able to report all
hot items. I will then go on to describe some work in progress on
extensions to this model, when we wish to find items whose frequency has
changed by a significant amount, either in absolute or relative terms; and
also finding related items which together are "hot" but individually are
not.