IISER Mohali, Knowledge city, Sector 81, SAS Nagar, Manauli PO 140306

Forthcoming Events

Solving the Classic Distinct Element Problem with a Strikingly Simple Algorithm that Captivated Donald Knuth

Prof Sourav Chakraborty (ISI Kolkata)

Location : AB2-5B
Abstract: We will present a very simple streaming algorithm on F_0 estimation that also caught the eye of Donald E. Knuth. In a recent article, Donald E. Knuth started with the following two paragraphs: " Sourav Chakraborty, N. V. Vinodchandran, and Kuldeep S. Meel have recently proposed an interesting algorithm for the following problem: A stream of elements (a1, a2,...,am) is input, one at a time, and we want to know how many of them are distinct. In other words, if A = {a1, a2,...,am} is the set of elements in the stream, with multiplicities ignored, we want to know |A|, the size of that set. But we don’t have much memory; in fact, |A| is probably a lot larger than the number of elements that we can hold in memory at any one time. What is a good strategy for computing an unbiased estimate of |A|?

Their algorithm is not only interesting, it is extremely simple. Furthermore, it’s wonderfully suited to teaching students who are learning the basics of computer science. (Indeed, ever since I saw it, a few days ago, I’ve been unable to resist trying to explain the ideas to just about everybody I meet.) Therefore I’m pretty sure that something like this will eventually become a standard textbook topic. This note is an initial approximation to what I might write about it if I were preparing a textbook about data streams."

Go to top