03/04/2014 - 12:15pm

03/04/2014 - 1:10pm

Speaker:

Mark Huber (Claremont McKenna College)

Abstract:

Consider the following problem. You have a stream of random variables $X_1,X_2,...$ with expected value $\mu$ and standard deviation $\epsilon \mu$. You want to use these random variables to build an estimate for $\mu$ that lies in the target interval $[(1 - \epsilon)\mu,(1 + \epsilon)\mu]$ with high probability. The classic method is to take the sample average, cross your fingers, and hope that the Central Limit Theorem is accurate for your distribution. A more rigorous method came from Dyer and Frieze in 1991. They noted that if you take a sample average $Y = (X_1 + X_2 + X_3 + X_4)/4$, then by Chebyshev's Inequality the probability that $Y$ is outside $[(1-\epsilon)\mu,(1 + \epsilon)\mu]$ is at most 1/4. So if you draw $Y_1,Y_2,...,Y_k$ (where $k$ is odd) with the same distribution as $Y$, then using an analysis of the binomial distribution (finally, some combinatorics!), the median of the $Y$ is in the target interval with probability that decreases exponentially in $k$. In this talk, I'll explore this method and show how an integral can be used to effectively bound probabilities for the binomial distribution. In addition, I'll show how to create (with one simple trick!) a random variable $Y'$ that also lies in the target interval with probability at least 3/4ths, but which requires only one draw from the $X_i$ stream. This effectively speeds up the method by a factor of 4.

Where:

Mudd Science Library 126, Pomona College