Towards a mathematical understanding of geometric clustering problems

Start: 04/29/2015 - 4:15pm
End  : 04/29/2015 - 5:15pm


Rachel Ward, University of Texas, Austin


It is often said that  "clustering is difficult only when it does not matter".  We aim to make this statement more mathematically precise in the setting of clustering points in Euclidean space.  We will focus on the k-means optimization problem, arguably the most popular algorithm for unsupervised clustering.  While the k-means objective is highly nonconvex and hard to optimize in the worst case, heuristic algorithms are known to empirically produce good clusters "when it matters".   To explain these observations, we introduce geometric conditions on a set of points under which the k-means objective can be optimized efficiently.  For points drawn from separated balls, the important quantities are the distances between the centers of the balls compared to the relative density of points within the balls.  We will also discuss the case of outliers and overlapping point clouds, relationships and implications for other clustering algorithms such as spectral clustering, and conclude by discussing open questions related to this work.  This is joint work with P. Awasthi, A. Bandeira, M. Charikar, R. Krishnaswamy, and S. Villar.

Shanahan Center for Teaching and Learning (SCTL), at Harvey Mudd, Basement, B460

Ward.pdf106.85 KB