CloudClustering: Toward an iterative data processing pattern on the cloud

In summer 2010, I interned at Microsoft Research in the eXtreme Computing Group. I worked with Roger Barga and Wei Lu to build a scalable clustering algorithm on Windows Azure as the first step toward a toolkit of data analysis and machine learning algorithms for the cloud.

We published a paper on this work at DataCloud 2011 (video, slides).

At the end of my internship, I presented a preliminary version of this work at MSR on August 12, 2010 (video, slides).

The CloudClustering source code is available on GitHub.