In summer 2010, I interned with the eXtreme Computing Group at Microsoft Research, where I worked with Roger Barga and Wei Lu to build a scalable clustering algorithm on Windows Azure. The project was a first step toward a toolkit of data analysis and machine learning algorithms for the cloud.
We published a paper on this work at DataCloud 2011 (video, slides).
At the end of my internship, I presented a preliminary version of this work at MSR on August 12, 2010 (video, slides).
The CloudClustering source code is available on GitHub.