GraphX is a distributed graph computation library built on top of Apache Spark. It aims to be as fast as the fastest specialized graph systems while providing much more flexibility. GraphX comes included with Spark; check out the programming guide and the OSDI 2014 paper.
As an undergrad I wrote a Pregel-like graph processing framework for Spark called Bagel. Bagel is now superseded by GraphX.
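For readers unfamiliar with the Pregel model that Bagel followed, here is a minimal single-machine sketch in Python. It is not Bagel's actual API (Bagel was written in Scala). The `pregel` function, the `compute` callback signature, and the example graph are all hypothetical. Each vertex runs the same program in synchronous supersteps, reads the messages sent to it in the previous superstep, sends new messages along its edges, and votes to halt. An incoming message wakes a halted vertex back up.

```python
import math

def pregel(values, edges, compute, max_supersteps=50):
    """Run a vertex program in synchronous supersteps (hypothetical Pregel-style API).

    values:  dict mapping vertex id -> initial vertex value
    edges:   dict mapping vertex id -> list of (destination, edge weight)
    compute: function (superstep, vertex, value, messages, out_edges)
             -> (new_value, outgoing [(destination, message)], stay_active)
    """
    values = dict(values)
    inbox = {v: [] for v in values}
    active = set(values)  # every vertex runs in superstep 0
    for superstep in range(max_supersteps):
        if not active:
            break  # all vertices voted to halt and no messages are in flight
        outbox = {v: [] for v in values}
        still_active = set()
        for v in active:
            new_value, outgoing, stay_active = compute(
                superstep, v, values[v], inbox[v], edges.get(v, []))
            values[v] = new_value
            for dst, msg in outgoing:
                outbox[dst].append(msg)
            if stay_active:
                still_active.add(v)
        # Messages are delivered in the next superstep and reactivate halted vertices.
        active = still_active | {v for v, msgs in outbox.items() if msgs}
        inbox = outbox
    return values

def shortest_paths(source):
    """Single-source shortest paths expressed as a vertex program."""
    def compute(superstep, vertex, dist, messages, out_edges):
        best = min([dist] + messages)
        improved = best < dist or (superstep == 0 and vertex == source)
        outgoing = [(dst, best + w) for dst, w in out_edges] if improved else []
        return best, outgoing, False  # always vote to halt; new messages wake us up
    return compute

edges = {"a": [("b", 1), ("c", 4)], "b": [("c", 2), ("d", 5)], "c": [("d", 1)], "d": []}
initial = {v: (0 if v == "a" else math.inf) for v in edges}
print(pregel(initial, edges, shortest_paths("a")))  # {'a': 0, 'b': 1, 'c': 3, 'd': 4}
```

Shortest paths is a convenient example because vertices only send messages when their distance improves, so the computation stops on its own once no vertex has anything new to say.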
Inspired by an article about syntax highlighting for variables instead of keywords, I wrote a demo implementation for Emacs. It became surprisingly popular, reaching the 77th percentile for downloads on MELPA, the primary Emacs package archive. It automatically picks optimally distinct colors and attempts to detect identifiers accurately across a variety of languages.
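One simple way to pick a set of mutually distinct colors is a greedy farthest-point heuristic: repeatedly choose the candidate color whose nearest already-chosen color (counting the background) is as far away as possible. The Python sketch below is an assumption for illustration, not necessarily what the Emacs package does. It approximates rather than guarantees an optimal palette, and it measures distance in plain RGB for brevity; a perceptual space such as CIELAB would match human judgments of color difference more closely. The function names `pick_distinct_colors` and `color_identifiers` are hypothetical.

```python
import itertools
import math

def pick_distinct_colors(n, background=(0, 0, 0), steps=6):
    """Greedily pick n RGB colors far from each other and from the background.

    Uses Euclidean distance in RGB as a simplification of a perceptual metric.
    """
    levels = [round(i * 255 / (steps - 1)) for i in range(steps)]
    candidates = list(itertools.product(levels, repeat=3))
    chosen = []
    for _ in range(n):
        fixed = [background] + chosen
        # Farthest-point step: maximize the distance to the nearest fixed color.
        best = max(
            (c for c in candidates if c not in chosen),
            key=lambda c: min(math.dist(c, f) for f in fixed),
        )
        chosen.append(best)
    return chosen

def color_identifiers(identifiers, background=(0, 0, 0)):
    """Give each distinct identifier its own color, in order of first appearance."""
    unique = list(dict.fromkeys(identifiers))
    return dict(zip(unique, pick_distinct_colors(len(unique), background)))

colors = color_identifiers(["total", "count", "total", "item"])
print(colors["total"])  # (255, 255, 255), the candidate farthest from a black background
```

Including the background in the distance check keeps every identifier color readable against it, which matters as much as the colors being distinct from one another.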
As an undergrad I wrote a replay debugger for Spark programs called Arthur. Arthur enabled some interesting program analysis techniques, including forward and backward record tracing: if a distributed computation yielded a strange output record (one that was unexpectedly null, for example), Arthur could trace the record back through the computation graph to find which input records it came from and how it came to be.
We wrote a technical report on Arthur.
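The idea behind forward and backward record tracing can be illustrated with lineage tags: every record carries the set of input-record indices it was derived from, and each operator propagates those sets to its outputs. The following is a hypothetical single-machine sketch in Python. The operator names (`flat_map_traced`, `reduce_by_key_traced`, and so on) are illustrative and are not Arthur's or Spark's API, and Arthur itself may have computed lineage differently, for instance during replay.

```python
# Each traced record is a pair (value, lineage), where lineage is a frozenset of
# indices into the original input list.

def source(inputs):
    """Tag every input record with its own index."""
    return [(rec, frozenset([i])) for i, rec in enumerate(inputs)]

def flat_map_traced(f, tagged):
    """Every output inherits the lineage of the record that produced it."""
    return [(out, tags) for rec, tags in tagged for out in f(rec)]

def map_traced(f, tagged):
    return [(f(rec), tags) for rec, tags in tagged]

def reduce_by_key_traced(f, tagged):
    """Combine values per key; the result's lineage is the union of its inputs' lineages."""
    acc = {}
    for (key, value), tags in tagged:
        if key in acc:
            old_value, old_tags = acc[key]
            acc[key] = (f(old_value, value), old_tags | tags)
        else:
            acc[key] = (value, tags)
    return [((key, value), tags) for key, (value, tags) in acc.items()]

def trace_backward(record, tagged_output, inputs):
    """Return the input records that contributed to an output record."""
    for rec, tags in tagged_output:
        if rec == record:
            return [inputs[i] for i in sorted(tags)]
    return []

def trace_forward(index, tagged_output):
    """Return the output records that input number `index` contributed to."""
    return [rec for rec, tags in tagged_output if index in tags]

# Word count over a few lines, then trace one suspicious count in both directions.
lines = ["a b", "b c", "", "b"]
pairs = map_traced(lambda w: (w, 1), flat_map_traced(str.split, source(lines)))
counts = reduce_by_key_traced(lambda x, y: x + y, pairs)
print(trace_backward(("b", 3), counts, lines))  # ['a b', 'b c', 'b']
print(trace_forward(1, counts))                 # [('b', 3), ('c', 1)]
```

Carrying full index sets is the simplest design but can grow large after wide aggregations; a real system might store lineage more compactly or recompute it on demand.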
I interned at Microsoft Research's eXtreme Computing Group the summer after I graduated high school. My project was to explore how to design scalable iterative programs on top of certain cloud storage abstractions, and in the process I built a prototype called CloudClustering. This led to a workshop paper at DataCloud 2011.
In 10th grade I was an occasional Boggle player, and I became curious about what the densest Boggle board (the one with the most words packed into it) would look like. I wrote a package called DistBoggle that included a fast Java Boggle solver and two parallel optimizers: a hill-climbing algorithm and a coarse-grained distributed genetic algorithm. I later wrote my IB Extended Essay about this.
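As an illustration of the two pieces involved, here is a Python sketch of a prefix-pruned depth-first-search solver and a simple hill climber that mutates one cell at a time. It is not DistBoggle's code (which was Java) and covers only hill climbing, not the distributed genetic algorithm. It also simplifies the rules: there is no "Qu" tile, no minimum word length, and a board's score is just the number of distinct dictionary words found. The word list and function names are hypothetical.

```python
import random
import string

def neighbors(r, c, size):
    """Yield the up-to-8 cells adjacent to (r, c)."""
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            nr, nc = r + dr, c + dc
            if (dr or dc) and 0 <= nr < size and 0 <= nc < size:
                yield nr, nc

def solve(board, words, prefixes):
    """Return every word in `words` that can be traced on the board without reusing a cell."""
    size = len(board)
    found = set()

    def dfs(r, c, path, visited):
        path += board[r][c]
        if path not in prefixes:
            return  # prune: no dictionary word starts with this path
        if path in words:
            found.add(path)
        visited.add((r, c))
        for nr, nc in neighbors(r, c, size):
            if (nr, nc) not in visited:
                dfs(nr, nc, path, visited)
        visited.remove((r, c))

    for r in range(size):
        for c in range(size):
            dfs(r, c, "", set())
    return found

def hill_climb(words, size=4, iterations=2000, seed=0):
    """Mutate one cell at a time, keeping changes that do not reduce the word count."""
    prefixes = {w[:i] for w in words for i in range(1, len(w) + 1)}
    rng = random.Random(seed)
    board = [[rng.choice(string.ascii_lowercase) for _ in range(size)] for _ in range(size)]
    best = len(solve(board, words, prefixes))
    for _ in range(iterations):
        r, c = rng.randrange(size), rng.randrange(size)
        old = board[r][c]
        board[r][c] = rng.choice(string.ascii_lowercase)
        score = len(solve(board, words, prefixes))
        if score >= best:
            best = score  # accept improvements and sideways moves
        else:
            board[r][c] = old  # undo a mutation that lost words
    return board, best

words = {"cat", "cats", "act", "tac", "scat", "tact"}
board, score = hill_climb(words, iterations=500)
print(score, ["".join(row) for row in board])
```

Accepting equal-score ("sideways") moves lets the search drift across plateaus instead of stalling, but plain hill climbing can still get stuck in local optima, which is one motivation for pairing it with a population-based method such as a genetic algorithm.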