I've been playing around with Apache Spark for a while now, mainly using Scala. A couple of my colleagues are interested in learning about Spark as well, but they're data scientists, not developers, and they are more comfortable using Python. So far, so good: Spark has a nifty Python shell and a comprehensive Python API, after all.
But what my colleagues really like are nice interactive tools without too much command-line voodoo, and they already use IPython Notebook to get all this goodness for their Python data science work.
Happily, it turns out you can use IPython Notebook with a local Apache Spark installation, so I wrote a quick demo notebook to illustrate how you can use Spark to do the traditional word count inside a notebook.
The repo is on Bitbucket, so you can do a git clone or download the code as a zip file. It's just a single notebook, a data folder containing a couple of text files, and a sample shell script for starting the notebook on Linux.
So if you're new to Spark with IPython Notebook, feel free to try it out.