Academic publishing

Reproducible research, literate programming, IPython, and GitHub

I came across this thread on Hacker News, which links to a curated gallery of IPython notebooks covering countless interesting topics, most notably reproducible academic publications. I am a fan of IPython: combined with a few other tools, it makes a great replacement for Mathematica. Still, I never thought of it as a way to make research results more accessible, because I was using it from Spyder and avoiding its notebook interface. Hosting notebooks on GitHub and displaying them with nbviewer provides a simple mechanism for producing reproducible research. Even more so now, as GitHub is making changes to appeal more to scientists.

When I still had a Mathematica license, I was fond of its notebook mechanism. I loved mixing code, notes, diagrams, and mathematical formulas. It was an example of literate programming, a form of coding where natural language and source code blend, making the result a lot easier to digest. Wolfram advocates the use of the notebook interface for publishing results by providing a repository for notebooks, and also providing a free player for reading them if you do not have a copy of Mathematica. I do not think the free player caught on, and, of course, you cannot edit the notebooks if you want to tinker with the published results. A combination of GitHub and IPython sounds like a much more viable option.

Enter IPython

Unfortunately, the console and qtconsole environments of IPython do not allow mixing code with anything else. The only option is the browser-based notebook, which runs on a lightweight web server called Tornado. Sage made me dislike browser-based computer algebra systems. I find the solution clunky.

To reduce clunkiness, I set up a Firefox profile dedicated to running IPython notebooks. It is an empty profile without plugins, and I set the home page to http://127.0.0.1:8888/, which is where the IPython notebook server runs. I disabled the address bar following these instructions:

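# Create the chrome directory inside the dedicated Firefox profile.
mkdir -p "${FIREFOX_IPYTHON_PROFILE}/chrome"
# Hide the navigation bar, which contains the address bar.
echo '#nav-bar { display: none !important; }' > "${FIREFOX_IPYTHON_PROFILE}/chrome/userChrome.css"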

IPython also allows creating profiles, so I set up one exclusively for the notebook interface:

ipython profile create browser

To achieve the same functionality as in Spyder, I put the following in ~/.ipython/profile_browser/startup/00-first.ipy:

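# symbols must be imported explicitly; %pylab does not provide it.
from sympy import init_printing, symbols
# Render SymPy expressions as typeset math in the notebook.
init_printing()
x, y, z, t = symbols('x y z t')
# Load NumPy and matplotlib, and show plots inline.
%pylab inline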

I use the following script to start the IPython server and the Firefox profile. The script also shuts down the IPython server if I close the Firefox instance:

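#!/bin/bash
# Start the notebook server in the background with the dedicated profile.
ipython2 notebook --no-browser --profile browser &
# Give the server a moment to start before opening the browser.
sleep 1
# This call blocks until the Firefox instance is closed.
firefox -P ipython -no-remote
# The server's PID is part of the file name nbserver-<pid>.json.
pid=$(ls "${HOME}"/.ipython/profile_browser/security/nbserver-* |
  sed -e 's/.*nbserver-//' -e 's/\..*//')
kill $pid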

The result looks like the following:

[Screenshot: ipython-notebook]
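
The launcher script relies on a fixed one-second sleep and on recovering the server PID from the nbserver file name. As a rough sketch of how those two steps could be made more explicit, here is a Python 3 version of both. The function names wait_for_port and pid_from_nbserver_path are hypothetical helpers made up for illustration, not part of IPython, and the sketch assumes the file name keeps the nbserver-<pid>.<extension> form that the sed expressions in the script expect.

```python
import re
import socket
import time


def wait_for_port(host, port, timeout=10.0, interval=0.1):
    """Return True once a TCP connection to (host, port) succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # create_connection raises OSError while nothing is listening yet.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False


def pid_from_nbserver_path(path):
    """Extract the PID from a file name of the form nbserver-<pid>.<extension>."""
    match = re.search(r"nbserver-(\d+)\.", path)
    return int(match.group(1)) if match else None
```

Calling wait_for_port("127.0.0.1", 8888) before starting Firefox would replace the guess of one second with an actual check that the server is listening, which may matter on a slow machine.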

In parallel with Spyder, I use the notebook interface for writing notes and recording meaningful lines of code. It would be wonderful to develop code in Spyder and take notes in the notebook interface while sharing the same kernel. Unfortunately, we cannot launch new IPython kernels in the notebook server from outside the browser. Theoretically, Spyder can connect to an existing kernel, but I could only connect to console and qtconsole kernels. So at this point, you need to re-run calculations in the notebook interface.

Sharing the work

Putting every bit of code or text under version control is a good habit. It is only natural to put the notebooks in git repositories. From here, it is an effort of two clicks to add a new repository on GitHub for the notebooks, and push them online.

The project nbviewer renders static HTML pages of online notebooks, making it spectacularly easy to share them. This solution also allows downloading the notebook, and it has a direct link to the GitHub repo where the notebook is developed. Reproducible research does not get easier than this. I tried it on a small problem just to see how it works. IPython digests Markdown in the text sections, and also renders LaTeX equations through MathJax. I am pleased with the result.
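
To make the sharing step concrete, here is a minimal Python 3 sketch that builds an nbviewer link for a notebook kept in a GitHub repository. It assumes the public nbviewer instance at nbviewer.org and its /github/<user>/<repo>/blob/<branch>/<path> URL scheme; the host may differ if you use another instance. The function name nbviewer_github_url and the user, repository, and file names in the usage note are made up for illustration.

```python
from urllib.parse import quote

# Assumption: the public nbviewer instance and its /github/ URL scheme.
NBVIEWER_BASE = "https://nbviewer.org"


def nbviewer_github_url(user, repo, path, branch="master"):
    """Build an nbviewer link for a notebook stored in a GitHub repository."""
    parts = ["github", user, repo, "blob", branch] + path.strip("/").split("/")
    # Percent-encode each component so spaces in file names survive.
    return NBVIEWER_BASE + "/" + "/".join(quote(p) for p in parts)
```

For example, nbviewer_github_url("alice", "notes", "demo/My Notebook.ipynb") produces a link that can be pasted into a paper's comment field or a README.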

A few arXiv papers link to notebooks through nbviewer. The comment section allows adding URLs, like in this paper. While not the most transparent method, it is a lot better than leaving the reader to re-implement everything to reproduce the results. We are facing a crisis with reproducible research, and this approach is a step forward.