This post is a part of the GroupLens Iron Blogging effort, so take that for what you will.
Michael Ekstrand recently wrote up his reasons for sticking with R for data analysis, which included computational efficiency, ggplot2, and comfort with the workflow.
I don’t have any problem with his argument; mostly, people should use what they want to use. You may recall that I recently discovered Python/Pandas/GeoPandas/Jupyter and the benefits they have for geographic analysis. Michael, too, has discovered the power of Jupyter, saying:
Even though we’ll keep using R, there’s one huge benefit that we get from the Python data ecosystem: I’ve mostly switched from RStudio to using Jupyter notebooks with IRKernel. It is fantastic. We’ve also been using Anaconda to install R and it’s worked pretty well.
This is the thing that I think is interesting: he’s decided to stay in R-land while taking advantage of Jupyter. Don’t get me wrong, more power to him; if that’s what works, it’s what he should use.
Personally, I’ve been doing something a little bit different, mostly (although not entirely) as a crutch so I don’t have to jump in the Python deep end and re-learn the things I know how to do in R: I’m using the Rpy2 magic in Jupyter, which lets me switch environments whenever I want.
For instance, I can connect to a Postgres+PostGIS database with GeoPandas and load up a GeoPandas DataFrame. I can then use the %Rpush magic to push this DataFrame into R, plot it with ggplot2, and go back to Python for geographic operations or analysis. I have a notebook now that does all the data manipulation in Python, but I knew the wilcox.test command in R, so I just ran that test with the R cell magic.
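For anyone who hasn’t seen the Rpy2 magic, a round trip like this might look roughly like the notebook cells below. This is a minimal sketch rather than the notebook described above: the DataFrame, its column names, and the numbers in it are made up for illustration, and it uses a plain pandas DataFrame instead of a GeoPandas one (I’d assume a geometry column needs to be dropped or converted to text before it will push into R cleanly). The "# Cell N" lines are only markers for where one cell ends and the next begins; in a real notebook, %%R has to be the first line of its own cell.

```python
# Cell 1: enable the R magics that ship with rpy2
%load_ext rpy2.ipython

# Cell 2: build a small DataFrame in Python (hypothetical data)
import pandas as pd

df = pd.DataFrame({
    "group": ["a"] * 5 + ["b"] * 5,
    "value": [1.2, 2.3, 1.9, 2.8, 2.1, 3.4, 2.9, 3.8, 3.1, 4.0],
})

# Cell 3: copy the Python variable `df` into the R session
%Rpush df

# Cell 4: an R cell; -o p copies the R variable `p` back to Python afterwards
%%R -o p
result <- wilcox.test(value ~ group, data = df)
p <- result$p.value

# Cell 5: back in Python, `p` is now available
print(p)
```

The same %%R cell could just as easily call ggplot2 on df; the point is that each cell runs in whichever language is handier at that moment.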
Stepping back a bit, it seems that a significant amount of the power of Jupyter Notebook, at least for me, comes from flexibility: its ability to use things like IRKernel (and Rpy2). I’m using the tools that are best (or easiest, which is best when you’re time-constrained) for the job. Michael should do what he needs to do; it sounds like R is the best tool for him.
I just also think that it needn’t be either-or.