As a statistician I often feel conflicted about software. On the one hand, R is great because there are so many packages available. On the other hand, R is not a good environment for methodological development, particularly for those interested in high-dimensional statistics and big data. This is mainly because R is slow, especially in loops, which Bayesian statisticians using MCMC for computation cannot avoid. Moreover, it tends to hog memory. As a result, most Bayesians I know do one of two things:
- Use R essentially as a wrapper for compiled C or Fortran code.
- Do computation for novel methods in Matlab.
The former has the obvious issue of being more time-consuming than R programming, often by a lot. Personally, I don't like debugging seg faults, and with the advent of parallel computing, sparse data structures, et cetera, efficient C programming is getting harder. Matlab is often as fast as or faster than C because it natively parallelizes things like matrix multiplication, uses a JIT to deal with the looping problem in interpreted languages, and relies on optimized linear algebra libraries. I know a number of statisticians who insist that Matlab could not possibly be as fast as C, but this reflects a misunderstanding of how these languages actually work. Good Matlab code in a parallel environment will often do as well as or a bit better than someone's home-baked C code. Unfortunately, Matlab is terribly expensive, and it sucks for anything besides matrix programming. The basic approach is to get you hooked on Matlab while you're in grad school. But commercial licenses cost about 3500 bucks apiece (2000 for Matlab, which is worthless on its own, and another 1500 for the statistics toolbox, which contains all the random number generators and some mostly crappy functions to fit standard models). I don't care to think about how much Duke spends on the academic site license. It must be several million dollars a year.
During one of my recent rants on the topic, a buddy of mine suggested I try Julia. If you've never heard of Julia you should go to the website and look at the performance benchmarks. It's impressively fast, and it isn't slow in loops the way that most interpreted languages are. It's free and open source, and the syntax is very similar to Matlab's. The startup costs in terms of writing code were pretty minimal for me, and I've found it performs as well as or better than Matlab for large-scale Bayesian computation. What was relatively difficult was getting both my Linux box and my Mac laptop set up with a fully functioning Julia and a modified version of ESS for writing and debugging code in Emacs. Since I'm becoming something of an apostle for Julia, I'll describe the process of getting it to work. Not all statisticians are hackers, and as a breed we often hesitate to dedicate much time to mucking around with Linux. Hopefully the information here will ease the process enough that more folks will give it a try.
I have a desktop system with two quad-core Intel i7s and 22 GB of memory, running Ubuntu 12.04 LTS. On Linux, installing is relatively simple. Unless you run Ubuntu 13.04 you'll have to build from source. I highly recommend this anyway: Julia is under heavy development, and if you want it to work well you basically need to pull the latest git repository every day. Thus you should clone the git repository and install it that way. The instructions here explain how to clone and build from source. Of course you might not have git, so you'll need to install that first. If you're running Ubuntu you can do this with `sudo apt-get install git`. Then just follow the instructions from the Julia git repository. Don't forget to read the part near the bottom instructing you to make sure gcc is up to date.
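For anyone who wants the Ubuntu steps in one place, here is a minimal sketch. It assumes the Julia source lives at https://github.com/JuliaLang/julia.git and that a plain `make` is enough on your machine. The clone directory `$HOME/julia`, the `DRY_RUN` switch, and the helper names `run` and `build_julia_from_source` are my own placeholders, not anything from the Julia project. By default the script only prints the commands so you can look them over first; the repository's own instructions remain the authority on prerequisites and build flags.

```shell
#!/bin/sh
# Sketch of the Ubuntu steps: install git, clone Julia, build from source.
# Assumed, not taken from the post: the repository URL and the clone directory.
JULIA_REPO="${JULIA_REPO:-https://github.com/JuliaLang/julia.git}"
JULIA_DIR="${JULIA_DIR:-$HOME/julia}"

# Print each command; actually execute it only when DRY_RUN=0.
run() {
    echo "+ $*"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

build_julia_from_source() {
    run sudo apt-get install git &&
    run git clone "$JULIA_REPO" "$JULIA_DIR" &&
    run make -C "$JULIA_DIR"
}

build_julia_from_source
```

Once the printed commands look right, rerunning the script with `DRY_RUN=0` in the environment executes them for real. Expect the first build to take a while.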
You'll now have a command-line version of Julia installed, but it won't work with the usual version of ESS, including the one you get with apt-get. The easiest thing to do is to purge your current version of ESS (`sudo apt-get --purge remove ess`) and then clone the Julia ESS git repository (see instructions here).
You'll need to update Julia and Julia-ESS frequently. To update a cloned git repository, cd into the folder where you cloned it and type `git pull`. After the pull finishes, you'll need to run `make` again using the same flags that you used when you built Julia/ESS from source.
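Since this pull-and-rebuild routine gets repeated a lot, it may be worth wrapping in a small shell function. The sketch below is one way to do it; the name `update_and_rebuild` is hypothetical, and it assumes the directory you pass is the git checkout you originally built in. Anything after the directory is handed to `make` unchanged, which is where the flags from your first build would go.

```shell
#!/bin/sh
# update_and_rebuild DIR [MAKE_ARGS...]
# Pull the latest commits in DIR, then rebuild with the given make arguments.
update_and_rebuild() {
    if [ "$#" -lt 1 ]; then
        echo "usage: update_and_rebuild DIR [MAKE_ARGS...]" >&2
        return 2
    fi
    repo_dir="$1"
    shift
    if [ ! -d "$repo_dir/.git" ]; then
        echo "not a git checkout: $repo_dir" >&2
        return 1
    fi
    # Subshell so the caller's working directory is left alone.
    ( cd "$repo_dir" && git pull && make "$@" )
}
```

For example, `update_and_rebuild "$HOME/julia"` would refresh a checkout kept in `~/julia` (a path I am assuming here), and the same call with a different directory would cover the ESS checkout.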
Now you'll undoubtedly want to do more than Julia base will allow you to do. The distributions package is now included in base. But for graphics you'll need to choose one of the packages. See here for some options. I'm a ggplot user in R, so I like Gadfly for Julia, which has a very ggplot-like feel, both in appearance and syntax. Gadfly installed without much of a problem for me on Linux. A bizarre quirk of the package installation utilities is that if a package fails to install for whatever reason, trying to reinstall it will throw an error. In this case, you pretty much have to wipe out ~/.julia (`rm -rf ~/.julia`) and then reinstall packages. However, once you have the packages installed, you should be able to keep them current using `Pkg.update()` in Julia.
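If deleting `~/.julia` outright makes you nervous, a slightly more cautious variant is to move the directory aside instead of removing it. This is my own substitution for the `rm -rf` above, not something the Julia tools require, and the function name `reset_julia_packages` is hypothetical. A rough sketch:

```shell
#!/bin/sh
# reset_julia_packages [DIR]
# Move the package directory (default: ~/.julia) to a timestamped backup
# so that packages can be reinstalled from a clean slate.
reset_julia_packages() {
    pkg_dir="${1:-$HOME/.julia}"
    if [ ! -d "$pkg_dir" ]; then
        echo "nothing to reset: $pkg_dir does not exist"
        return 0
    fi
    backup="$pkg_dir.bak.$(date +%Y%m%d%H%M%S)"
    mv "$pkg_dir" "$backup" && echo "moved $pkg_dir to $backup"
}
```

After running it, reinstall what you need from the Julia prompt with `Pkg.add(...)`, and delete the backup once everything works again.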
Installing Julia on the Mac was much harder. You might wonder why anyone would do this when a Mac binary exists. The Mac binary never worked for me; it kept dying when I tried to multiply matrices. I think the Mac binary is probably not updated frequently, and as I said before, daily git pulls are fairly essential to keep everything running smoothly. If you don't clone the repository and build from source, you can't do that. So I recommend struggling through the steps outlined here. FYI, I have a 2010-vintage MacBook Pro with 8 GB RAM and a dual-core Intel Core i7. I'm running Mountain Lion. If your system is different, some of the steps may be different, but this should provide a rough guide.
- Install Xcode. You can do this from the Apple App Store. Just search for Xcode and install. Make sure that you choose to install the optional command line tools during the installation process; this is NOT the default option, so read the screens carefully.
- Install the Homebrew package manager. This is optional but highly recommended. You can get Homebrew by following the instructions here. After you install, it will recommend you run `brew doctor`. Follow the advice. Then do what the brew doctor tells you to do, unless you are enough of a Linux hacker to know better.
- Go to the Julia source repo and read the specific instructions for Mac OS. Do not clone the git repository yet (you may not have git and you will install it next). The Julia instructions will tell you that you need a 64-bit Fortran compiler. If you have Homebrew, then you should be good to go; just install/update gfortran using Homebrew (`brew install gfortran`).
- Install git. Just follow the instructions here, but if you choose to use the installer rather than the terminal, make sure you select the optional command line tools for git.
- Now clone the Julia git repo and make from source (see instructions).
- You should now be able to run Julia from the command line. Make sure it's working.
- Now get the Julia fork of ESS. Again, clone the git repo here and install from source, following the instructions on the website. You may get an error when you run make; however, ESS seems to be working fine for me despite it, so apparently the error is not fatal.
- Modify your `.emacs` file according to the instructions provided at the Julia-ESS repo site. If you don't have a .emacs file, you'll need to create one in your home folder.
- Open Emacs and verify that you can start Julia from ESS (`M-x julia`). If that's working, you'll now want to install some packages. Run `Pkg.update()` to get the latest.
- As noted above, I like Gadfly because I'm used to ggplot. Getting Gadfly to work was not that easy. First, install XQuartz from sourceforge. Then at the Julia prompt, type `Pkg.add("Gadfly")`. It will eventually prompt you about how you want to install. Choose installation with Homebrew. It should work. Run some of the examples in the tutorial to check.
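Because the Mac steps above depend on several tools being on your PATH in the right order, a quick check can save some head-scratching. The helper below is only a sketch: `check_tools` is a hypothetical name, and the tool list in the usage note is my reading of the steps above (`make` from the Xcode command line tools, then `brew`, `gfortran`, `git`, `julia`, and `emacs`).

```shell
#!/bin/sh
# check_tools TOOL...
# Print one status line per tool; return non-zero if any is missing from PATH.
check_tools() {
    missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "ok       $tool"
        else
            echo "MISSING  $tool"
            missing=1
        fi
    done
    return "$missing"
}
```

Running `check_tools make brew gfortran git` before cloning, and `check_tools julia emacs` after the build, should flag anything that was skipped. Given my experience with the binary, it is also worth confirming that matrix multiplication survives, for instance with `julia -e 'A = rand(200, 200); println(size(A * A))'`.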
OK, you should now have a solid Julia installation. Happy computing!