Getting Started with LLVM on OS X

A few weeks ago, I decided that one of the things I wanted to tackle during my time at Hacker School was getting familiar with the LLVM project. To that end, myself and several other Hacker Schoolers formed an informal group to work through the official LLVM Kaleidoscope tutorial. We made reasonable progress at first, but as soon as we actually had to start dealing with the LLVM tools, I started encountering problems.

Long story short, I ended up getting frustrated with the state of the documentation surrounding LLVM and moving on to working on other less upsetting projects. This last weekend though I ended up getting back into it. I tried two different approaches.

First, I decided to use a Vagrant supported VM to do my LLVM setups. This was for two reasons: the fact that I do my development on a Mac running OS X seems to be problematic when trying to install LLVM in a global manner. This is because some of the LLVM tools (like Clang) make up the default build environment on OS X. But the toolset is insufficient if you actually want to build languages with LLVM, and the presence of these libraries makes it... complicated to try and install a more complete version. As Homebrew says when you try to install LLVM via brew install llvm:

Mac OS X already provides this software and installing another version in parallel can cause all kinds of trouble.

This is essentially the problem that I ran into a couple weeks ago that caused me to give up on working with LLVM. This time however, I had the insight that I wasn't solely limited to the physical machine that I had an the one operating system I have the space to install on it. By using Vagrant I could pretty trivially have a working Ubuntu environment to use as my development platform for working with LLVM.

So that's what I did. The most interesting thing about that repository is how I ended up provisioning my Vagrant VM. To quote briefly from the Wikipedia Article:

... provisioning is a set of actions to prepare a server with appropriate systems, data and software, and make it ready...

The provisioning is done via shell scripts (that's not the interesting part!), but instead of using apt-get to install all the software that I required I mostly had to build them from scratch. But let's go through the story end to end.

First off, I did try to install all the software via apt-get. There was some confusion for me about which version of LLVM to install. From the reading I'd done on the [LLVM site] previously I thought that since 3.5 has been officially released it would be considered the stable version. However, when you install LLVM 3.5 via apt (with sudo apt-get install llvm-3.5) the binaries don't seem to end up getting installed on a path location. Or rather, they are on path, but the names are all suffixed with -3.5.

This wasn't really what I wanted, so I also tried installing the 3.4 packages which it turns out are also the default set of packages installed if you apt-get install llvm. This got the right names for the tools onto my path, so step one check. So much easier to install than with Homebrew! Well, to install and make sure that it was available to me anyhow.

Then, since I was going to be doing C++ development, I wanted to use CMake as my build generator. If you've never used CMake but you do C/C++ development I'd highly recommend checking it out. It allows you to specify your build process at a very high level. As a bonus, it can then generate the files to process that build with several different kinds of actual build systems. It's way more convenient than writing your own Makefiles and much more modern than the whole autoconf system (which I can't say much about, I've never learned it).

Anyhow, CMake has another handy benefit. It has a system for finding your dependencies. Not like over the internet like some build systems, but still it's better than the basic situation in C. Since CMake has actually been around for a while there are many standard open source projects that provide the mechanics for finding their libraries with CMake, including - handily enough - the LLVM project.

One requirement of this system is that a "module file" be somewhere on the CMake modules-path. For packages that support it, this should happen when the package is installed and the locations of the critical library and header files are actually known. Only problem is that the CMake packages available in the Ubuntu repositories don't actually do this. Turns out there was a bug report early in 2014 describing the issue. Then a long back and forth with the package maintainer (I think) and many cases of "Hey, it should be fixed now!", followed by a response of "No, it actually still doesn't work..."

Then I found the LLVM apt nightly builds page which possibly answers why LLVM 3.4 is still the default package on Ubuntu, since the LLVM project considers 3.4 to be "stable" and 3.5 is the "qualification branch." This apt repository seemed like a good bet for finding an LLVM package that would properly install the necessary CMake files, but alas, I had no such luck.

So instead I built LLVM from scratch. This is both easier and harder than it sounds. Building LLVM from source is a very automated process, and the parts that aren't automatic by default (like downloading, checking file signatures and unpacking archives) are highly automatable. The painful part is that VM's are always slower than a natively installed OS. And my poor Macbook Air takes about 45 minutes to do a full LLVM build under the native OS X. So basically once I got everything setup I let the LLVM build run and stopped thinking about it except to check on it every few hours.

I had quite a few build failures mostly related to what seem like out of memory errors in GCC. Sometimes just restarting the compilation helped, but I also restarted Vagrant VM and gave it a full 2GB of memory which helped a lot.

Once the install finished though CMake was able to find it immediately. As pointed out earlier, the code is up on Github. I was surprised to discover that there is no Vagrant VM that exists for doing LLVM development, so that's going to go on my list of projects. Ideally, that won't involve building LLVM from source in the provisioning stage, because of the computational issues with VM's. It might be an interesting excuse to learn how to build .deb packages and make a PPA with a CMake that properly installs the FindLLVM.cmake files...

1249 Words