As a statistical package with a large support community, R serves its purpose well. However, at times, especially for more computation-heavy analyses, R slows down and reaches memory limits quickly. So I’ve looked around for a language that is reasonably fast and quick to develop in. I also care about conciseness, which usually means less room for coding errors and higher productivity. I tried out C, C++, Scala, Python, and Julia. (I also attempted Go, but soon realized the linear algebra libraries were a pain to grind through.) And since my computational work is usually Bayesian, I created my own criterion for judging performance: a standard Bayesian multiple linear regression algorithm.
The simulated data used for this study can be found on my GitHub.
This simulation study is not meant to be an official benchmark. Sites such as the Computer Language Benchmarks Game do a more thorough job. Still, this study should offer some insight for those who do computational statistics.
Note that for C++, I used the Armadillo library with OpenBLAS. For C, I used the GSL library. I could have done the C implementation with OpenBLAS, but is it really worth it? For Scala, I used the Breeze library. For Python, I used NumPy. For Julia, I used the Distributions package (which is pretty standard in Julia and extremely well made, in my opinion). Last of all, I didn’t need any additional libraries in R.
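To give a feel for the kind of algorithm being timed, here is a minimal NumPy sketch of a Bayesian multiple linear regression sampler. It is not the code used for the benchmark. It assumes a Gibbs sampler with a flat prior on the coefficients and an inverse-gamma prior on the error variance, which is one common choice. The function name `gibbs_linreg`, the hyperparameters `a` and `b`, and the simulated data are all made up for illustration.

```python
import numpy as np


def gibbs_linreg(X, y, n_iter=2000, burn=500, a=1.0, b=1.0, seed=0):
    """Gibbs sampler for y = X @ beta + eps, eps ~ N(0, sig2 * I).

    Assumed priors (illustrative only): flat on beta, sig2 ~ InvGamma(a, b).
    Returns the post-burn-in draws of beta and sig2.
    """
    rng = np.random.default_rng(seed)
    n, k = X.shape

    # Quantities that do not change across iterations.
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ y          # least-squares estimate
    L = np.linalg.cholesky(XtX_inv)       # L @ L.T == (X'X)^-1

    sig2 = 1.0                            # arbitrary starting value
    betas = np.empty((n_iter, k))
    sig2s = np.empty(n_iter)

    for i in range(n_iter):
        # beta | sig2, y ~ N(beta_hat, sig2 * (X'X)^-1)
        beta = beta_hat + np.sqrt(sig2) * (L @ rng.standard_normal(k))

        # sig2 | beta, y ~ InvGamma(a + n/2, b + SSR/2);
        # drawn as the reciprocal of a Gamma(shape, scale = 1/rate) variate.
        resid = y - X @ beta
        sig2 = 1.0 / rng.gamma(a + n / 2, 1.0 / (b + resid @ resid / 2))

        betas[i] = beta
        sig2s[i] = sig2

    return betas[burn:], sig2s[burn:]


# Small simulated data set (hypothetical, not the data from the post).
rng = np.random.default_rng(1)
X = np.column_stack([np.ones(500), rng.standard_normal((500, 2))])
y = X @ np.array([1.0, -2.0, 0.5]) + 0.5 * rng.standard_normal(500)

betas, sig2s = gibbs_linreg(X, y)
print("posterior mean of beta:", betas.mean(axis=0))
print("posterior mean of sig2:", sig2s.mean())
```

Most of the work per iteration is a matrix-vector product and a few random draws, which is why the quality of each language's linear algebra library matters so much in a comparison like this.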
Laptop Specs
Here are the specs for the machine I used to run the simulation.
HP EliteBook 8560p | |
---|---|
CPU | Quad core, 2 threads per core, Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz |
Memory | 16 GB |
Results
From the plots below, Julia and Scala appear to be both fast and concise. In Julia, libraries still take some time to compile when they are loaded (this is much faster as of v0.4.5), and there are also ways to precompile packages.
I’ve included code at the bottom of this page.
Bayesian Multiple Linear Regression Speed (seconds)
Bayesian Multiple Linear Regression (Lines of Code)
Speed vs. Code Length Trade-off
It’s definitely a toss-up between Julia and Scala. Julia was created for technical computing, and some have said that it is also general purpose. It hasn’t reached a 1.0 version yet. Sometimes I’ll run into bugs when loading new libraries, and searching for fixes is usually a longer process. Documentation is not consistent, but all the Julia packages are hosted on GitHub. I thought that Julia didn’t support tail-call optimization, but the Lazy.jl package has an implementation: you basically put the macro @bounce before the function definition. There’s an example here. Also, you won’t be able to play around with your own data structures / classes very much. Inheritance is supported only from abstract types; concrete types cannot be subtyped. In short, Julia is lightning fast and quick to develop in, but there is a moderate lack of consistency in documentation and interoperability of packages. Still, it’s pretty fast for linear algebra computations. It’s also quite portable because it compiles code through LLVM. But I think it fails as a general purpose language.
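For readers who haven't met the idea behind a macro like @bounce, here is a rough Python sketch of trampolining, a common way to get tail-call behavior in a language that doesn't optimize tail calls. This is not Lazy.jl's implementation, and the names `trampoline` and `count_down` are made up for illustration. The recursive function returns a zero-argument function (a thunk) instead of calling itself, and a plain loop keeps calling thunks until a real value comes back.

```python
def trampoline(f, *args):
    """Run f(*args), then keep calling the result while it is a thunk.

    Caveat of this simple sketch: it treats any callable result as a thunk,
    so it cannot be used for functions whose final answer is itself callable.
    """
    result = f(*args)
    while callable(result):
        result = result()
    return result


def count_down(n, acc=0):
    """Sum the integers 1..n in tail-recursive style."""
    if n == 0:
        return acc
    # Return a thunk instead of recursing, so the call stack stays shallow.
    return lambda: count_down(n - 1, acc + n)


# A direct recursive version would exceed Python's default recursion limit
# at this depth; the trampolined version runs in constant stack space.
total = trampoline(count_down, 100_000)
print(total)
```

The trade-off is an extra function object per step, so trampolining buys stack safety rather than speed.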
Scala is a much more mature language and has attracted many large enterprises and users. It is general purpose. It has full support for functional programming and optimizes tail calls in self-recursive functions. It is quite fun to program in a functional way and think recursively. And Scala is pretty fast. It runs on the JVM, which is extremely portable. People even write Android apps in Scala. Many users favor general purpose languages for computation because they want to integrate their algorithms into other products. In other words, it’s often not enough to only have a fast computation platform; having support for functionality outside of computation is desirable for much commercial work, and perhaps for academia as well. Last of all, the distributed computing tool Spark is written in Scala. That is quite enticing. Scala is fun. But I wish that the linear algebra libraries were better supported. There is Breeze, which is good, but things like setting the number of threads in OpenBLAS are not supported. That’s bizarre.
Scala and Julia are both in-demand, desirable skills. So if you’re considering learning one or the other, there are great advantages to learning either. If you have the patience, why not even learn both?
There are many resources for learning Scala. Coursera has two courses centered on Scala, one on Functional Programming and another on Reactive Programming. Both are taught by Martin Odersky, the creator of Scala. You can find tutorials here and there for Julia. Perhaps the best place to get started is the Julia Homepage. Note that the Jupyter Project also supports Julia, so the popular IPython notebook interface is available for Julia too. (Quick plug for Jupyter: you can install vim bindings!)
Julia vs. Scala