Comparing Programming Languages for Statistics

As a statistical package with a large support community R serves its purpose well. However, at times, especially when doing more computation-heavy analysis, R slows down and reaches memory limits quickly. So, I’ve looked around for a language that is reasonably fast, and quick to develop. Conciseness usually means less room for coding errors and higher productivity. I tried out C, C++, Scala, Python, and Julia. (I also attempted Go, but soon realized the linear algebra libraries were a pain to grind through.) And since my computational work is usually Bayesian, I created my own criteria for judging performance – a standard Bayesian multiple linear regression algorithm.

\[\begin{array}{rcl} y\v\beta & \sim & \text{Normal}(X\beta,\sigma^2I) \\ \beta & \sim & \text{Normal}(0,100(X'X)^{-1}) \\ \sigma^2 & \sim & \text{Gamma}(1,1) \end{array}\]

The simulated data used for this study can be found at my github.

This simulation study is not meant to be an official benchmark. Sites such as computer language benchmark games do a more thorough job. Though, this study surely offers insights for those that do computational statistics.

Note that for C++, I used the Armadillo library with OpenBLAS. For C, I used the GSL library. I could have done the C implementation in OpenBLAS. But is it really worth it? For Scala, I used the breeze library. For Python, I used Numpy. For Julia, I used the Distributions package (which is pretty standard in Julia and extremely well made, in my opinion). Last of all, I didn’t need any additional libraries in R.

Laptop Specs

Here are the specs for the machine I used to run the simulation.

	HP EliteBook 8560p
CPU	Quad core 2 threads per core Intel(R) Core(TM) i7-2760QM CPU @ 2.40GHz
Memory	16 GB

Results

From the plots below, we can see that Julia and Scala seem to be fast and concise. In Julia, loading libraries still takes some time to compile (as of v0.4.5, it’s much faster), but there are also ways to precompile packages.

I’ve included code at the bottom of this page.

Bayesian Multiple Linear Regression Speed (seconds)

C++ wins in execution time.

Bayesian Multiple Linear Regression (Lines of Code)

R wins in conciseness.

Speed vs. Code Length Trade-off

We want things to be in the bottom left quadrant — Julia wins that one.

It’s definitely a toss up between Julia and Scala. Julia is created for technical computing. And some have said that it is also general purpose. It hasn’t reached a 1.0 version yet. Sometimes, I’ll run into bugs when loading new libraries. Searching for fixes is usually a longer process. Documentation is not consistent, but all the Julia packages are hosted on Github. I thought that Julia didn’t support tail-call optimization. But, there is the Lazy.jl package, which has an implementation. You basically use the macro @bounce before the function definition. There’s an example here. Also, you won’t be able to play around with your own data structures / classes very much. Things like inheritance are only supported for abstract types. (Basically only a few built-in types.) In short, Julia is lightning fast, and quick to develop in, but there is a moderate lack of consistency in documentation and interoperability of packages. Still, it’s pretty fast for linear algebra computations. Also it’s quite mobile because it runs the LLVM virtual machine. But it I think it fails as a general purpose language.

Scala is much more mature of a language and has attracted many large enterprises and users. It is general purpose. It has full support for functional programming and tail call optimization. It is quite fun to program in a functional way and think recursively. And Scala is pretty fast. It runs on JVM which is extremely portable. People even write Android apps in Scala. Many users favor general purpose languages for computation because they want to integrate their algorithms into other products. In other words, it’s often not enough to only have a fast computation platform; having support for functionality outside of computation is desirable for much commercial work. Perhaps also true for academia. Last of all, the distributed computing tool, Spark, is written in Scala. That is quite enticing. Scala is fun. But I wish that the linear algebra libraries would be better supported. There is breeze, which is good, but things like setting the number of threads in openblas are not supported. That’s bizzare.

Scala and Julia are both in demand and desirable skill-sets. So if you’re considering learning one or the other, there are great advantages to learning either. If you have the patience, why not even learn both?

There are many resources for learning Scala. Coursera has two courses centered on Scala. One on Functional Programming, and another on Reactive Programming. Both are taught by Martin Odersky, the creator of Scala. You can find tutorials here and there for Julia. Perhaps the best place to get started is the Julia Homepage. Note that the Jupyter Project also includes Julia, so there are popular implementations of ipython notebook for Julia. (Quick plug-in for Jupyter. You can install vim-bindings!)

Julia vs. Scala

The same experiment was performed just for Julia and Scala. The number of columns in the X matrix were varied and the execution times were recorded. Since the linear algebra libraries used in both cases were the same (OpenBLAS), the performances were the same.

Sample Code

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
dat <- as.matrix(read.table("../data/dat.txt"))

y <- dat[,1]
X <- dat[,-1]
k <- ncol(X)
n <- length(y)
colnames(X) <- paste0("b",0:9)
Xt <- t(X)
XXi <- solve(Xt%*%X)
s <- 100
XXi%*%Xt%*%y
a <- 1
b <- 1
B <- 1e5
s2 <- 10

det <- function(x,log=F) as.numeric(determinant(x,log=log))[1]
ll <- function(be,sig2) t(y-X%*%be)%*%(y-X%*%be)/(-2*sig2) - n/2*log(sig2)
lpb <- function(be) -t(be)%*%XXi%*%be/(s2*2)
lps <- function(sig2) (a-1)*log(sig2)-sig2/b
mvrnorm <- function(M,S,n=nrow(S)) M + t(chol(S)) %*% rnorm(n)

csb <- 4*XXi
css <- 1
acc.b <- 0
acc.s <- 0

b.hat <- matrix(0,B,k)
s2.hat <- rep(1,B)

r.time <- system.time(
for (i in 2:B) {
  b.hat[i,] <- b.hat[i-1,]
  s2.hat[i] <- s2.hat[i-1]

  # Update Beta
  cand <- mvrnorm(b.hat[i,],csb)
  q <- ll(cand,s2.hat[i])+lpb(cand)-ll(b.hat[i,],s2.hat[i])-lpb(b.hat[i,])
  if (q>log(runif(1))) {
    b.hat[i,] <- cand
    acc.b <- acc.b+1
  }

  # Update s2
  cand <- rnorm(1,s2.hat[i],css)
  if (cand>0) {
    q <- ll(b.hat[i,],cand)+lps(cand)-ll(b.hat[i,],s2.hat[i])-lps(s2.hat[i])
    if (q>log(runif(1))) {
      s2.hat[i] <- cand
      acc.s <- acc.s+1
    }
  }

  cat(paste0("\r",round(100*i/B),"%"))
})

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
import numpy as np
from numpy.linalg import *
import time # for timing program
import sys # for printing to std out

dat = np.genfromtxt('../data/dat.txt',delimiter=' ')

y = dat[:,0]
X = dat[:,1:11]
K = X.shape[1]
n = y.shape[0]
Xt = X.transpose()
XXi = inv(np.dot(Xt,X))
b_mle = np.dot( np.dot(XXi, Xt), y )

a = 1
b = 1
B = 100000
s2 = 10

def ll(be, sig2):
    y_min_Xb = y - np.dot(X, be)
    return np.dot( y_min_Xb.transpose(), y_min_Xb) / (-2*sig2) - n/2 * np.log(sig2)

def lpb(be):
    return -np.dot( be.transpose(), np.dot(XXi, be) ) / (s2*2)

def lps(sig2):
    return (a-1) * np.log(sig2) - sig2 / b

def mvrnorm(M,S): # STOPPED HERE
    return M + np.dot( cholesky(S) , np.random.normal(0,1,M.shape[0]) )

#ll(b_mle, s2) # test
#lpb(b_mle) # test
#lps(s2) # test
mvrnorm(b_mle, np.eye(b_mle.shape[0]))

csb = 4 * XXi
css = 1
acc_b = 0
acc_s = 0

b_hat = np.zeros(shape=(B,K))
s2_hat = np.ones(B)

# Time this:
start = time.time()
for b in range(1,B):
    b_hat[b] = b_hat[b-1]
    s2_hat[b] = s2_hat[b-1]
    #
    # Update Beta:
    cand = mvrnorm(b_hat[b], csb) 
    q = ( ll(cand, s2_hat[b]) + lpb(cand) 
        - ll(b_hat[b], s2_hat[b]) - lpb(b_hat[b]) )
    if q > np.log( np.random.rand() ):
      b_hat[b] = cand
      acc_b += 1
    #
    # Update s2:
    cand = np.random.normal(s2_hat[b], css)
    if cand > 0:
      q = ( ll(b_hat[b], cand) + lps(cand) 
          - ll(b_hat[b], s2_hat[b]) - lps(s2_hat[b]) )
      if q > np.log( np.random.rand() ):
        s2_hat[b] = cand
        acc_s += 1
    # 
    if b%(B//10) == 0:
      print "\rProgress: %d%s" %(round(b*100/B),"%"),
      sys.stdout.flush()

end = time.time()

print "\n"
print "beta: ", b_hat[B*.9:B].mean(0)
print "s2:   ", s2_hat[B*.9:B].mean()

print "acc_b:", acc_b / B
print "acc_s:", acc_s / B

elapsed_time = end - start # 23 seconds
print "Time Elapsed: ", elapsed_time, "\n"

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
blas_set_num_threads(1) # because small matrices
using DataFrames, Distributions#, Gadfly

const dat = readdlm("../data/dat.txt")
n,k1 = size(dat)
const k = k1-1

const y = dat[:,1]
const X = dat[:,2:end]

const XXi = inv(X'X)
const s = 100
const mle = XXi*X'y

a = 1
b = 1
s2 = 10
csb = 4XXi
S = chol(csb)'
css = 1

function ll(be::Array{Float64,1},sig2::Float64) 
  c = y-X*be
  c'c/(-2sig2)-n*log(sig2)/2 
end

lpb(be::Array{Float64,1}) = -be'XXi*be/2s2
lps(sig2::Float64) = (a-1)*log(s2)-s2/b 
mvrnorm(M::Array{Float64,1}) = M+S*randn(k)

function mh(B=100000)
  accb = 0; accs = 0
  bb = Array(Float64,(k,B))
  ss = Array(Float64,B)
  bcur = bb[:,1]
  scur = 1.0

  println("Starting Metropolis...")
  @time for i in 1:B

    # Update β̂: 
    candb = mvrnorm(bcur)
    q = ll(candb,scur)+lpb(candb) -ll(bcur,scur)-lpb(bcur)
    if q[1]>log(rand())
      bcur = candb
      accb += 1
    end

    # Update ŝ²:
    cands = rand(Normal(scur,sqrt(css)))
    if cands>0
      q = ll(bcur,cands)+lps(cands) -ll(bcur,scur)-lps(scur)
      if q[1]>log(rand())
        scur = cands
        accs += 1
      end
    end

    @inbounds bb[:,i] = bcur
    @inbounds ss[i]  = scur

    if i%(B/10)==0 print("\r",round(100*i/B),"%") end
  end # End of loops
  println("End of Metropolis.\n\n")

  return (bb,ss,accb,accs,B)
end # End of mh function

bb,ss,accb,accs,B = @time mh();

println("β̂: \n", mean(bb[:,Int(B*.9):end],2),"\n")
println("ŝ²: ",  mean(ss[Int(B*.9):end]),"\n")

println("Acceptance rate for β̂: ",accb/B)
println("Acceptance rate for ŝ²:",accs/B)

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
package functionalBayesMLR

object bayesMLR {
  import scala.io.Source
  import java.io.File // write to file. new File("file.txt")
  import breeze.linalg._
  import breeze.linalg.{DenseMatrix=>dmat,DenseVector=>dvec}
  import breeze.numerics._
  import breeze.stats.distributions._ // 1 draw: Dist.draw. n draws: Dist.sample(n)
  import breeze.stats.mean

  class State(val beta: dvec[Double], val s2: Double)
  val afile = new File("../data/dat.txt")
  val dat = csvread(afile,' ') //requires breeze.linalg
  val G = Gaussian(0,1)
  val U = Uniform(0,1)

  val (n,k) = (dat.rows, dat.cols-1)

  val y = dat(::,0)
  val X = dat(::,1 to k)
  val Xt = X.t
  val XXi = inv(Xt*X)
  val s2 = 10.0
  val a = 1.0
  val b = 1.0
  val B = 100000
  val keep = 10000
  val burn = B - keep
  val csb = XXi(::,*) * 4.0
  val S = cholesky(csb)
  val css = 1.0
  var accb, accs = 0

  def ll(be: dvec[Double], sig2: Double): Double = {
    val c = y - X * be
    (c.t*c/sig2 + n*log(sig2)) / -2.0
  }

  def lpb(be: dvec[Double]): Double = be.t*XXi*be / (-2.0*s2)
  def lps(sig2: Double): Double = (a-1)*log(sig2) - sig2/b
  def mvrnorm(m: dvec[Double]) = m + S*G.samplesVector(k)

  // Update Functions:
  def update_State(s: State): State = {

    // Update beta
    val b_cand = mvrnorm(s.beta)
    val q = ll(b_cand,s.s2)+lpb(b_cand) - ll(s.beta,s.s2)-lpb(s.beta)
    val beta_new = if (q > log(U.draw)) { accb += 1; b_cand} else s.beta

    // Update s2
    val s2_cand= G.draw * sqrt(css) + s.s2
    val s2_new = if (s2_cand> 0) {
      val q = ll(beta_new, s2_cand)+lps(s2_cand) - ll(beta_new, s.s2)-lps(s.s2)
      if (q > log(U.draw)) {accs += 1; s2_cand} else s.s2
    } else s.s2

    new State( beta_new, s2_new )
  }

  def mh(i: Int, S: List[State]): List[State] = {
    if (i < B) {
      if (i % (B/10) == 0) print("\rProgress: "+round(i*100.0/B)+"%")
      val new_State = update_State(S.head) :: S
      mh(i+1, new_State)
    } else S.dropRight(burn)
  }

  def main(args: Array[String]) = {
    val out = cool.timer { mh(1, List(new State(dvec.zeros[Double](k), 1.0))) }
    val (beta_post, sig2_post) = out.map(s => (s.beta, s.s2)).unzip

    println("Acceptance beta: " + 100.0 * accb/B+"%")
    println("Acceptance sig2: " + 100.0 * accs/B+"%\n")
    println("Posterior mean sig2: " + sig2_post.reduce(_+_) / keep)
    println("Posterior mean beta:" )
    (beta_post.map(x => x / keep.toDouble).reduce(_+_)).foreach(println)
  }
} //see: http://github.com/luiarthur/progSpeedCompare/tree/master/scala_functional

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
#include <iostream>
#include <armadillo>
#include <ctime>

using namespace std;
using namespace arma;

const double pi  =3.141592653589793238462;
int n,k;  // nrow(X),ncol(X)
mat z,X,y,XXi,Xt;
const double a=1;
const double b=1;
const int B=pow(10,5);
const double s2 = 10;

double ll(mat be, double sig2) {
  mat c(1,k), out(k,1);

  c = y-X*be;
  out = (c.t()*c / sig2 + n*log(sig2))/-2;
  
  return as_scalar(out);
}

double lpb(mat be) {
  return as_scalar(-be.t()*XXi*be/(2*s2));
}

double lps(double sig2) {
  return (a-1) * log(sig2) - sig2/b;
}

mat mvrnorm(mat M, mat S) {
  int n = M.n_rows;
  mat e = randn(n);
  return M + chol(S).t()*e;
}

int main(int argc, char** argv) {
  mat mle;

  //posteriors:
  mat bb; 
  mat ss;

  // candidate sigma:
  mat csb;
  const double css = 1;

  //candidate values:
  mat candb;
  double cands;
  
  //acceptance rates:
  int accb = 0;
  int accs = 0;

  //metropolis ratio:
  double q;

  z.load("../data/dat.txt");
  n = z.n_rows;
  k = z.n_cols-1;

  y = z.col(0);
  X = z.cols(1,k); // columns 2 to k+1 of z
  Xt = X.t();
  XXi = (Xt*X).i();
  csb = 4*XXi;
  mle = XXi * Xt * y;
  
  bb.set_size(B,k);
  ss.set_size(B,1);
  bb.zeros();
  ss.ones();

  //current vals:
  mat bc = bb.row(0);
  double sc = 1.0;

  cout << "Starting Metropolis:" <<endl;
  clock_t t1 = clock();
  for (int i=1; i<B; i++) {
    // Set Initial Values:
    bb.row(i) = bb.row(i-1);
    ss.at(i,0) = sc;
    bc = bb.row(i).t();

    //Update Beta:
    candb = mvrnorm(bc,csb);
    q = ll(candb,sc)+lpb(candb) -ll(bc,sc)-lpb(bc);
    if (q>log(randu())) {
      bc = candb;
      bb.row(i) = bc.t();
      accb++;
    }

    //Update sigma2:
    cands = randn()*sqrt(css)+sc;
    if (cands>0){
      q = ll(bc,cands)+lps(cands) -ll(bc,sc)-lps(sc);
      if (q>log(randu())) {
        sc = cands;
        accs++;
      }
    }
    
    cout << "\r" << i*100/B <<"%";
  }
  clock_t t2 = clock();
  double elapsed = double(t2-t1) / CLOCKS_PER_SEC;
  cout << "Elapsed Time: " <<elapsed<<"s. \n"<<endl;

  cout << "Posterior Means Beta: \n" << 
          mean(bb.rows(90000,100000-1)).t()<<endl;
  cout << "Posterior Mean Sigma2: \n" <<
          mean(ss.rows(90000,100000-1))<<endl;
  cout << "Beta Acceptance:   "<< 100*accb/B <<"%"<<endl;
  cout << "Sigma2 Acceptance: "<< 100*accs/B <<"%"<<endl;
  cout <<endl;
  //ss.save("s2.txt",raw_ascii);

  return 0;
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
#include "my_gsl.h" // print & read matrices / other functions

double lpb(gsl_vector* b, gsl_matrix* XXi, double s2) {
  // b' XXi b / (-2*s2)
  int n = b->size;
  double out = 0;
  double rowSum;

  gsl_vector* XXib = gsl_vector_alloc(n);
  mv_prod(XXi,b,1.0,XXib);
  out = vv_prod(b,XXib);
  gsl_vector_free(XXib);

  out = out / -(2*s2);

  return out;
}

double lps(double s2, double a, double b) {
  return (a-1) * log(s2) - s2/b;
}

double ll(gsl_vector* b, double s2, gsl_vector* y, gsl_matrix* X) {
  // c = y-X*be
  // c'c/(-2sig2)-n*log(sig2)/2 

  int n = y->size;
  double out;
  gsl_vector* c = gsl_vector_alloc(n);
  gsl_vector_memcpy(c,y);
  gsl_blas_dgemv(CblasNoTrans,-1.0,X,b,1.0,c);
  out = vv_prod(c,c) / (-2*s2) - n * log(s2) / 2;
  gsl_vector_free(c);

  return out;
}

int main(int argc, char* argv[]) {
  // chol
  gsl_rng* r = gsl_rng_alloc(gsl_rng_taus);
  char* filename = "../data/dat.txt";
  int n = countFileRows(filename);
  int k1 = countFileCols(filename,' ');
  int k = k1 - 1;

  gsl_matrix* M = gsl_matrix_alloc(n,k1);
  gsl_matrix* X = gsl_matrix_alloc(n,k);
  gsl_vector* y = gsl_vector_alloc(n);
  gsl_matrix* XXi = gsl_matrix_alloc(k,k);
  gsl_matrix* csb = gsl_matrix_alloc(k,k);
  gsl_matrix* cholS = gsl_matrix_alloc(k,k);

  read_csv(filename, M);
  mat_sub(M,0,n-1,1,k,X);
  gsl_matrix_get_col(y,M,0);

  xxi_m(X,XXi);
  gsl_matrix_memcpy(csb,XXi);
  gsl_matrix_scale(csb,4);
  chol(csb,cholS);

  int B = 100000;
  int acc_b = 0;
  int acc_s = 0;
  gsl_matrix* bb = gsl_matrix_alloc(B,k);
  gsl_vector* ss = gsl_vector_alloc(B);
  gsl_matrix_set_zero(bb);
  gsl_vector_set_all(ss,1);
  gsl_vector* candb = gsl_vector_alloc(k);
  gsl_vector* currb = gsl_vector_alloc(k);
  double cands;
  double sc = 1;
  double css = 1;
  double q;
  double a = 1;
  double b = 1;

  for (int b=1; b<B; b++) {

    // update beta
    mvrnorm(currb,cholS,r,candb);
    q = ll(candb,sc,y,X) + lpb(candb,XXi,sc)-
        ll(currb,sc,y,X) - lpb(currb,XXi,sc);

    if (q>log(runif())) {
      gsl_vector_memcpy(currb,candb);
      acc_b += 1;
    }

    // update s2
    cands = sc + gsl_ran_gaussian(r,css);
    if (cands > 0) {
      q = ll(currb,cands,y,X) + lps(cands,a,b) -
          ll(currb,sc,y,X)    - lps(sc,a,b);
      if (q>log(runif())) {
        sc = cands;
        acc_s += 1;
      }
    }

    // set bb, ss
    gsl_matrix_set_row(bb,b,currb);
    gsl_vector_set(ss,b,sc);

    printf("\r%d%s", (b+1)*100/B, "%");
  }


  print_matrix(bb,"out/beta.dat");
  print_vector(ss,"out/s2.dat");
  // Free Memory:
  gsl_matrix_free(cholS);
  gsl_vector_free(candb);
  gsl_vector_free(ss);
  gsl_vector_free(currb);
  gsl_matrix_free(bb);
  gsl_matrix_free(csb);
  gsl_matrix_free(XXi);
  gsl_vector_free(y);
  gsl_matrix_free(X);
  gsl_matrix_free(M);
  gsl_rng_free(r);

  return 0;
}

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
// my_gsl.h is included in bayesMLR.c
// Contains wrapper funtions 
// References: 
// https://www.gnu.org/software/gsl/manual/html_node/Matrices.html
// https://github.com/Blei-Lab/diln/blob/master/gsl_wrapper.c
#include <stdio.h>             // standard input/output
#include <stdlib.h>            // malloc
#include <math.h>              // fabs, sqrt, etc.
#include <time.h>              // time / set seed
#include <gsl/gsl_rng.h>       // GNU Scientific Library
#include <gsl/gsl_cdf.h>       // GNU Scientific Library
#include <gsl/gsl_randist.h>   // GNU Scientific Library
#include <gsl/gsl_linalg.h>    // GNU Scientific Library
#include <gsl/gsl_matrix.h>    // GNU Scientific Library
#include <gsl/gsl_statistics.h>// GNU Scientific Library
#include <gsl/gsl_blas.h>      // GSL Basic Linear Algebra
#define pi 3.14159265358979323846

// Reading Matrices ////////////////////////////////////////////////
int countFileRows(char* filename) {
  // Count number of rows in file "filename"
  FILE *fp;
  int count = 0;  // Line counter (result)
  char c;  // To store a character read from file

  fp = fopen(filename, "r");
  // Extract characters from file and store in character c
  if (fp) { // check file existence
  for (c = getc(fp); c != EOF; c = getc(fp))
      if (c == '\n') // Increment count if this character is newline
          count = count + 1;
  }
  fclose(fp);

  return count;
}

int countFileCols(char* filename, char delim) {
  // Count number of columns in file "filename" (with delimiter delim)
  FILE *fp = fopen(filename, "r");
  char c;
  int count = 0;

  while( c=fgetc(fp) ) {
    if(c == EOF) {
      break; /* break if end of file */
    } else if (c == delim) {
      count += 1; /* otherwise add one to the count */
    } else if (c == '\n') {
      count += 1;
      break;
    }
  }
  fclose(fp);

  return count;
}

void read_csv(char* filename, gsl_matrix* m) {
  // Read a csv into matrix m
  FILE* f = fopen(filename, "r");
  gsl_matrix_fscanf(f, m);
  fclose(f);
}

// Printing Matrix Info ////////////////////////////
void print_vector(gsl_matrix* v, char* filename) {
  FILE* f;
  if (filename == "") {
    f = stdout;
    printf("\n");
  } else {
    f = fopen(filename, "w");
  }

  gsl_vector_fprintf(f,v,"%g");
}

void print_matrix(gsl_matrix* m, char* filename) {
  // print matrix to filename or stdout if filename == ""
  FILE* f;
  if (filename == "") {
    f = stdout;
    printf("\n");
  } else {
    f = fopen(filename, "w");
  }
  int n = m->size1;
  int k = m->size2;
  
  for (int i = 0; i < n; i++) {
    for (int j = 0; j < k; j++) {
      fprintf (f, "%g\t", gsl_matrix_get (m, i, j));
    }
    fprintf (f, "\n");
  }

  if (filename == "") {
    printf("\n");
  } else {
    fclose(f);
  }
}

void print_matrix_dims(gsl_matrix* m) {
  // print dimensions of matrix m
  printf("%d%s%d\n",m->size1,"x",m->size2);
}

// Access Matrix Elements: //////////////////////////////////
void mat_sub(gsl_matrix* X, size_t r1, size_t r2, size_t c1, size_t c2, gsl_matrix* X_new) {
  gsl_matrix_view x = gsl_matrix_submatrix(X,r1,c1,r2-r1+1,c2-c1+1);
  gsl_matrix_memcpy(X_new,&x.matrix);
}

// Random Number Functions: /////////////////////////////////
double runif() {
  // Returns a random number b/w 0 & 1
  return (double) rand() / (double) RAND_MAX;
}

void mvrnorm(gsl_vector* m, gsl_matrix* cholS, gsl_rng* r, gsl_vector* out) {
  // vector addition, vector prod
  // m + cholS * e

  int n = m->size;
  double tmp;
  gsl_vector* e = gsl_vector_alloc(n);

  for (int i=0; i<n; i++) {
    tmp = 0;
    for (int j=0; j<n; j++) {
      if (i==0) gsl_vector_set(e,j,gsl_ran_gaussian(r,1.0));
      tmp += gsl_matrix_get(cholS,i,j) * gsl_vector_get(e,j);
    }
    gsl_vector_set(out,i,gsl_vector_get(m,i) + tmp);
  }

  gsl_vector_free(e);
}


// Linear Algebra Wrappers //////////////////////////////////
// mm: matrix-matrix
// mv: matrix-vector
// ms: matrix-scalar
//  d: double real
// ge: general matrix
void mm_prod(gsl_matrix* A, gsl_matrix* B, double a, double b, gsl_matrix* out) {
}

void mm_add(gsl_matrix* A, gsl_matrix* B, double a, double b, gsl_matrix* out) {
}

void mv_prod(gsl_matrix* A, gsl_vector* x, double a, gsl_vector* out) {
  gsl_blas_dgemv(CblasNoTrans,a,A,x,0.0,out);
}

void vv_add(gsl_vector* x, gsl_vector* y, gsl_vector *out) {
  // sum two vectors
  if (x->size != y->size)
    error("Error in vv_add(): input vectors must have the same dimensionality");
  else if (x->size != out->size || y->size != out->size)
    error("Error in vv_add(): output vector has incorrect dimensions");

  double tmp;
  int n = x->size;

  for (int i=0; i<n; i++) {
    tmp = gsl_vector_get(x,i) + gsl_vector_get(y,i);
    gsl_vector_set(out,i,tmp);
  }
}

double vv_prod(gsl_vector* x, gsl_vector* y) {
  // dot product
  if (x->size != y->size)
    error("Error in vv_prod(): input vectors must have the same dimensionality");

  double out = 0;
  int n = x->size;

  for (int i=0; i<n; i++) {
    out += gsl_vector_get(x,i) * gsl_vector_get(y,i);
  }

  return out;
}


void mat_inv(gsl_matrix * X, gsl_matrix * invX) {
// invert the matrix X
    int signum;
    gsl_matrix * LU = gsl_matrix_calloc(X->size1,X->size2);
    gsl_permutation * p = gsl_permutation_calloc(X->size1);
    gsl_matrix_memcpy(LU,X);

    gsl_linalg_LU_decomp (LU,p,&signum);
    gsl_linalg_LU_invert(LU,p,invX);

    gsl_matrix_free(LU);
    gsl_permutation_free(p);
}

void xxi_m(gsl_matrix* X, gsl_matrix* xxi) {
  int n = X->size1;
  int k = X->size2;
  double dot;
  gsl_matrix* xx = gsl_matrix_alloc(k,k);

  for (int i=0; i<k; i++) {
    for (int j=0; j<=i; j++) {
      dot = 0;
      for (int z=0; z<n; z++) {
        dot += gsl_matrix_get(X,z,i) * gsl_matrix_get(X,z,j);
      }
      gsl_matrix_set(xx,j,i,dot);
      gsl_matrix_set(xx,i,j,dot);
    }
  }

  mat_inv(xx,xxi);
  gsl_matrix_free(xx);
}


int my_gsl_linalg_cholesky_decomp (gsl_matrix * A)
{
  const size_t M = A->size1;
  const size_t N = A->size2;

  if (M != N)
    {
      GSL_ERROR("cholesky decomposition requires square matrix", GSL_ENOTSQR);
    }
  else
    {
      size_t i,j,k;
      int status = 0;

      /* Do the first 2 rows explicitly.  It is simple, and faster.  And
       * one can return if the matrix has only 1 or 2 rows.  
       */

      double A_00 = gsl_matrix_get (A, 0, 0);
      
      double L_00 = sqrt(A_00);
      
      if (A_00 <= 0)
        {
          status = GSL_EDOM ;
        }

      gsl_matrix_set (A, 0, 0, L_00);
  
      if (M > 1)
        {
          double A_10 = gsl_matrix_get (A, 1, 0);
          double A_11 = gsl_matrix_get (A, 1, 1);
          
          double L_10 = A_10 / L_00;
          double diag = A_11 - L_10 * L_10;
          double L_11 = sqrt(diag);
          
          if (diag <= 0)
            {
              status = GSL_EDOM;
            }

          gsl_matrix_set (A, 1, 0, L_10);        
          gsl_matrix_set (A, 1, 1, L_11);
        }
      
      for (k = 2; k < M; k++)
        {
          double A_kk = gsl_matrix_get (A, k, k);
          
          for (i = 0; i < k; i++)
            {
              double sum = 0;

              double A_ki = gsl_matrix_get (A, k, i);
              double A_ii = gsl_matrix_get (A, i, i);

              gsl_vector_view ci = gsl_matrix_row (A, i);
              gsl_vector_view ck = gsl_matrix_row (A, k);

              if (i > 0) {
                gsl_vector_view di = gsl_vector_subvector(&ci.vector, 0, i);
                gsl_vector_view dk = gsl_vector_subvector(&ck.vector, 0, i);
                
                gsl_blas_ddot (&di.vector, &dk.vector, &sum);
              }

              A_ki = (A_ki - sum) / A_ii;
              gsl_matrix_set (A, k, i, A_ki);
            } 

          {
            gsl_vector_view ck = gsl_matrix_row (A, k);
            gsl_vector_view dk = gsl_vector_subvector (&ck.vector, 0, k);
            
            double sum = gsl_blas_dnrm2 (&dk.vector);
            double diag = A_kk - sum * sum;

            double L_kk = sqrt(diag);
            
            if (diag <= 0)
              {
                status = GSL_EDOM;
              }
            
            gsl_matrix_set (A, k, k, L_kk);
          }
        }

      /* Now copy the transposed lower triangle to the upper triangle,
       * the diagonal is common.  
       */
      
      //for (i = 1; i < M; i++)
      //  {
      //    for (j = 0; j < i; j++)
      //      {
      //        double A_ij = 0;//gsl_matrix_get (A, i, j);
      //        gsl_matrix_set (A, j, i, A_ij);
      //      }
      //  } 
      
      if (status == GSL_EDOM)
        {
          GSL_ERROR ("matrix must be positive definite", GSL_EDOM);
        }
      
      return GSL_SUCCESS;
    }
}

void chol(gsl_matrix* X, gsl_matrix* cholS) {
  int n = X->size1;
  gsl_matrix_memcpy(cholS,X);
  my_gsl_linalg_cholesky_decomp(cholS);

  // lower tri stored in cholS.
  // So: mvrnorm <- function(M,S,n=nrow(S)) M + chol(S) %*% rnorm(n)
}