So, here I am at my new desk, at my new job, in lovely Los Angeles… and in the spirit of new beginnings I have decided to take up blogging. I reckon (does anybody say that anymore) that most of my blogs will be about statistics, economics, marketing and writing code to make them happen.

This one is about a great new package just released for R called Rstan. The package provides a simple inline interface to Stan which takes BUGS like code, translates it into C++, compiles and loads the dynamic library into R and runs your MCMC for you (phew!) (BTW: The guts are based on the inline, Rcpp and RcppEigen packages that are fantastic as well!) What’s more relevant for applied researchers like me is that the algorithms used are cutting edge and use modified HMC coupled with Automatic Differentiation to achieve rather quick mixing. The particular implementation is based on the Hoffman and Gelman No-U-Turn (NUTS) sampler.

To test things out I decided to implement a simple Hierarchical Linear model (or a Linear Mixed Model) with a single outcome Y and a single covariate X. To make things interesting I chose to adopt an unbalanced panel setup, so different “households” are observed for different lengths of time. The model is simply

$$ y_{it} = \beta_{0i} + \beta_{1i} x_{it} + \varepsilon_{it}.$$

I use **H** to represent the number of households and assume (as is typical) that the $\beta$ are generated from Normal distributions (independent for now, in a later blog I’ll discuss relaxing this assumption.). The R code to generate data is below…

# Set Seed set.seed(1234) # Number of Households H=100 # True parameters true.b.mu = c(10,-5) true.b.sig = c(3,2) true.y.sig = 3 # Storage b = matrix(0,100,2) ctr = 0 # Each "household" can have between 5 and 10 reps Ti = sample(5:10,H,replace=T) id = x = y = matrix(0,sum(Ti),1) # Simulate Data for(i in 1:100) { b[i,1] = rnorm(1,true.b.mu[1],true.b.sig[1]) b[i,2] = rnorm(1,true.b.mu[2],true.b.sig[2]) for(j in 1:Ti[i]) { ctr = ctr + 1 x[ctr] = runif(1)*3 + 1 y[ctr] = b[i,1]+b[i,2]*x[ctr]+rnorm(1,0,true.y.sig) id[ctr] = i } } |

So, there we have it. The data is generated and we can move to doing some analysis. Continue reading