R, Stan and NUTS: A Hierarchical Linear Model Example

So, here I am at my new desk, at my new job, in lovely Los Angeles… and in the spirit of new beginnings I have decided to take up blogging. I reckon (does anybody say that anymore) that most of my blogs will be about statistics, economics, marketing and writing code to make them happen.

This one is about a great new package just released for R called Rstan. The package provides a simple inline interface to Stan which takes BUGS like code, translates it into C++, compiles and loads the dynamic library into R and runs your MCMC for you (phew!) (BTW: The guts are based on the inline, Rcpp and RcppEigen packages that are fantastic as well!) What’s more relevant for applied researchers like me is that the algorithms used are cutting edge and use modified HMC coupled with Automatic Differentiation to achieve rather quick mixing. The particular implementation is based on the Hoffman and Gelman No-U-Turn (NUTS) sampler.

To test things out I decided to implement a simple Hierarchical Linear model (or a Linear Mixed Model) with a single outcome Y and a single covariate X. To make things interesting I chose to adopt an unbalanced panel setup, so different “households” are observed for different lengths of time. The model is simply

$$ y_{it} = \beta_{0i} + \beta_{1i} x_{it} + \varepsilon_{it}.$$

I use H to represent the number of households and assume (as is typical) that the $\beta$ are generated from Normal distributions (independent for now, in a later blog I’ll discuss relaxing this assumption.). The R code to generate data is below…

# Set Seed
set.seed(1234)
 
# Number of Households
H=100
 
# True parameters
true.b.mu = c(10,-5)
true.b.sig = c(3,2)
true.y.sig = 3
 
# Storage 
b =  matrix(0,100,2)
ctr = 0
# Each "household" can have between 5 and 10 reps
Ti = sample(5:10,H,replace=T)
id = x = y = matrix(0,sum(Ti),1)
 
# Simulate Data
for(i in 1:100) {
	b[i,1] = rnorm(1,true.b.mu[1],true.b.sig[1])
	b[i,2] = rnorm(1,true.b.mu[2],true.b.sig[2])	
	for(j in 1:Ti[i]) {
		ctr = ctr + 1
		x[ctr] = runif(1)*3 + 1
		y[ctr] = b[i,1]+b[i,2]*x[ctr]+rnorm(1,0,true.y.sig)
		id[ctr] = i 
	}
}

So, there we have it. The data is generated and we can move to doing some analysis.

First, lets just make sure that a simple linear regression provides results that are in the ballpark (It should since OLS is consistent).

lm(y~x)
 
Call:
lm(formula = y ~ x)
 
Coefficients:
(Intercept)            x  
      9.364       -4.710

So far so good. Now for the Stan code.

hcode = "
data {
	int<lower=0> N; // Observations
	int<lower=0> H; // persons
	int<lower=0> id[N]; // ID variable
	real y[N]; // Y Vector
	real x[N]; // X Vector
}
 
parameters {
	vector[2] beta[H];	// [H,2] dim matrix for b
	real beta_mu[2]; // means for b
	real<lower=0> beta_sig[2];   // var parameters for b
	real<lower=0> y_sig;   // regression variance
}
 
model {
	//Priors
		y_sig~uniform(0,1000); 
 
	for(j in 1:2) {
		beta_mu[j] ~ normal(0,100);
		beta_sig[j] ~ uniform(0,1000);
		for(h in 1:H) 
		beta[h,j] ~ normal(beta_mu[j], beta_sig[j]); 
	}
 
	// Likelihood
	for(n in 1:N) 
		y[n] ~ normal(beta[id[n],1]+beta[id[n],2] .* x[n], y_sig);
 
} 
"

Ok, lets go over this piece by piece. First, notice that hcode is simply a string. That’s because I will be using the model.code argument to pass my code to Stan. An alternative would be to place this code in a file and use the model.file option. Stan code is broken up into blocks – in the code above we have three such blocks data, parameters and of course the model block. Lets look at each.

data {
	int<lower=0> N; // Observations
	int<lower=0> H; // persons
	int<lower=0> id[N]; // ID variable
	real y[N]; // Y Vector
	real x[N]; // X Vector
}

The data block informs Stan of the SEXP objects that we will be passing to it. In this case we have a few constants and a few vectors. That’s it, simple. Now we tell Stan the parameters that we will be “estimating.” All unknown quantities go here, including the individual household level regression parameters. The lower=0 keeps the variances from being initialized using Stan’s default (-2,2) range.

parameters {
	vector[2] beta[H];	// [H,2] dim matrix for b
	real beta_mu[2]; // means for b
	real<lower=0> beta_sig[2];   // var parameters for b
	real<lower=0> y_sig;   // regression variance
}

Lastly, we have the model spec. The first part specifies the priors we would like to use and the latter the likelihood specification. The code below should look familiar to anyone who has used BUGS. The syntax is a bit different but overall quite straightforward. The manual has details (although I would treat it as a beta version too.)

model {
	//Priors
		y_sig~uniform(0,1000); 
 
	for(j in 1:2) {
		beta_mu[j] ~ normal(0,100);
		beta_sig[j] ~ uniform(0,1000);
		for(h in 1:H) 
		beta[h,j] ~ normal(beta_mu[j], beta_sig[j]); 
	}
 
	// Likelihood
	for(n in 1:N) 
		y[n] ~ normal(beta[id[n],1]+beta[id[n],2] .* x[n], y_sig);
 
}

We can now use the inline ‘stan’ command in Rstan to sample from the specified model.

dat = list(N=length(y),H=dim(b)[1],y=y,x=x,id=id)
hlm = stan(model_name="Hierarchical Linear Model", model_code = hcode, data=dat , iter = 2000, n_chains = 2, verbose = FALSE)

And we are off. A note, depending on the version you have installed the syntax in the ‘stan’ function may need to be tweaked. For example, model_code would be model.code, iter would be n.iter and so on. I am told that the next release of Stan/Rstan (v.1.0.1) will correct these and other issues. On my MAC-PRO this code takes about a minute to compile and sample the two chains. The chains mix quickly as we can see from the traceplot and it seems like all is well!

traceplot

In the next blog I will compare Stan to alternative approaches to estimating Multinomial Conditional Logit (ala McFadden) with Unobserved Heterogeneity. Stay tuned.

8 thoughts on “R, Stan and NUTS: A Hierarchical Linear Model Example

  1. Thanks for taking the time to share your code. Unfortunately, I tried to replicate the exact procedure outlined above and I get the following error:

    COMPILING MODEL ‘Hierarchical Linear Model’ FROM Stan CODE TO C++ CODE NOW.
    COMPILING THE C++ CODE FOR MODEL ‘Hierarchical Linear Model’ NOW.
    SAMPLING FOR MODEL ‘Hierarchical Linear Model’ NOW (CHAIN 1).
    Error in sampler$call_sampler(args.list[[i]]) :
    Error in function stan::prob::normal_log(N4stan5agrad3varE): Scale parameter, sigma, is 0:0, but must be > 0!

    Any idea what could be going wrong?

  2. Seems like you have a variance term that hits zero (which is a zero probability event under the prior). See if you altering the prior (to something that is strictly positive) helps.

    • I had just copied and pasted the model as written into R and just changed the stan() function to match up with my version of R so it’s strange that it works for on your mac pro and not on mine. Nevertheless, I tried changing the priors on y_sig and beta_sig to gamma distributions but I still get the same error (as uniform distributions between 0 and 1000 I’m not sure how I get a negative variance in the first place). Everything else checks out, I can run fine the model from the Rstan manual, so I’m at a bit of a loss.

    • page 58 of the Stan manual says:
      “If there are no user-supplied initial values, the unconstrained initial values are generated uniformly from the interval „-2; 2…”.
      Does this mean it’s possible that an initial value for beta_sig could be negative?

      • Looks like that was the issue. I changed the parameters block to prevent that from happening.
        parameters {
        vector[2] beta[H]; // [H,2] dim matrix for b
        real beta_mu[2]; // means for b
        real beta_sig[2]; // var parameters for b
        real y_sig; // regression variance
        }

        • Ian,
          Thanks for the catch. The code I was running was different from what I posted. duh. The blog is now updated with code that explicitly places bounds on the parameters.

  3. Thanks for making this available. If instead of fitting a line to the subjects in each house we wanted to fit a logistic curve, how could one adapt the model do this. This makes sense in my own work because subjects are tested at different points in time and their performance on the test over time is non-linear. So the above model is very nearly what I’d like, except for the likelihood. The function I use is: beta1/(1+exp((beta2-x)/beta3).
    http://rgm2.lab.nig.ac.jp/RGM_results/stats:SSlogis/SSlogis_001_big.png.

    beta1 is the asymptote of the curve, beta2 is the midpoint and beta3 is a rate parameter. My thinking was I’d just have to add two more priors (beta3 and sig3), specify in the parameters block that the betas cannot be negative and then change the likelihood from
    // Likelihood
    for(n in 1:N)
    y[n] ~ normal(beta[id[n],1]+beta[id[n],2] .* x[n], y_sig);
    to
    // Likelihood
    for(n in 1:N)
    y[n] ~ normal(beta[id[n],1]/(1+exp((beta[id[n],2] – x[n])/beta[id[n],3])), y_sig);

    The model compiles but than once I get to the sampling step I get an error:
    “Error in sampler$call_sampler(args.list[[i]]):
    No acceptable small step size could be found. Perhaps the posterior is not continuous?”

    Is there anything obviously wrong with my approach/model?

    • probably because beta[,3] could be small which blows up the likelihood. You might try changing the model to one with a multiplicative scale factor for the logistic. Identification of these nonlinear parameters is always tricky though.

Leave a Reply

Your email address will not be published. Required fields are marked *

*

You may use these HTML tags and attributes: <a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>