Harikesh Nair, Oystein Daljord (both at Stanford GSB) and I have a new paper titled “Homogenous Contracts for Heterogeneous Agents: Aligning Salesforce Composition and Compensation,” which I have posted on my website. The paper examines the complementarities and distortions that arise when firms use uniform (across salespeople) features in their sales compensation contracts. We find that such uniform features create a link across salespeople even in the absence of explicit complementarities such as team selling or territory overlaps. Further, we find that by carefully pruning the salesforce a firm may be able to reap contractual efficiencies and approach the profit levels that accrue when contracts are completely tailored to individual salespeople. This is a follow-up to our earlier paper, and comments are most welcome.

# R, Stan and NUTS: A Hierarchical Linear Model Example

So, here I am at my new desk, at my new job, in lovely Los Angeles… and in the spirit of new beginnings I have decided to take up blogging. I reckon (does anybody say that anymore?) that most of my posts will be about statistics, economics, marketing, and writing code to make them happen.

This one is about a great new package just released for R called RStan. The package provides a simple inline interface to Stan, which takes BUGS-like code, translates it into C++, compiles it, loads the resulting dynamic library into R, and runs your MCMC for you (phew!). (BTW: the guts are based on the inline, Rcpp and RcppEigen packages, which are fantastic as well!) What’s more relevant for applied researchers like me is that the algorithms are cutting edge: modified Hamiltonian Monte Carlo (HMC) coupled with automatic differentiation, which achieves rather quick mixing. The particular implementation is based on Hoffman and Gelman’s No-U-Turn Sampler (NUTS).
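To make the workflow concrete, here is a minimal sketch of the inline interface on a toy problem (estimating the mean and standard deviation of some normal draws). The model is my own illustrative choice, not anything from this post; the `stan()` call with `model_code`, `data`, `iter`, and `chains` is the package's basic entry point.

```r
library(rstan)  # assumes rstan and a working C++ toolchain are installed

# A toy Stan program passed in as a string; rstan translates it to C++,
# compiles it, and samples from it.
toy_code <- "
data {
  int<lower=1> N;
  real y[N];
}
parameters {
  real mu;
  real<lower=0> sigma;
}
model {
  y ~ normal(mu, sigma);
}
"

set.seed(42)
fit <- stan(model_code = toy_code,
            data = list(N = 20, y = rnorm(20, 5, 1)),
            iter = 1000, chains = 2)
print(fit)  # posterior summaries for mu and sigma
```

The first call is slow because of the C++ compilation step; subsequent runs of the same model can reuse the compiled object.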

To test things out I decided to implement a simple Hierarchical Linear model (or a Linear Mixed Model) with a single outcome Y and a single covariate X. To make things interesting I chose to adopt an unbalanced panel setup, so different “households” are observed for different lengths of time. The model is simply

$$ y_{it} = \beta_{0i} + \beta_{1i} x_{it} + \varepsilon_{it}.$$

I use **H** to represent the number of households and assume (as is typical) that the $\beta$ are generated from Normal distributions (independent for now; in a later post I’ll discuss relaxing this assumption). The R code to generate the data is below…

```r
# Set seed
set.seed(1234)

# Number of households
H = 100

# True parameters
true.b.mu  = c(10, -5)
true.b.sig = c(3, 2)
true.y.sig = 3

# Storage
b   = matrix(0, H, 2)
ctr = 0

# Each "household" can have between 5 and 10 reps
Ti = sample(5:10, H, replace = TRUE)
id = x = y = matrix(0, sum(Ti), 1)

# Simulate data
for (i in 1:H) {
  b[i, 1] = rnorm(1, true.b.mu[1], true.b.sig[1])
  b[i, 2] = rnorm(1, true.b.mu[2], true.b.sig[2])
  for (j in 1:Ti[i]) {
    ctr = ctr + 1
    x[ctr]  = runif(1) * 3 + 1
    y[ctr]  = b[i, 1] + b[i, 2] * x[ctr] + rnorm(1, 0, true.y.sig)
    id[ctr] = i
  }
}
```

So, there we have it. The data is generated and we can move on to some analysis.
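As a sketch of where that analysis might start, here is one way the hierarchical model above could be written in Stan and fit with `rstan`. This is my own illustrative coding of the model described earlier (independent Normals on the intercepts and slopes), not necessarily the code used in the rest of the post, and I have left the hyperparameters on Stan's default flat priors.

```r
library(rstan)  # assumes rstan and a working C++ toolchain are installed

# A hypothetical Stan program for y_it = b0_i + b1_i * x_it + eps_it,
# with b0_i ~ N(b_mu[1], b_sig[1]) and b1_i ~ N(b_mu[2], b_sig[2]).
hlm_code <- "
data {
  int<lower=1> N;               // total number of observations
  int<lower=1> H;               // number of households
  int<lower=1,upper=H> id[N];   // household index for each observation
  vector[N] x;
  vector[N] y;
}
parameters {
  vector[2] b_mu;               // population means of intercept and slope
  vector<lower=0>[2] b_sig;     // population standard deviations
  real<lower=0> y_sig;          // residual standard deviation
  matrix[H,2] b;                // household-level coefficients
}
model {
  for (h in 1:H) {
    b[h,1] ~ normal(b_mu[1], b_sig[1]);
    b[h,2] ~ normal(b_mu[2], b_sig[2]);
  }
  for (n in 1:N)
    y[n] ~ normal(b[id[n],1] + b[id[n],2] * x[n], y_sig);
}
"

# Pass the simulated data (from the block above) to stan();
# as.vector() flattens the one-column matrices.
stan_data <- list(N = sum(Ti), H = H, id = as.vector(id),
                  x = as.vector(x), y = as.vector(y))
fit <- stan(model_code = hlm_code, data = stan_data,
            iter = 2000, chains = 2)
print(fit, pars = c("b_mu", "b_sig", "y_sig"))
```

With the simulated data, the posterior means of `b_mu`, `b_sig`, and `y_sig` should land near the true values (10, -5), (3, 2), and 3 used in the generating code.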