Evan Rosenman and Nitin Viswanathan (Stanford University)
Abstract: We consider a problem of ecological inference, in which individual-level covariates are known, but labeled data is available only at the aggregate level. Our goal is to estimate the parameter vector β in a logistic regression relating covariates to success probabilities. The intended application is modeling voter preferences in elections. We pose the problem as maximizing the likelihood of a Poisson binomial, the distribution of the sum of independent but not identically distributed Bernoulli variables. We prove new results about the curvature of this likelihood, and propose a computationally efficient method for fitting the coefficient vector, based on a Gaussian approximation. Using data on voters in Morris County, NJ, we demonstrate that this approach outperforms other ecological inference methods in predicting a known outcome: whether an individual votes. We apply this technique to the 2016 presidential election, fitting a model to voters from the contested swing state of Pennsylvania. The model is predictive and learns intuitive associations.
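To make the setup concrete, the following is a minimal sketch of the two objects the abstract names: the exact Poisson binomial distribution of an aggregate count, and a Gaussian approximation to its negative log-likelihood as a function of β. It is an illustration under stated assumptions, not the authors' implementation. The function names (`poisson_binomial_pmf`, `gaussian_nll`), the grouping of individuals into a list of covariate matrices, and the use of a plain moment-matched normal density (mean Σ pᵢ, variance Σ pᵢ(1 − pᵢ)) are all assumptions made here for illustration.

```python
import numpy as np


def sigmoid(z):
    """Logistic function mapping linear scores to success probabilities."""
    return 1.0 / (1.0 + np.exp(-z))


def poisson_binomial_pmf(p):
    """Exact pmf of a sum of independent Bernoulli(p_i) variables.

    Built by dynamic programming: adding one trial either leaves the
    count unchanged (probability 1 - p_i) or raises it by one (p_i).
    """
    pmf = np.array([1.0])
    for p_i in p:
        stay = np.concatenate((pmf * (1.0 - p_i), [0.0]))
        step = np.concatenate(([0.0], pmf * p_i))
        pmf = stay + step
    return pmf


def gaussian_nll(beta, groups, counts):
    """Gaussian-approximate negative log-likelihood of aggregate counts.

    groups: list of (n_g, k) covariate matrices, one per aggregate unit
            (hypothetical layout, e.g. one matrix per precinct).
    counts: observed number of successes in each unit.
    Each count is treated as normal with the Poisson binomial's mean and
    variance; this is an assumed form of the approximation.
    """
    total = 0.0
    for X, d in zip(groups, counts):
        p = sigmoid(X @ beta)
        mu = p.sum()
        var = (p * (1.0 - p)).sum()
        total += 0.5 * np.log(2.0 * np.pi * var) + (d - mu) ** 2 / (2.0 * var)
    return total
```

For example, with β = 0 and a unit of four individuals, every pᵢ equals 0.5, so the approximating normal has mean 2 and variance 1. A function such as `gaussian_nll` could then be handed to a general-purpose numerical optimizer to estimate β, while the dynamic-programming pmf offers an exact check on small units.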