Naijia Liu (Princeton University)
Abstract: Social scientists rely heavily on survey datasets to study important questions, such as policy preferences and voting intentions. However, respondents often decline to answer a question because of unobserved confounders, which makes the data missing not at random (MNAR). Existing multiple imputation methods cannot resolve this problem because they assume the data are missing at random (MAR). This paper tackles the MNAR problem by modeling the latent structure of the data to capture the unmeasured confounders that cause the missing values. This approach allows one to apply multiple imputation methods under the assumption that the data are MAR conditional on the latent factor. One can also conduct regression analysis using only the complete cases, with weights estimated from the factorization of the missingness pattern. The wide range of available latent factor models lets scholars tailor the model to the dataset and to the end goal of the analysis. I illustrate the approach with a latent utility model that imputes missing responses to a self-reported ideology question, considered sensitive, in the 2017 Chinese Netizen Survey. I also discuss the scope and limitations of the method.
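The weighted complete-case idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's estimator: it swaps in a plain truncated SVD of the response-indicator matrix as a stand-in for the latent factor model, and every name here (`factor_weights`, `weighted_complete_case_ols`, the simulated data, the clipping threshold `eps`) is a hypothetical choice made for illustration only.

```python
import numpy as np

def factor_weights(R, rank=1, eps=0.05):
    """Inverse-probability weights from a low-rank fit of R (1 = answered).

    Assumption: a truncated SVD approximates the latent structure of the
    missingness pattern; the paper's latent factor model may differ.
    """
    col_means = R.mean(axis=0)
    U, s, Vt = np.linalg.svd(R - col_means, full_matrices=False)
    low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank] + col_means
    p_hat = np.clip(low_rank, eps, 1.0)  # keep fitted propensities in [eps, 1]
    return 1.0 / p_hat

def weighted_complete_case_ols(X, y, observed, w):
    """Weighted least squares of y on X using only rows where y is observed."""
    keep = observed.astype(bool)
    Xc = np.column_stack([np.ones(keep.sum()), X[keep]])
    sw = np.sqrt(w[keep])
    beta, *_ = np.linalg.lstsq(Xc * sw[:, None], y[keep] * sw, rcond=None)
    return beta

# Simulated (not real) survey: one unobserved confounder u drives both
# the outcome and the decision to answer each of p questions.
rng = np.random.default_rng(0)
n, p = 500, 6
u = rng.normal(size=n)
loadings = rng.uniform(0.5, 1.5, size=p)
prob = 1 / (1 + np.exp(-(1.0 + np.outer(u, loadings))))
R = (rng.uniform(size=(n, p)) < prob).astype(float)
x = rng.normal(size=(n, 1))
y = 1.0 + 2.0 * x[:, 0] + u + rng.normal(size=n)

W = factor_weights(R, rank=1)
# Treat question 0 as the outcome item: y is used only where R[:, 0] == 1.
beta = weighted_complete_case_ols(x, y, R[:, 0], W[:, 0])
```

Here the weights come from the fitted response propensities for the outcome item, so respondents who resemble frequent non-responders count more in the regression. A linear low-rank fit is a crude propensity model; a logistic or latent utility specification would be the more natural choice in practice.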