2020 Jul 14

# Virtual Room 1: Spatial Analysis

## Date:

Tuesday, July 14, 2020, 12:00pm to 1:30pm

### Chair: Neal Beck (New York University)

Co-Host: Sophie Borwein (University of Toronto)

## Causal Inference for Policy Diffusion

### Discussant: Yiqing Xu (Stanford University)

Understanding why governments adopt policies and how policy innovations diffuse from one government to others is a central goal in all subfields of political science. Despite numerous methodological developments in the policy diffusion literature, unfortunately, fundamental issues of causal inference have been left unaddressed for decades. As a result, little is known about which substantive findings in the literature have causal interpretations. To improve causal inferences in policy diffusion studies, we make three contributions. First, we define a variety of causal effects relevant to policy diffusion questions and clarify assumptions required for causal identification. Second, we provide a general estimation method by extending the standard event history analysis commonly used in practice. Finally, we propose a sensitivity analysis method that can assess the potential influence of unmeasured confounding on causal conclusions. We illustrate the general applicability of the proposed approach using a diffusion study of abortion policies. Open-source software will be made available for implementing our methods.

## Network Event History Analysis for Modeling Public Policy Adoption with Latent Diffusion Networks

### Discussant: Shahryar Minhas (Michigan State University)

Research on the diffusion of public policies across jurisdictional units has long identified the choices made by neighboring units as a key external determinant of policy adoption. Diffusion network inference is a recently developed methodology that identifies latent, dynamic networks connecting units based on repeated adoption decisions, rather than shared borders or other similarities. Based on the current state of the art, diffusion network inference must be conducted using analytical tools that are separate from the main empirical methods for studying public policy adoption---discrete-time event history models. We offer two contributions that address the disconnect between models for network inference and models for policy adoption. First, we introduce Network Event History Analysis (NEHA)---a modeling framework that incorporates inference regarding latent diffusion pathways into the conventional model used for discrete-time event history analysis. Second, with an extensive application to the study of policy adoption in the American states, we evaluate the role of inferred networks in shaping states' decisions to adopt. Focusing on the literature on policy diffusion in the American states, we replicate a published model of policy adoption, updating it to incorporate diffusion network structure. We evaluate differences in covariate effects, and consider whether the incorporation of networks improves model fit. We conclude that NEHA is a valuable method for incorporating diffusion networks into the study of public policy diffusion.

2020 Jul 14

# Virtual Room 2: Experimental Designs

## Date:

Tuesday, July 14, 2020, 12:00pm to 1:30pm

### Chair: Justin Esarey (Wake Forest University)

Co-Host: Anwar Mohammed (McMaster University)

## Using Eye-Tracking to Understand Decision-Making in Conjoint Experiments

### Discussant: Anton Strezhnev (New York University)

Conjoint experiments enjoy increasing popularity in political and social science, but there is a paucity of research on respondents' underlying decision-making processes. We leverage eye-tracking methodology and a conjoint experiment, administered to a subject pool consisting of university students and local community members, to examine how respondents process information when completing conjoint surveys. Our study has two main findings. First, we find a positive correlation between attribute importance measures inferred from the stated choice data and attribute importance measures based on eye movement. This validation test supports the interpretation of common conjoint metrics, such as Average Marginal Component Effects and marginal R² values, as valid measures of attribute importance. Second, when we experimentally increase the number of attributes and profiles in the conjoint table, respondents on average view a larger absolute number of cells but a smaller fraction of the total cells displayed, and the patterns in which they search between cells change conditionally. At the same time, however, their stated choices remain remarkably stable. This overall pattern speaks to the robustness of conjoint experiments and is consistent with a bounded rationality mechanism. Respondents can adapt to complexity by selectively incorporating relevant new information to focus on the important attributes, while ignoring less relevant information to reduce the cognitive processing costs. Together, our results have implications for both the design and interpretation of conjoint experiments.

## Analyze the Attentive and Bypass Bias: Using Mock Vignettes in Survey Experiments

### Discussant: Erin Hartman (UCLA)

Respondent inattentiveness threatens to undermine experimental studies. In response, researchers incorporate measures of attentiveness into their analyses, yet often in a way that risks introducing post-treatment bias. We offer a new, design-based technique—mock vignettes (MVs)—to overcome these interrelated challenges. MVs feature content substantively similar to that of experimental vignettes in political science, and are followed by factual-question checks (MVCs) to gauge respondents’ attentiveness to the MV. Crucially, the same MV is viewed by all respondents prior to the experiment. Across five separate studies, we find that MVC performance is positively associated with (1) other attentiveness measures, as well as (2) stronger treatment effects. Researchers can thus use MVC performance to re-estimate treatment effects, allowing for hypothesis tests that are more robust to respondent inattentiveness yet also safeguarded against post-treatment bias. Lastly, our study offers researchers a set of ready-made, empirically-validated MVs for their own experiments.

2020 Jul 14

# Virtual Room 3: Sample Selection

## Date:

Tuesday, July 14, 2020, 12:00pm to 1:30pm

### Chair: Ludovic Rheault (University of Toronto)

Co-Host: Regan Johnston (McMaster University)

## How You Ask Matters: Interview Requests as Network Seeds

### Discussant: Jennifer Bussell (University of California, Berkeley)

When recruiting interview subjects is the goal, building rapport is conventionally heralded as the superior method. Cold-emails, in contrast, are often dismissed as inferior for their low response rate. Our study suggests that this stance is mistaken. When it is elites who are to serve as interview subjects, we argue that cold-emails can yield tremendous benefits that have thus far been overlooked. More specifically, we posit that when paired with network effects, which are rooted in the linkages among elites, cold-emails can outperform the standard but costly interview solicitation method of building rapport with subjects. In a series of experiments and simulations, we show that small changes to the wording of cold-emails translate into greater network coverage, thereby offering researchers a richer set of insights from their interview subjects.

## How Much Should You Trust Your Power Calculation Results? Power Analysis as an Estimation Problem

### Discussant: Clayton Webb (University of Kansas)

With the surge of randomized experiments and the introduction of pre-analysis plans, today’s political scientists routinely use power analysis when designing their empirical research. An often neglected fact about power analysis in practice, however, is that it requires knowledge about the true values of key parameters, such as the effect size. Since researchers rarely possess definitive knowledge of these parameter values, they often rely on auxiliary information to make their best guesses. For example, survey researchers commonly use pilot studies to explore alternative treatments and question formats, obtaining effect size estimates to be used in power calculations along the way. Field experimentalists often use evidence from similar studies in the past to calculate the minimum required sample size for their proposed experiment. Common across these practices is the hidden assumption that uncertainties about those often empirically obtained parameter values can safely be neglected for the purpose of power calculation.

In this paper, we show that such assumptions are often consequential and sometimes dangerous. We propose a conceptual distinction between two types of power analysis: empirical and non-empirical. We then argue that the former should be viewed as an estimation problem, such that its properties as an estimator (e.g., bias, sampling variance) can be formally quantified and investigated. Specifically, we analyze two commonly used variants of empirical power analysis – power estimation and minimum required sample size (MRSS) estimation – asking how reliable these analyses can be under scenarios resembling typical empirical applications in political science. The results of our analytical and simulation-based investigation reveal that these estimators are likely to perform rather poorly in most empirically relevant situations. We offer practical guidelines for empirical researchers on when to (and not to) trust power analysis results.
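
The idea that a power calculation inherits sampling error from its inputs can be illustrated with a short simulation. The sketch below is not the authors' code. It assumes a two-sided, two-sample z-test with unit variance, a standardized effect size `d`, and a pilot estimate that is approximately normal around the true effect; all function names and parameter values are hypothetical.

```python
import math
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal, used for the z-approximation


def power_two_sample(d, n_per_arm, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test.

    d is the standardized effect size (difference in means / common SD).
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    shift = d * math.sqrt(n_per_arm / 2)
    return STD_NORMAL.cdf(shift - z_crit) + STD_NORMAL.cdf(-shift - z_crit)


def mrss_per_arm(d, target_power=0.8, alpha=0.05):
    """Minimum required sample size per arm under the usual normal approximation."""
    z_alpha = STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_beta = STD_NORMAL.inv_cdf(target_power)
    return math.ceil(2 * ((z_alpha + z_beta) / d) ** 2)


def simulate_mrss_from_pilots(true_d, pilot_n_per_arm, draws=2000, seed=0):
    """Sampling distribution of the MRSS estimate when d comes from a pilot.

    Assumption: the pilot estimate d_hat is roughly N(true_d, 2 / pilot_n_per_arm).
    """
    rng = random.Random(seed)
    se = math.sqrt(2 / pilot_n_per_arm)
    estimates = []
    for _ in range(draws):
        d_hat = rng.gauss(true_d, se)
        if abs(d_hat) < 0.01:  # skip near-zero estimates, where MRSS explodes
            continue
        estimates.append(mrss_per_arm(abs(d_hat)))
    return sorted(estimates)


if __name__ == "__main__":
    true_d = 0.2
    estimates = simulate_mrss_from_pilots(true_d, pilot_n_per_arm=50)
    print("MRSS at the true effect size:", mrss_per_arm(true_d))
    print("5th / 50th / 95th percentile of pilot-based MRSS:",
          estimates[len(estimates) // 20],
          estimates[len(estimates) // 2],
          estimates[(19 * len(estimates)) // 20])
```

Comparing the percentiles with the MRSS at the true effect size shows how widely a pilot-based sample size recommendation can vary from one pilot to the next, even though every pilot is drawn from the same underlying effect.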

2020 Jul 14

# Virtual Room 4: Text-as-Data

## Date:

Tuesday, July 14, 2020, 12:00pm to 1:30pm

### Chair: Suzanna Linn (Penn State University)

Co-Host: Justin Savoie (University of Toronto)

## Embedding Regression: Models for Context-Specific Description and Inference in Social Science

### Discussant: Max Goplerud (University of Pittsburgh)

Political scientists commonly seek to make statements about how a word's usage and meaning vary over contexts---whether that be time, partisan identity, or some other document-level covariate. A promising avenue is "word embeddings" that are specific to a domain, and that simultaneously allow for statements of uncertainty and statistical inference. We introduce the "a la Carte on Text embedding regression model" (ConText regression model) for this exact purpose. In particular, we extend and validate a simple model-based method of "retrofitting" pre-trained embeddings to local contexts that requires minimal input data and out-performs well-known competitors for studying changes in meaning across groups and times. Our approach allows us to speak descriptively of "effects" of covariates on the way that words are understood, and to comment on whether a particular use is statistically significantly different to another. We provide experimental and observational evidence of performance of the model, along with open-source software.

2020 Jul 15

# Virtual Room 1: Data Access

## Date:

Wednesday, July 15, 2020, 12:00pm to 1:30pm

### Chair: Suzanna Linn (Penn State University)

Co-Host: Anwar Mohammed (McMaster University)

## Statistically Valid Inferences from Privacy Protected Data

### Discussant: James Honaker (Harvard University)

Unprecedented quantities of data that could help social scientists understand and ameliorate the challenges of human society are presently locked away inside companies, governments, and other organizations, in part because of worries about privacy violations. We address this problem with a general-purpose data access and analysis system with mathematical guarantees of privacy for individuals who may be represented in the data and statistical validity guarantees for researchers seeking population-level insights from it. We build on the standard of "differential privacy" but, unlike most such approaches, we also correct for the serious statistical biases induced by privacy-preserving procedures, provide a proper accounting for statistical uncertainty, and impose minimal constraints on the choice of data analytic methods and types of quantities estimated. Our algorithm is easy to implement, simple to use, and computationally efficient; we also offer open source software to illustrate all our methods.
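
For readers unfamiliar with differential privacy, the sketch below shows the textbook Laplace mechanism for a bounded mean. It is not the authors' algorithm. It assumes the sample size is public and that neighboring datasets differ by replacing one record, and the function names and example values are hypothetical. It also shows the kind of bias the abstract refers to: clamping values to a fixed range shifts the estimate before any noise is added.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_mean(values, lower, upper, epsilon, rng):
    """Epsilon-differentially-private mean: clamp, average, add Laplace noise.

    Assumes len(values) is public, so replacing one record moves the clamped
    mean by at most (upper - lower) / n.
    """
    clamped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clamped)
    return sum(clamped) / len(clamped) + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    values = [0, 1, 2, 3, 10]  # hypothetical data with one value above the bound
    print("true mean:", sum(values) / len(values))
    print("private mean (epsilon = 1):", private_mean(values, 0, 5, 1.0, rng))
```

Because the value 10 is clamped to 5, the released estimate is centered on 2.2 rather than the true mean of 3.2; the noise averages out over repeated releases, but the clamping bias does not.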

## Hidden in Plain Sight? Detecting Electoral Irregularities Using Statutory Results

### Discussant: Walter Mebane (University of Michigan)

Confidence in election results is a central pillar of democracy, but in many developing countries, elections are plagued by a number of irregularities. Such problems can include incredible vote margins, sky-high turnout, and statutory forms that indicate manual edits. Recent scholarship has sought to identify these problems and use them to quantify the magnitude of election fraud in countries around the world. In this paper, we argue that this literature suffers from its reliance on an ideal election as a baseline case, and delineate an alternative data-generating process for the irregularities often seen in developing democracies: benign human error. Using computer vision and deep learning tools, we identify statutory irregularities for each of 30,000 polling stations in Kenya’s 2013 presidential election. We show that these irregularities are uncorrelated with election outcomes and do not reflect systematic fraud. Our findings suggest that scholars of electoral integrity should take care to ensure that their methods are sensitive to context and account for the substantial challenges of administering elections in developing democracies.

2020 Jul 15

# Virtual Room 2: Causal Inference

## Date:

Wednesday, July 15, 2020, 12:00pm to 1:30pm

### Chair: Kenichi Ariga (University of Toronto)

Co-Host: Ilayda Onder (Penn State University)

## Casual Inference, or How I Learned to Stop Worrying and Love Hypothesis Testing

### Discussant: Luke Keele (University of Pennsylvania)

Many social scientists now consider it necessary for an empirical research design to achieve identification of a causal relationship as defined by the Rubin (1974) causal model or the closely related Pearl (2009) model. In this paper, we argue that no empirical estimand can take a meaningful causal interpretation without a supporting theoretical structure, even if that estimand is strongly identified by a careful research design; that is, an identified research design is necessary but not sufficient for a causal inference. An atheoretical estimand might be "causal" in the narrow sense that changes in the dependent variable are ascribable to the treatment in the specific data used in the study, but not in the sense of providing predictive or explanatory guidance for treatment effects in any other situation in the past, present, or future. For instance, when objects of study strategically interact with one another, the straightforward application of common causal inference research designs will yield misleading results. To summarize our argument, there can be NO CAUSATION WITHOUT EXPLANATION.

## Variation in impacts of letters of recommendation on college admissions decisions: Approximate balancing weights for treatment effect heterogeneity in observational studies

### Author(s): Eli Ben-Michael, Avi Feller and Jesse Rothstein

Assessing treatment effect variation in observational studies is challenging because differences in estimated impacts across subgroups reflect both differences in impacts and differences in covariate balance. Our motivating application is a UC Berkeley pilot program for letters of recommendation in undergraduate admissions: we are interested in estimating the differential impacts for under-represented applicants and applicants with differing a priori probability of admission. We develop balancing weights that directly optimize for “local balance” within subgroups while maintaining global covariate balance between treated and control populations. We then show that this approach has a dual representation as a form of inverse propensity score weighting with a hierarchical propensity score model. In the UC Berkeley pilot study, our proposed approach yields excellent local and global balance, unlike more traditional weighting methods, which fail to balance covariates within subgroups. We find that the impact of letters of recommendation increases with the predicted probability of admission, with mixed evidence of differences for under-represented minority applicants.

2020 Jul 15

# Virtual Room 3: Panel and Spatial Analysis

## Date:

Wednesday, July 15, 2020, 12:00pm to 1:30pm

### Chair: Ludovic Rheault (University of Toronto)

Co-Host: Regan Johnston (McMaster University)

## A Bayesian Method for Modeling Dynamic Network Influence With TSCS Data

### Discussant: Matthew Blackwell (Harvard University)

With fast accumulations of network data, modeling time-varying network influence is necessary and important to relax the unrealistic constant-effect assumption and to deepen our understanding of the changing dynamic between networks and social behavior. However, even in static settings, the identification of network influence remains a challenging problem due to the complicated entanglement of network interdependence, homophily (selection), and common shocks. To identify and explain dynamic network influence, this paper proposes a multilevel Spatio-Temporal model with a multifactor error structure. Network influence is allowed to vary, and network structural features could enter the group-level regression and further explain the variation. The multifactor term is included to capture unobserved time-varying homophily and heterogeneous time trends. We apply Bayesian shrinkage for factor-selection to achieve sufficient bias-correction and avoid overfitting. The Bayesian Spatio-Temporal model is highly flexible and can accommodate a wide variety of network types. Monte Carlo experiments show the model performs well in recovering the true trajectory of network influence. Besides, the varying-influence specification actually helps identification and is robust to misspecification. The two empirical IR studies find interesting patterns of the time-varying influence of the migration flow network on terrorist attacks and the GATT/WTO institutional network on trade policies, which could inspire hypothesis-development and shed light on theoretical debates. An R package is developed for implementing the proposed method.

2020 Jul 15

# Virtual Room 4: Applications

## Date:

Wednesday, July 15, 2020, 12:00pm to 1:30pm

### Chair: John Londregan (Princeton University)

Co-Host: Mikaela Karstens (Penn State University)

## The Political Ideologies of Organized Interests: Large-Scale, Social Network Estimation of Interest Group Ideal Points

### Discussant: In Song Kim (MIT)

Interest group influence is pervasive in American politics, impacting the function of every branch. Core to the study of interest groups, both theoretically and empirically, is the ideology of the group, yet relatively little is known on this front for the vast expanse of them. By leveraging ideal point estimation and network science, we provide a novel measure of interest group ideology for nearly 15,000 unique groups across 95 years, which provides the largest and longest measure of interest group ideologies to date. We make methodological and measurement contributions using exact matching and hand-validated fuzzy string matching to identify amicus curiae signing organizations who have given political donations and then impute and cross-validate ideal points for the organizations based on the network structure of amicus cosigning. Our empirical investigation provides insights into the dynamics of interest group macro-ideology, ideological issue domains and ideological differences between donor and non-donor organizations.

## All (Mayoral) Politics is Local?

### Discussant: Jonathan Nagler (New York University)

One of the defining characteristics of modern politics in the United States is the increasing nationalization of elite- and voter-level behavior. Relying on measures of electoral vote shares, previous research has found evidence indicating a significant amount of state-level nationalization. Using an alternative source of data – the political rhetoric used by mayors, state governors, and Members of Congress on Twitter – we examine and compare the amount of between-office nationalization throughout the federal system. We find that gubernatorial rhetoric closely matches that of Members of Congress but that there are substantial differences in the topics and content of mayoral speech. These results suggest that, on average, American mayors have largely remained focused on their local mandate. More broadly, our findings suggest a limit to which American politics has become nationalized – in some cases, all politics remains local.

2020 Jul 16

# Virtual Room 1: Machine Learning

## Date:

Thursday, July 16, 2020, 12:00pm to 1:30pm

### Chair: Suzanna Linn (Penn State University)

Co-Host: Anwar Mohammed (McMaster University)

## Experimental Evaluation of Computer-Assisted Human Decision Making: Application to Pretrial Risk Assessment Instrument

### Discussant: Jonathan Mummolo (Princeton University)

Despite an increasing reliance on computerized decision making in our day-to-day lives, human beings still make highly consequential decisions. As frequently seen in business, healthcare, and public policy, recommendations produced by statistical models and machine learning algorithms are provided to human decision-makers in order to guide their decisions. The prevalence of such computer-assisted human decision making calls for the development of a methodological framework to evaluate its impact. Using the concept of principal stratification from the causal inference literature, we develop a statistical methodology for experimentally evaluating the causal impacts of machine recommendations on human decisions. We also show how to examine whether machine recommendations improve the fairness of human decisions. We apply the proposed methodology to the randomized evaluation of a pretrial risk assessment instrument (PRAI) in the criminal justice system. Judges use the PRAI when deciding which arrested individuals should be released and, for those ordered released, the corresponding bail amounts and release conditions. We analyze how the PRAI influences judges’ decisions and impacts their gender and racial fairness.

## Improving Variable Importance Measures

### Discussant: Santiago Olivella (University of North Carolina at Chapel Hill)

Boosting and random forests are among the best off-the-shelf prediction tools. These methods offer a variable importance measure (VIM), which is a cumulative measure of the improvement in accuracy over the algorithm. We show existing variable importance measures, as implemented, are biased, returning positive scores on irrelevant variables. Intuitively, if a variable is irrelevant but correlates with a relevant variable, this correlation may lead to an improvement in performance that is misattributed to the irrelevant variable. We introduce a method that removes this bias. The method works by separating each predictor into a component explained by other predictors (a “predicted variable”), and a component not (a “partialed out variable”). We assess variable importance only through any improvement attributable to the latter. We prove the method returns a valid VIM, meaning it is mean-zero and asymptotically normal for irrelevant variables. Simulation evidence and applications to UCI data suggest the method also performs favorably relative to several existing machine learning methods in terms of predictive accuracy.

2020 Jul 16

# Virtual Room 2: Panel and Spatial Analysis

## Date:

Thursday, July 16, 2020, 12:00pm to 1:30pm

### Chair: Justin Esarey (Wake Forest University)

Co-Host: Md Mujahedul Islam (University of Toronto)

## How Wide is the Ethnic Border?

### Discussant: Florian Hollenbach (Texas A&M University)

We explore the relationship between ethnic heterogeneity and within- and cross-country barriers to trade. We develop a spatial model of trade in which observable productivity shocks directly affect local prices. These local shocks propagate through the trading network differentially, depending on unobserved trading frictions. Coupling data describing monthly commodity prices in 227 cities across 42 African countries, remotely sensed weather data, and spatial data describing the locations of ethnic-group homelands, we estimate this model to quantify the costs traders incur when crossing ethnic and national borders. We show that ethnic borders induce a friction approximately half the magnitude of national borders, indicating that ethnic heterogeneity is an impediment to the development of efficient national markets. Through counterfactual experiments, we quantify the effect of these frictions on consumer welfare and the extent to which colonial-era political borders have hindered African economic integration. In all, our paper suggests that trade impediments caused by ethnic heterogeneity are a substantial channel through which ethnic fractionalization impacts development.

## The Dynamics of Civil Wars: A Bayesian hidden Markov model applied to the pattern of conflict and the role of ceasefires (Cancelled)

### Discussant: Bruce Desmarais (Penn State University)

Why (and when) do small conflicts become big wars? We develop a Bayesian hidden Markov modeling (HMM) framework for studying the dynamics of violence in civil wars. The key feature of an HMM for studying such a process is that it is defined on top of a latent state space constructed to represent the domain scientist's intuition for the processes being studied. To learn a latent state space of varying intensity of conflict, we use count data of weekly conflict-related deaths over time in a nation as an emitted response variable, and construct an autoregressive model of order 1 to describe its evolution. Using event-level data for all civil wars from 1989 to the present, this framework allows us to study transitions in the latent intensity, e.g., from escalating conflicts to stable and/or deescalating conflicts. In particular, we examine the effect of declaring a ceasefire on the underlying dynamics of conflict. Accounting for the effects of covariates for the relative degree of democracy, GDP per capita, and population in a country, we are able to quantify the uncertainty for the underlying intensity of a conflict at any given point in time.

2020 Jul 16

# Virtual Room 3: Experimental Designs

## Date:

Thursday, July 16, 2020, 12:00pm to 1:30pm

### Chair: Ludovic Rheault (University of Toronto)

Co-Host: Regan Johnston (McMaster University)

## Elements of External Validity: Framework, Design, and Analysis

### Discussant: Daniel Hopkins (University of Pennsylvania)

External validity of randomized experiments has been a focus of long-standing methodological debates in the social sciences. However, in practice, discussions of external validity often differ in their definitions, goals, and assumptions, without making them clear. Moreover, while many applied studies recognize it as a potential limitation, few studies have explicit designs or analyses aimed at externally valid inferences. In this article, we propose a framework, design, and analysis to address two central goals of external validity inferences: (1) assess whether the direction of causal effects is generalizable (sign-validity), and (2) generalize the magnitude of causal effects (effect-validity). First, we propose a formal framework of external validity to decompose it into four components, X-, Y-, T-, and C-validity (units, outcomes, treatments, and contexts) and clarify the source of potential biases. Second, we present assumptions required to make externally valid causal inferences, and we propose experimental designs to make such assumptions more plausible. Finally, we introduce a multiple-testing procedure to address sign-validity and general estimators of the population causal effects for the effect-validity. We illustrate our proposed methodologies through three applications covering field, survey, and lab experiments.

2020 Jul 16

# Virtual Room 4: Text and Image Data

## Date:

Thursday, July 16, 2020, 12:00pm to 1:30pm

### Chair: Kevin Quinn (University of Michigan)

Co-Host: Ilayda Onder (Penn State University)

## Untangling Mixtures in Judicial Opinions

### Discussant: Brandon Stewart (Princeton University)

Within the small deliberative voting body of the U.S. Supreme Court, prior work has regularly looked to the voting alignments of the justices in order to understand bargaining power and who wins or loses. However, little consensus has emerged over who exerts control over the Court's actual output--judicial opinions. Rather, three distinct perspectives have emerged: (1) The opinion author controls opinion content; (2) the median member of the court controls opinion content; (3) the median of the majority coalition controls the content of the opinion. In this paper, by contrast, we look for evidence of influence where it theoretically operates -- in the opinions themselves -- and identify the unique contributions of separate justices to the collective output of the Court. To do so, we build on a well-established tradition within text-as-data research on the challenge of authorship attribution, and develop a novel authorship model that leverages writing characteristics to predict the authors of individual sentences. More specifically, we introduce an approach that evaluates authorship across multiple competitive models, utilizing "uniquely" authored decisions (i.e., dissenting opinions authored by a single justice) in concert with a series of deep learning models to estimate authorship probabilities by sentence. The approach we introduce to estimating authorship and relative contributions is well-suited for the study of bargaining power and, to that end, we provide a preliminary analysis of the composition of Supreme Court majority opinions where we deduce the justices most responsible for opinion content.

## Protest Event Data from Images

### Discussant: Michelle Torres (Rice University)

Creating event data has been an active area of research since at least the 1970s and continues to represent a fruitful research program at the forefront of many methodological developments. Relying on newspapers, the dominant source of text for event data, introduces structural barriers, however, that images, especially from social media, can ameliorate. Certain quantities of interest, like violence and protester characteristics, are difficult to measure with text, and text introduces known biases. These biases generate fewer events than geolocated images from social media, and images emphasize different components of protests than newspapers. Image data generates new or improved measures of an event's magnitude (continuously valued measures of protester and state violence as well as the size of a protest) and provides information on participant demographics, across more cities and days. The strengths and weaknesses of these two approaches are explored by comparing three newspaper-based event datasets with one generated from geolocated images shared on social media, for Venezuela from 2014-2015, Chile in 2019, and Iraq in 2019.

2020 Jul 16

# Virtual Room 1: Hierarchical Models

## Date:

Thursday, July 16, 2020, 2:30pm to 4:00pm

### Chair: Ludovic Rheault (University of Toronto)

Co-Host: Mikaela Karstens (Penn State University)

## Fast and Accurate Estimation of Non-Nested Binomial Hierarchical Models Using Variational Inference

### Discussant: Justin Grimmer (Stanford University)

Estimating non-linear hierarchical models can be computationally burdensome in the presence of large datasets and many non-nested random effects. Popular inferential techniques may take hours to fit even relatively straightforward models. This paper provides two contributions to scalable and accurate inference. First, I propose a new mean-field algorithm for estimating logistic hierarchical models with an arbitrary number of non-nested random effects. Second, I propose “marginally augmented variational Bayes” (MAVB) that further improves the initial approximation through a post-processing step. I show that MAVB provides a guaranteed improvement in the approximation quality at low computational cost and induces dependencies that were assumed away by the initial factorization assumptions. I apply these techniques to a study of voter behavior. Existing estimation took hours whereas the algorithms proposed run in minutes. The posterior means are well-recovered even under strong factorization assumptions. Applying MAVB further improves the approximation by partially correcting the under-estimated variance. The proposed methodology is implemented in an open source software package.

2020 Jul 16

# Virtual Room 2: Causal Inference

## Date:

Thursday, July 16, 2020, 2:30pm to 4:00pm

### Chair: Justin Esarey (Wake Forest University)

Co-Host: Md Mujahedul Islam (University of Toronto)

## Retrospective causal inference via elapsed time-weighted matrix completion, with an evaluation on the effect of the Schengen Area on the labour market of border regions

### Discussant: James Bisbee (Princeton University)

We propose a strategy of retrospective causal inference in panel data settings where (1) there is a continuous outcome measured before and after a single binary treatment; (2) there exists a group of units exposed to treatment during a subset of periods (switch-treated) and a group of units always exposed to treatment (always-treated), but no group that is never exposed to treatment; and (3) the elapsed treatment duration, z, differs across groups. The potential outcomes under treatment for the switch-treated in the pre-treatment period are missing and we impute these values via nuclear-norm regularized least squares using the observed (i.e., factual) outcomes. The imputed values can be interpreted as the counterfactual outcomes of the switch-treated had they been always-treated. Differencing the counterfactual outcomes from the factual outcomes can be interpreted as the effect of not having assigned treatment to the switch-treated in the pre-treatment period. A possible complication for our strategy arises when the evolution of the potential outcomes under treatment for the two groups might not be only influenced by calendar time, but also by z. The latter is particularly important if the treatment effect takes time before stabilizing in a new “steady state” equilibrium. We address this problem by weighting the loss function of the matrix completion estimator so that more weight is placed on the loss for factual outcomes with higher values of z. We apply the proposed strategy to study the impact of the visa policy of the Schengen Area on the labour market of border regions. We first aggregate over 2.2 million individual labour market decisions from the Eurostat Labour Force Survey to the region-level for regions always-treated and switch-treated by the policy during the period of 2004 to 2018. We then estimate the effect of not implementing the policy on the probability of working in any bordering region for switch-treated regions. Preliminary results indicate the share of the labour market working in bordering regions would have been about 0.5% larger had the switch-treated regions adopted the policy prior to 2008.

## A Negative Correlation Strategy for Bracketing in Difference-in-Differences with Application to the Effect of Voter Identification Laws on Voter Turnout

### Discussant: Fredrik Sävje (Yale University)

The method of difference-in-differences (DID) is widely used to study the causal effect of policy interventions in observational studies. DID exploits a before-and-after comparison of the treated and control units to remove the bias due to time-invariant unmeasured confounders under the parallel trends assumption. Estimates from DID, however, will be biased if the outcomes for the treated and control units would have evolved differently in the absence of treatment, that is, if the parallel trends assumption is violated because history interacts with groups. We propose a new identification strategy that leverages two groups of control units whose outcome dynamics bound the outcome dynamics the treated group would have had in the absence of treatment, and achieves partial identification of the average treatment effect for the treated. The identified set takes a union bounds form to which previously developed partial identification inference methods do not apply. We develop a novel bootstrap method to construct uniformly valid confidence intervals for the identified set and the treatment effect of interest, and we establish its theoretical properties. We develop a simple falsification test and sensitivity analysis for the assumption. We apply the proposed methods to the effect of voter identification laws on turnout, and we find evidence that the voter identification laws in Georgia and Indiana increased turnout.
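
The bracketing idea in this abstract can be illustrated with a minimal sketch. It is not the authors' estimator or their bootstrap procedure: it only computes two ordinary DID estimates, one per control group, and reports their minimum and maximum as bounds. The function names and all numbers are hypothetical, and the sketch assumes the treated group's untreated change lies between the changes of the two control groups.

```python
from statistics import mean


def did(treated_pre, treated_post, control_pre, control_post):
    """Ordinary difference-in-differences estimate from lists of outcomes."""
    treated_change = mean(treated_post) - mean(treated_pre)
    control_change = mean(control_post) - mean(control_pre)
    return treated_change - control_change


def bracket_att(treated_pre, treated_post, control_a, control_b):
    """Return (lower, upper) bounds on the effect for the treated.

    control_a and control_b are (pre, post) pairs of outcome lists.
    Assumption (hypothetical setup): the change the treated group would
    have shown without treatment lies between the two control changes.
    """
    est_a = did(treated_pre, treated_post, *control_a)
    est_b = did(treated_pre, treated_post, *control_b)
    return min(est_a, est_b), max(est_a, est_b)


# Made-up outcomes (e.g., turnout in percent) for illustration only.
treated_pre, treated_post = [50.0, 52.0], [56.0, 58.0]  # change of 6
control_a = ([40.0, 42.0], [41.0, 43.0])                # change of 1
control_b = ([60.0, 62.0], [64.0, 66.0])                # change of 4

lower, upper = bracket_att(treated_pre, treated_post, control_a, control_b)
print(lower, upper)
```

With these made-up inputs the two DID estimates are 5 and 2, so the bracket is the interval from 2 to 5. Inference for such a set (the confidence intervals the abstract describes) is a separate problem that this sketch does not address.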

2020 Jul 16

# Virtual Room 3: Panel Data

## Date:

Thursday, July 16, 2020, 2:30pm to 4:00pm

### Chair: Suzanna Linn (Penn State University)

Co-Host: Ilayda Onder (Penn State University)

## Bayesian Causal Inference With Time-Series Cross-Sectional Data: A Dynamic Multilevel Latent Factor Model with Hierarchical Shrinkage

### Discussant: Neal Beck (New York University)

This paper proposes a Bayesian causal inference method based on estimating posterior predictive distributions of counterfactuals with TSCS data. To construct the prediction model, we fully take advantage of the flexibility of multilevel modeling and Bayesian model specification to reduce dependence on modeling assumptions. We start with a multilevel dynamic factor model and adopt a Bayesian Lasso-like hierarchical shrinkage strategy for stochastic model-specification selection. Counterfactual imputation based on the posterior predictive distribution generalizes the classic synthetic control approach by assigning observation-specific weights to features of the treated units and exploiting high-order relationships between treated and control time series. With empirical posterior distributions of counterfactuals, it is convenient and intuitive to make causal inferences on estimands defined at the individual and aggregate levels. The proposed approach is applied to simulated data and two empirical examples as in ADH (2015) and Xu (2017). The applications illustrate that, compared to alternative approaches, our method has better counterfactual prediction performance and lower uncertainty and accordingly improves causal inference with TSCS data.

## A Nonparametric Bayesian Model for Gradual Structural Changes: The Intergenerational Chinese Restaurant Processes

### Discussant: Mark Pickup (Simon Fraser University)

Many social changes occur gradually over time, and social scientists are often interested in such changes of unobserved heterogeneity. However, existing methods for estimating structural changes have failed to model continuous processes through which a data generating process evolves. This paper proposes a novel nonparametric Bayesian model to flexibly estimate changing heterogeneous data generating processes. By introducing a time dynamic to the Dirichlet process mixture model, the proposed intergenerational Chinese restaurant process (IgCRP) model categorizes units into groups and allows the group memberships to evolve as a Markov process. In the IgCRP, the group assigned to a unit in a time period follows the standard Chinese restaurant process conditional on the group assignments in the previous time period. A distinctive feature of the proposed approach is that it models a process in which multiple groups emerge and diminish as a continuing process rather than a one-time structural change. The method is illustrated by reanalyzing the data set of a study on the evolution of party positions on civil rights in the United States from the 1930s to the 1960s.
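
As background for the building block named above, the following sketch draws group assignments from a standard (single-period) Chinese restaurant process. It is not the IgCRP itself, which additionally conditions on the previous period's assignments; the function name `sample_crp` and the concentration parameter value are illustrative assumptions.

```python
import random


def sample_crp(n_units, alpha, rng):
    """Draw group labels from a standard Chinese restaurant process.

    Each new unit joins existing group k with probability proportional
    to that group's current size, or opens a new group with probability
    proportional to alpha (alpha must be positive).
    """
    assignments = []
    group_sizes = []
    for _ in range(n_units):
        # The last weight corresponds to opening a new group.
        weights = group_sizes + [alpha]
        group = rng.choices(range(len(weights)), weights=weights)[0]
        if group == len(group_sizes):
            group_sizes.append(1)
        else:
            group_sizes[group] += 1
        assignments.append(group)
    return assignments, group_sizes


rng = random.Random(0)  # fixed seed so the draw is reproducible
assignments, group_sizes = sample_crp(20, alpha=1.0, rng=rng)
print(assignments)
print(group_sizes)
```

Larger values of `alpha` tend to produce more groups. A dynamic extension of the kind the abstract describes would replace the size-based weights with weights that depend on the previous period's memberships.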

2020 Jul 16

# Virtual Room 4: Instrumental Variables

## Date:

Thursday, July 16, 2020, 2:30pm to 4:00pm

### Chair: Jonathan N. Katz (California Institute of Technology)

Co-Host: Justin Savoie (University of Toronto)

## An omitted variable bias framework for sensitivity analysis of instrumental variables

### Discussant: Jacob Montgomery (WUSTL)

We develop an omitted variable bias framework for sensitivity analysis of instrumental variable (IV) estimates that is immune to "weak instruments," naturally handles multiple "side-effects" and "confounders," exploits expert knowledge to bound sensitivity parameters, and can be easily implemented with standard software. In particular, we introduce sensitivity statistics for routine reporting, such as robustness values for IV estimates, describing the minimum strength that omitted variables need to have to change the conclusions of a study. We show how these depend upon the sensitivity of two familiar auxiliary estimates–the effect of the instrument on the treatment (the "first-stage") and the effect of the instrument on the outcome (the "reduced form")–and how an extensive set of sensitivity questions can be answered from those alone. Next, we provide tools that fully characterize the sensitivity of point-estimates and confidence intervals to violations of the standard IV assumptions. Finally, we offer formal bounds on the worst damage caused by these violations by means of comparisons with the explanatory power of observed variables. We illustrate our tools with several examples.

## Noncompliance and instrumental variables for 2^k factorial experiments

### Discussant: Teppei Yamamoto (MIT)

Factorial experiments are widely used to assess the marginal, joint, and interactive effects of multiple concurrent factors. While a robust literature covers the design and analysis of these experiments, there is less work on how to handle treatment noncompliance in this setting. To fill this gap, we introduce a new methodology that uses the potential outcomes framework for analyzing 2^k factorial experiments with noncompliance on any number of factors. This framework builds on and extends the literature on both instrumental variables and factorial experiments in several ways. First, we define novel, complier-specific quantities of interest for this setting and show how to generalize key instrumental variables assumptions. Second, we show how partial compliance across factors gives researchers a choice over different types of compliers to target in estimation. Third, we show how to conduct inference for these new estimands from both the finite-population and superpopulation asymptotic perspectives. Finally, we illustrate these techniques by applying them to two field experiments: one on the effects of cognitive behavioral therapy on crime and the other on the effectiveness of different forms of get-out-the-vote canvassing. New easy-to-use, open-source software implements the methodology.

2020 Jul 17

# Virtual Room 1: Covariate Balancing

## Date:

Friday, July 17, 2020, 12:00pm to 1:30pm

### Chair: Ludovic Rheault (University of Toronto)

Co-Host: Anwar Mohammed (McMaster University)

## Balancing covariates in randomized experiments using the Gram-Schmidt Walk

### Discussant: Marc Ratkovic (Princeton University)

The paper introduces a class of experimental designs that allows experimenters to control the robustness and efficiency of their experiments. The designs build on a recently introduced algorithm in discrepancy theory, the Gram-Schmidt walk. We provide a tight analysis of this algorithm, allowing us to prove important properties of the designs it produces. These designs aim to simultaneously balance all linear functions of the covariates, and the variance of an estimator of the average treatment effect is shown to be bounded by a quantity that is proportional to the loss function of a ridge regression of the potential outcomes on the covariates. No regression is actually conducted, and one may see the procedure as regression adjustment by design. The class of designs is parameterized so as to give experimenters control over the worst-case performance of the treatment effect estimator. Greater covariate balance is attained by allowing for a less robust design in terms of worst-case variance. We argue that the trade-off between robustness and efficiency is an inherent aspect of experimental design. Finally, we provide non-asymptotic tail bounds for the treatment effect estimator under the class of designs we describe.

## Kpop: A kernel balancing approach for reducing specification assumptions in survey weighting

### Discussant: Luke W. Miratrix (Harvard University)

Response rates to surveys have declined precipitously. For example, Pew Research Center saw response rates to telephone surveys fall from roughly one third of respondents in the late 1990s, to only 6% in 2018. Some researchers have responded by relying more heavily on convenience-based internet samples. This leaves researchers asking not if, but how, to weight survey results to represent their target population. Though practitioners often call upon expert knowledge in constructing their auxiliary vector, X, to use in weighting methods, they face difficult, feasibility-constrained choices of what interactions or other functions to include in X. Most approaches seek weights on the sampled units that make measured covariates have the same mean in the sample as in the population. However, the weights that achieve equal means on X will ensure that an outcome variable of interest Y is correctly reweighted only if the expectation of Y is linear in X, an unrealistic assumption. We describe kernel balancing for population reweighting (KPOP) to make samples more similar to populations on the distribution of X, beyond the first moment margin. This approach effectively replaces X with a kernel matrix, K, that encodes high-order information about X via the “kernel trick”. We then reweight the sampled units so that their average row of K is approximately equal to that of the population, working through a spectral decomposition. This produces good calibration on a wide range of smooth functions of X, without relying on the user to select those functions. We describe the method and illustrate its use in reweighting political survey samples, including from the 2016 American presidential election.

2020 Jul 17

# Virtual Room 2: Conjoint Designs

## Date:

Friday, July 17, 2020, 12:00pm to 2:15pm

### Chair: Peter J. Loewen (University of Toronto)

Co-Host: Md Mujahedul Islam (University of Toronto)

## Avoiding Measurement Error Bias in Conjoint Analysis

### Discussant: Naoki Egami (Columbia University)

Conjoint analysis is a survey research methodology spreading fast across the social sciences and marketing due to its widespread applicability and apparent capacity to disentangle many causal effects with a single survey experiment. Unfortunately, conjoint designs are also especially prone to measurement error, revealed by surprisingly low levels of intra-coder reliability, which can exaggerate, attenuate, or give incorrect signs for causal effect estimates. We show that measurement error bias is endemic in applications, and so assuming its absence, as many studies implicitly do, is not defensible. With replications of prior research and new experiments, we demonstrate three common mechanisms that generate measurement error. We use these mechanisms to design open source software to help researchers design conjoint experiments, study the effects of measurement error, and correct for the resulting biases.

## Improving Preference Elicitation in Conjoint Designs using Machine Learning for Heterogeneous Effects

### Discussant: Jasjeet Sekhon (University of California, Berkeley)

Conjoint analysis has become a standard tool for preference elicitation in political science. However, the typical estimand, the Average Marginal Component Effect (AMCE), is only tangentially linked to theoretically relevant quantities. In this paper we clarify the necessary theoretical assumptions to interpret the AMCE in terms of individual preferences, explain how heterogeneity in marginal component effects can drive misleading conclusions about preferences, and provide a set of tools based on the causal/generalized random forest method (Athey et al., 2019; Wager & Athey, 2018) that allow applied researchers to detect effect heterogeneity between respondents and derive theoretically relevant quantities of interest from estimates of individual-level marginal component effects. We illustrate this method with an application to a recently conducted conjoint experiment on candidate preferences in the 2020 U.S. Democratic Presidential primary.

## Using Conjoint Experiments to Analyze Elections: The Essential Role of the Average Marginal Component Effect (AMCE)

### Discussant: Kosuke Imai (Harvard University)

Political scientists have increasingly deployed conjoint survey experiments to understand multi-dimensional choices in various settings. We begin with a general framework for analyzing voter preferences in multi-attribute elections using conjoints. With this framework, we demonstrate that the Average Marginal Component Effect (AMCE) is well-defined in terms of individual preferences and represents a central quantity of interest to empirical scholars of elections: the effect of a change in an attribute on a candidate or party's expected vote share. This property holds irrespective of the heterogeneity, strength, or interactivity of voters' preferences and regardless of how votes are aggregated into seats. Overall, our results indicate the essential role of AMCEs for understanding elections, a conclusion buttressed by a corresponding literature review. We also provide practical advice on interpreting AMCEs and discuss how conjoint data can be used to estimate other quantities of interest to electoral studies.
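
For readers unfamiliar with the estimand, a minimal sketch of an AMCE estimate follows. It assumes attribute levels were randomized uniformly and independently across profiles, in which case a simple difference in mean choice rates between a level and its baseline estimates the AMCE. The data, the attribute name `experience`, and the function name `amce` are all hypothetical.

```python
from statistics import mean


def amce(profiles, attribute, level, baseline):
    """Difference in mean choice rate between `level` and `baseline`.

    Each profile is a dict with attribute values and a 0/1 `chosen` flag.
    Assumes uniform, independent randomization of attribute levels.
    """
    chosen_level = [p["chosen"] for p in profiles if p[attribute] == level]
    chosen_base = [p["chosen"] for p in profiles if p[attribute] == baseline]
    return mean(chosen_level) - mean(chosen_base)


# Made-up candidate profiles shown to respondents.
profiles = [
    {"experience": "yes", "chosen": 1},
    {"experience": "yes", "chosen": 1},
    {"experience": "yes", "chosen": 0},
    {"experience": "yes", "chosen": 1},
    {"experience": "no", "chosen": 0},
    {"experience": "no", "chosen": 1},
    {"experience": "no", "chosen": 0},
    {"experience": "no", "chosen": 0},
]

estimate = amce(profiles, "experience", level="yes", baseline="no")
print(estimate)
```

In this toy data, profiles with experience are chosen 75% of the time and profiles without it 25% of the time, so the estimate is 0.5. Under the framework in the abstract, that number would be read as the change in expected vote share from switching the attribute; applied work would also cluster standard errors by respondent, which this sketch omits.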

2020 Jul 17

# Virtual Room 3: Text-as-Data & Item Response Theory

## Date:

Friday, July 17, 2020, 12:00pm to 1:30pm

### Chair: Suzanna Linn (Penn State University)

Co-Host: Regan Johnston (McMaster University)

## Troubles in/with Text: Combining qualitative and NLP approaches to analyzing government archives from the UK Troubles in Northern Ireland

### Discussant: Arthur Spirling (New York University)

Natural language processing (NLP) offers a range of tools to test conceptual trends in large text corpora. However, NLP scholars typically developed and trained models on carefully curated data, focusing on settings where natural sources of annotation lead to straightforward application of supervised learning. Can NLP tools be used on complicated and imperfect real-world data (e.g., data that is idiosyncratic to a specific place, time, and purpose; and/or data that has undergone imperfect optical character recognition digitization processes) to address complex political science questions? To begin answering this methodological question, this research combines qualitative and computational NLP approaches to understand how UK government officials internally justified their decisions to intern un-convicted Irish Catholics without trial during the “Troubles in N. Ireland.” We retrieved, digitized, and hand-annotated more than 8,500 recently declassified archive documents from government correspondence files (1969-73). We use these data to A) explore tradeoffs associated with various approaches to analyzing complex, un-curated text (qualitative process-tracing, bag-of-words-based topic models, NLP machine-learning classifications, and word and sentence embeddings); and B) model a gold standard for integrating qualitative coding and inductive, expert-based qualitative examinations of model outputs into machine-learning analyses.