# Virtual Room 4: Text and Image Data

## Date:

Thursday, July 16, 2020, 12:00pm to 1:30pm

### Chair: Kevin Quinn (University of Michigan)

Co-Host: Ilayda Onder (Penn State University)

## Untangling Mixtures in Judicial Opinions

### Discussant: Brandon Stewart (Princeton University)

Within the small deliberative voting body of the U.S. Supreme Court, prior work has regularly looked to the voting alignments of the justices in order to understand bargaining power and who wins or loses. However, little consensus has emerged over who exerts control over the Court's actual output: judicial opinions. Rather, three distinct perspectives have emerged: (1) the opinion author controls opinion content; (2) the median member of the Court controls opinion content; (3) the median of the majority coalition controls opinion content. In this paper, by contrast, we look for evidence of influence where it theoretically operates, in the opinions themselves, and identify the unique contributions of separate justices to the collective output of the Court. To do so, we build on a well-established tradition within text-as-data research on the challenge of authorship attribution, and develop a novel authorship model that leverages writing characteristics to predict the authors of individual sentences. More specifically, we introduce an approach that evaluates authorship across multiple competitive models, utilizing "uniquely" authored decisions (i.e., dissenting opinions authored by a single justice) in concert with a series of deep learning models to estimate authorship probabilities by sentence. The approach we introduce to estimating authorship and relative contributions is well-suited for the study of bargaining power and, to that end, we provide a preliminary analysis of the composition of Supreme Court majority opinions in which we infer the justices most responsible for opinion content.

## Protest Event Data from Images

### Discussant: Michelle Torres (Rice University)

Creating event data has been an active area of research since at least the 1970s and continues to represent a fruitful research program at the forefront of many methodological developments. Relying on newspapers, the dominant source of text for event data, introduces structural barriers, however, that images, especially from social media, can ameliorate. Certain quantities of interest, like violence and protester characteristics, are difficult to measure with text, and text introduces known biases. Because of these biases, newspaper text yields fewer events than geolocated images from social media, and images emphasize different components of protests than newspapers do. Image data generates new or improved measures of an event's magnitude (continuously valued measures of protester and state violence as well as the size of a protest) and provides information on participant demographics, across more cities and days. The strengths and weaknesses of these two approaches are explored by comparing three newspaper-based event datasets with one generated from geolocated images shared on social media, for Venezuela from 2014-2015, Chile in 2019, and Iraq in 2019.
