Virtual Room 3: Text-as-Data & Item Response Theory

Date: Friday, July 17, 2020, 12:00pm to 1:30pm

Troubles in/with Text: Combining qualitative and NLP approaches to analyzing government archives from the UK Troubles in Northern Ireland

Sarah Dreier, Emily Gade, Jose Hernandez, Noah A. Smith and Sofia Serrano

GPIRT: A Gaussian Process Model for Item Response Theory

J. Brandon Duck-Mayr, Roman Garnett and Jacob Montgomery

Chair: Suzanna Linn (Penn State University)

Co-Host: Regan Johnston (McMaster University)

Troubles in/with Text: Combining qualitative and NLP approaches to analyzing government archives from the UK Troubles in Northern Ireland

Author(s): Sarah Dreier, Emily Gade, Jose Hernandez, Noah A. Smith and Sofia Serrano

Discussant: Arthur Spirling (New York University)

Natural language processing (NLP) offers a range of tools to test conceptual trends in large text corpora. However, NLP scholars have typically developed and trained models on carefully curated data, focusing on settings where natural sources of annotation lead to straightforward application of supervised learning. Can NLP tools be used on complicated and imperfect real-world data (e.g., data that is idiosyncratic to a specific place, time, and purpose, and/or data that has undergone imperfect optical character recognition digitization) to address complex political science questions? To begin answering this methodological question, this research combines qualitative and computational NLP approaches to understand how UK government officials internally justified their decisions to intern unconvicted Irish Catholics without trial during the “Troubles” in Northern Ireland. We retrieved, digitized, and hand-annotated more than 8,500 recently declassified archive documents from government correspondence files (1969-73). We use these data to (A) explore tradeoffs associated with various approaches to analyzing complex, uncurated text (qualitative process-tracing, bag-of-words topic models, NLP machine-learning classification, and word and sentence embeddings); and (B) model a gold standard for integrating qualitative coding and inductive, expert-based qualitative examinations of model outputs into machine-learning analyses.

GPIRT: A Gaussian Process Model for Item Response Theory

Author(s): J. Brandon Duck-Mayr, Roman Garnett and Jacob Montgomery

Discussant: Yuki Shiraito (University of Michigan)

The goal of item response theory (IRT) models is to provide estimates of latent traits from binary observed indicators and, at the same time, to learn the item response functions (IRFs) that map from the latent trait to the observed response. However, in many cases observed behavior can deviate significantly from the parametric assumptions of traditional IRT models. Nonparametric IRT models overcome these challenges by relaxing assumptions about the form of the IRFs, but standard tools are unable to simultaneously estimate flexible IRFs and recover ability estimates for respondents. We propose a Bayesian nonparametric model that solves this problem by placing Gaussian process priors on the latent functions defining the IRFs. This allows us to relax assumptions about the shape of the IRFs while preserving the ability to estimate latent traits, which in turn makes it easy to extend the model to further tasks such as active learning. GPIRT therefore provides a simple and intuitive solution to several longstanding problems in the IRT literature.
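
To make the idea of a Gaussian process prior on an IRF concrete, the following is a minimal illustrative sketch in Python, not the authors' implementation. It draws one latent function from a GP prior over a grid of latent trait values, maps it to a response probability, and simulates binary responses; a conventional two-parameter logistic IRF is included only for contrast. The squared-exponential kernel, the logistic link, and all names and parameter values (`rbf_kernel`, `sample_gp_irf`, `two_pl_irf`, `length_scale`, `jitter`) are assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np


def rbf_kernel(x1, x2, length_scale=1.0, variance=1.0):
    """Squared-exponential covariance between two 1-D arrays (assumed kernel)."""
    diff = x1[:, None] - x2[None, :]
    return variance * np.exp(-0.5 * (diff / length_scale) ** 2)


def sample_gp_irf(theta_grid, rng, length_scale=1.0, variance=1.0, jitter=1e-6):
    """Draw one IRF: a zero-mean GP sample f(theta) squashed by a logistic link."""
    cov = rbf_kernel(theta_grid, theta_grid, length_scale, variance)
    cov += jitter * np.eye(len(theta_grid))  # small diagonal term for numerical stability
    f = rng.multivariate_normal(np.zeros(len(theta_grid)), cov)
    return 1.0 / (1.0 + np.exp(-f))  # assumed link: probability of a positive response


def two_pl_irf(theta_grid, discrimination=1.0, difficulty=0.0):
    """Conventional parametric IRF, shown only for comparison; always monotone."""
    return 1.0 / (1.0 + np.exp(-discrimination * (theta_grid - difficulty)))


rng = np.random.default_rng(0)
theta_grid = np.linspace(-3.0, 3.0, 50)  # hypothetical latent trait values

gp_irf = sample_gp_irf(theta_grid, rng)   # flexible shape, need not be monotone
parametric_irf = two_pl_irf(theta_grid)   # fixed sigmoid shape

# Simulate one binary response per latent trait value under the GP-based IRF.
responses = rng.binomial(1, gp_irf)
```

The sketch only samples from a prior. Jointly inferring the IRFs and the respondents' latent traits from observed responses, which is the problem the abstract addresses, would require a posterior inference procedure that is not shown here.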

