Biases We Research By
Biases We Research By is a tutorial on detecting and mitigating understudied forms of bias in NLP research. It aims to fill existing gaps by providing a comprehensive overview of approaches and methods for conceptualizing and uncovering bias, combining theoretical insights from the social sciences with actionable methodologies and with the perspectives of data labelers themselves. The tutorial has three main objectives:
Clarify the concept of bias as defined in the social sciences. We introduce foundational social-psychological theories, including implicit and explicit biases, the stereotype content model, and intersectionality (Greenwald & Banaji, 1995; Fiske et al., 2002). This section highlights how social categorization processes give rise to cognitive biases and how these subsequently manifest in technological systems.
Present methods to reduce underrepresentation and prevent silencing in NLP resources and systems. We survey existing practices for detecting and mitigating the underrepresentation of vulnerable communities at different stages of the NLP pipeline, including data selection and filtering, annotation workflows, and model development; a minimal corpus-audit sketch of the data-selection stage follows the list of objectives. The overview draws on methodologies from the Semantic Web, gender studies, applied mathematics, the social sciences, and NLP.
Discuss issues stemming from the human labor behind fair and ethical NLP systems. Although discussions of bias in NLP often focus on datasets, models, and algorithms, far less attention is paid to the human labor that underlies them. Data labelers, content moderators, and annotators form a global workforce whose often-invisible contributions shape every AI system. This section connects academic discussions of bias with worker realities, showing how structural inequalities, pay disparities, and cultural invisibility introduce another layer of bias—one rooted not in data points, but in labor practices.
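As a concrete illustration of the data-selection stage mentioned in the second objective, the sketch below shows one common form of corpus audit: counting how often identity-related terms appear in a candidate corpus and flagging groups that fall below a minimum share. The term lists, the threshold, and the whitespace tokenizer are illustrative assumptions rather than part of the tutorial material; a real audit would rely on curated, community-validated lexicons and more robust text processing.

```python
# Minimal corpus-audit sketch (illustrative only): flag identity groups whose
# terms rarely appear in a candidate training corpus. The term lists, the
# whitespace tokenizer, and the threshold are placeholder assumptions.
from collections import Counter

# Hypothetical single-word term lists; a real audit would use curated,
# community-validated lexicons and handle multiword and inflected forms.
IDENTITY_TERMS = {
    "gender": ["woman", "man", "nonbinary"],
    "disability": ["blind", "deaf", "neurodivergent"],
}

def coverage_report(corpus, min_share=0.001):
    """Count identity-term mentions and flag groups below min_share of tokens."""
    tokens = [tok.lower().strip(".,!?") for doc in corpus for tok in doc.split()]
    counts = Counter(tokens)
    total = max(len(tokens), 1)
    report = {}
    for group, terms in IDENTITY_TERMS.items():
        mentions = sum(counts[t] for t in terms)
        share = mentions / total
        report[group] = {
            "mentions": mentions,
            "share": round(share, 4),
            "underrepresented": share < min_share,
        }
    return report

if __name__ == "__main__":
    sample_corpus = [
        "The man spoke first.",
        "Another man replied at length.",
        "A woman asked the next question.",
    ]
    print(coverage_report(sample_corpus))
```

A report of this kind would typically inform re-sampling or targeted data collection before annotation begins.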