Qualitative and Quantitative Studies of Wikipedia (with Aaron Halfaker)
Keynote, ACM International Symposium on Open Collaboration (OpenSym), Paris, France
We reflect on a decade of studying Wikipedia using qualitative and quantitative methods.
Panel, Machine Learning and User Experience San Francisco (MLUXSF), San Francisco, California
With the rise of Machine Learning and AI to solve human-focused needs, how do we design and use data science ethically to help empower and support people?
Talk, 2018 European Conference on Computer-Supported Cooperative Work, Nancy, France
Data analytics increasingly relies on open source software (OSS) libraries that extend scripting languages like Python and R. Software documentation for these libraries is crucial for people across all experience levels, but documentation work raises many challenges, particularly in open source communities. In this collaboration between ethnographers and data scientists, we discuss the types, roles, practices, and motivations around documentation in data analytics OSS libraries.
Talk, 2018 Annual Conference of the International Communication Association, Prague, Czech Republic
How can institutions that own and operate large-scale social media platforms come to know “their users” at scale? In this talk, I discuss ways of knowing user populations at scale, drawing on Foucault’s account of governmentality, particularly the role of statistics in the formation of the modern nation state.
Talk, University of California at San Diego, The Design Lab, San Diego, California
In this talk, I discuss the role of qualitative and ethnographic methods in relation to computer, information, and data science. These holistic, reflexive, and meta-level approaches to studying data and computation in context help us better understand how to both support and practice data analytics at various scales.
Keynote, Open Science Symposium, Department of Second Language Studies, University of Hawaiʻi at Mānoa, Mānoa, Hawaiʻi
Openness in science is hard to disagree with as an abstract principle, but what exactly do we mean when we call for science to be made open – or more open than before? In this talk, I introduce and unpack the many different goals, strategies, products, values, and assumptions of the broad open science movement.
Talk, IT University of Copenhagen, ETHOSlab, Copenhagen, Denmark
Ethnography is traditionally a qualitative and inductive methodology that is now widely used to holistically investigate people’s lived experiences in and across cultures. In this talk, I define and discuss two ways of thinking about the role of ethnographic methods around computation, then discuss how my research relates to both.
Talk, University of Manchester, Data Science Institute, Manchester, United Kingdom
In this talk, I discuss the role of qualitative and ethnographic methods in relation to computer, information, and data science. These holistic, reflexive, and meta-level approaches to studying data and computation in context help us better understand how to both support and practice data analytics at various scales.
Guest lecture, UC-Berkeley: Human Contexts and Ethics of Data course, Berkeley, California
A guest lecture for Cathryn Carson and Margo Boenig-Liptsin’s course on Human Contexts and Ethics of Data (HIST 182C, STS 100C), focusing on how various publics generate, analyze, and interpret data.
Talk, College of Information Studies, University of Maryland at College Park, College Park, Maryland
Ethnography is traditionally a qualitative and inductive methodology that is now widely used to holistically investigate people’s lived experiences in and across cultures. In this talk, I define and discuss two ways of thinking about the role of ethnographic methods around computation, then discuss how my research relates to both.
Talk, School of Information Sciences, University of Illinois at Urbana-Champaign, Urbana, Illinois
Ethnography is traditionally a qualitative and inductive methodology that is now widely used to holistically investigate people’s lived experiences in and across cultures. In this talk, I define and discuss two ways of thinking about the role of ethnographic methods around computation, then discuss how my research relates to both.
Talk, School of Information and Library Science, University of North Carolina at Chapel Hill, Chapel Hill, North Carolina
Ethnography is traditionally a qualitative and inductive methodology that is now widely used to holistically investigate people’s lived experiences in and across cultures. In this talk, I define and discuss two ways of thinking about the role of ethnographic methods around computation, then discuss how my research relates to both.
Talk, Bay Area Science Festival, Albany, California
Today, “artificial intelligence” seems to be everywhere – in our phones, vacuums, hospitals, and inboxes – but it can be hard to separate science fiction from science fact. Many discussions about AI imagine a fully autonomous superintelligence that designs itself with little to no human intervention, making decisions in ways that humans cannot possibly understand. Yet the work of designing, developing, engineering, training, and testing such systems requires a massive amount of human labor, which is typically erased when such systems are released as products. In this talk, I give a human-centered, behind-the-scenes introduction to machine learning, illustrating the creative, interpretive, and often messy work humans do to make autonomous agents work. Understanding the humanity behind artificial intelligence is important if we want to think constructively about issues of bias, fairness, accountability, and transparency in AI.
Talk, 2017 Annual Meeting of the Association of Internet Researchers, Tartu, Estonia
This paper examines the early history of “anyone can edit” wiki software – originally developed in 1995, six years before Wikipedia’s origin. While today the idea of a wiki is associated with large-scale, massively distributed encyclopedic knowledge production, this was not always the case. Articles on pre-Wikipedia wikis were often closer to a Joycean stream of consciousness than to Wikipedia’s Britannica-inspired texts, which speak in a single voice, and the underlying wiki platform lacked many of the affordances that are now taken for granted. In fact, the creator of the first wiki advised Wikipedia’s co-founders that the goals of creating a general-purpose encyclopedia and a wiki were inherently contradictory.
Guest lecture, UC-Berkeley Department of Statistics: Reproducible and Collaborative Data Science, Berkeley, California
A guest lecture for Fernando Pérez’s STAT 159/259 course on Reproducible and Collaborative Data Science, in which I discuss issues of open science and reproducibility around our recent paper, “Operationalizing conflict and cooperation between automated software agents in Wikipedia: A replication and expansion of ‘Even Good Bots Fight’.”
Talk, Berkeley Institute for Data Science, Berkeley, California
Ethnography is traditionally a qualitative and inductive methodology – with its origins in cultural anthropology – that is now widely used to holistically investigate people’s lived experiences in and across cultures. In this talk, I define and discuss two ways of thinking about the role of ethnographic methods around computation, then discuss how my research relates to both.
Talk, 2017 Annual Meeting of the Society for Social Studies of Science (4S), Boston, Massachusetts
An overview of how to study data science ethnographically by personally engaging in various practices of data science.
Talk, JupyterCon, New York, New York
We (Stuart Geiger, Brittany Fiore-Gartland, and Charlotte Cabasse-Mazel) share ethnographic findings made while observing and working with Jupyter notebooks, focusing on how people use Jupyter to create and deliver computational narratives in particular local contexts, like classrooms, hackathons, research collaborations, and more.
Panel, The 21st Annual BCLT/BTLJ Symposium, Berkeley, California
This talk is part of a panel session titled “Demystifying Algorithmic Processes: What is the role of algorithms in online platforms, what can they do and not do, and how should they be governed?”
Talk, Annual Meeting of the Society for Social Studies of Science (4S), Barcelona, Spain
Wikipedians rely on software agents to govern the ‘anyone can edit’ encyclopedia project, in the absence of more formal and traditional organizational structures. Lessons from Wikipedia’s bots speak to debates about how algorithms are being delegated governance work in sites of cultural production.
Talk, PyData SF, San Francisco, CA
Wikipedia relies on one of the world’s largest open collaboration communities. Since 2001, the community has grown substantially and faced many challenges. This presentation reviews research and initiatives around community sustainability in Wikipedia that are relevant for many open source projects, including issues of newcomer retention, governance, automated moderation, and marginalized groups.
Talk, SciPy, Austin, Texas
Many open source, volunteer-driven projects begin with a small, tight-knit group of collaborators, but then expand far faster than anyone expects or plans for. I discuss cases of governance growing pains in Wikipedia, which have many lessons for running open source software projects.
Talk, Communicating with Machines workshop, Fukuoka, Japan
I discuss cases from a multi-year ethnographic study of automated software agents in Wikipedia, where ‘bots’ have fundamentally transformed the nature of the ‘anyone can edit’ encyclopedia project.
Panelist, Annual Meeting of the International Communication Association (ICA), Fukuoka, Japan
This panel discusses the potentials and complications of mixed-methods research in big data studies, specifically in cases when population-level data is available.
Talk, Big Data: Critiques and Alternatives workshop, Fukuoka, Japan
I discuss four data-intensive activist projects as "successor systems," examining the political and epistemological implications of using data to advance activist projects.
Talk, Algorithms, Automation, and Politics workshop, Fukuoka, Japan
I discuss how algorithmic systems are deployed to enforce particular behavioral and epistemological standards in Wikipedia, which can become a site for collective sensemaking among veteran Wikipedians.
Talk, Theorizing the Web, Astoria, New York
Talk, Theorizing the Web, Astoria, New York
Talk, The Hacker Within, BIDS, Berkeley, CA
A tutorial (with Jupyter notebooks) about how to use APIs to query structured data from Wikipedia articles and the Wikidata project.
Talk, Wikipedia 15th Anniversary Birthday Bash, San Francisco, CA
A short talk to open up an event celebrating the 15th anniversary of Wikipedia. The prompt we were given was "Why [x] is my favorite contribution to Wikipedia."
Talk, Annual Meeting of the Society for Social Studies of Science (4S), Denver, CO
I examine the roles that automated software agents (or bots) play in the governance and moderation of Wikipedia, Twitter, and reddit – three online platforms that differently uphold a related set of commitments to ‘open’ and ‘public’ online participation.
Panelist, Crowdsourcing and the Academy Symposium, Berkeley, CA
A panel discussing how academics use crowdsourcing in research.
Talk, Annual Meeting of the Association of Internet Researchers (AoIR), Phoenix, AZ
This presentation introduces bot-based collective blocklists (or blockbots) in Twitter, which have been created to help various groups better moderate their own experiences on the site.
Talk, Annual Meeting of the International Communication Association (ICA), San Juan, Puerto Rico
In this talk, I examine the early history of “anyone can edit” wiki software – originally developed in 1995, six years before Wikipedia’s origin – focusing on the ways in which this technological infrastructure has been repurposed across communities, domains, and scales.
Guest lecture, Social Aspects of Information Systems course, Berkeley, CA
An overview of Wikipedia and other peer production platforms, discussing issues that link up to the theories discussed in the Social Aspects of Information Systems class.
Guest lecture, Social Aspects of Information Systems course, Berkeley, CA
An overview of how various online platforms moderate content, discussing issues that link up to the theories discussed in the Social Aspects of Information Systems class.
Workshop presentation, iSchools Conference, Newport Beach, CA
Workshop presentation, CSCW Workshop on Feminism and Feminist Approaches in Social Computing, Vancouver, BC
Workshop presentation, CSCW Workshop on Ethics for Studying Sociotechnical Systems in a Big Data World, Vancouver, BC
Talk, Berkman Center for Internet and Society, Cambridge, MA
Talk, Human Computation Conference (HCOMP), Citizen-X Workshop, Pittsburgh, PA
We review various crowdsourcing and collective action systems, identifying particular sets of civic values and assumptions.
Talk, Annual Meeting of the Association of Internet Researchers (AoIR), Daegu, South Korea
Talk, Annual Meeting of the Society for Social Studies of Science (4S), Buenos Aires, Argentina
Panelist, Annual Meeting of the International Communication Association (ICA), Seattle, WA
Panelist, Annual Meeting of the International Communication Association (ICA), Seattle, WA
This panel focuses on the challenges faced by researchers conducting mixed-method research into online platforms, particularly where large amounts of data are widely available.
Talk, The Contours of Algorithmic Life, Davis, CA
Talk, Theorizing the Web, Brooklyn, New York
Guest lecture, History of Information, Berkeley, CA
A lecture on the history of Wikipedia, in the broader context of the history of reference works.
Panelist, Robots and New Media, Berkeley, CA
A panel discussing the ethical and political issues raised by autonomous robots and software bots.
Talk, Bangkok Scientifique, Bangkok, Thailand
A talk introducing various concepts around large-scale data analysis to a general audience, including spam detection and governmental surveillance.
Talk, Annual Meeting of the Association of Internet Researchers (AoIR), Denver, CO
Talk, Annual Meeting of the Society for Social Studies of Science (4S), San Diego, CA
Conference proceedings talk, International Symposium on Wikis and Open Collaboration (WikiSym 2013), Hong Kong
This paper examines what happened when one of Wikipedia's counter-vandalism bots unexpectedly went offline.
Talk, Theorizing the Web, New York, NY
Panelist, ACM Conference on Computer-Supported Cooperative Work (CSCW), San Antonio, TX
Conference proceedings talk, Conference on Computer Supported Cooperative Work, San Antonio, TX
This paper establishes a quantitative metric for measuring editor activity through temporal edit sessions.
Guest lecture, Social Aspects of Information Systems course, Berkeley, CA
An introduction to Actor Network Theory for students in the Master of Information Management and Systems (MIMS) program.
Panelist, International Symposium on Wikis and Open Collaboration (WikiSym 2012), Linz, Austria
Talk, Annual Meeting of the Society for Social Studies of Science (4S), Copenhagen, Denmark
Talk, Infosocial, Evanston, IL
Conference proceedings talk, International Conference on Weblogs and Social Media (ICWSM), Dublin, Ireland
A descriptive study of Wikipedia's highly-automated socialization processes and an A/B test to improve templated messages to newcomers.
Panelist, Conference on Human Factors in Computing Systems (CHI), Austin, Texas
Conference proceedings talk, Conference on Human Factors in Computing Systems (CHI), Austin, Texas
We introduce IP over Xylophone Players (IPoXP), a novel Internet protocol between two computers using xylophone-based Arduino interfaces.
Talk, GCOE International Symposium on Informatics Education, Kyoto, Japan
Talk, Annual Meeting of the Society for Social Studies of Science (4S), Cleveland, OH
Talk, Annual Meeting of the Society for Social Studies of Science (4S), Cleveland, OH
Conference proceedings talk, International Symposium on Wikis and Open Collaboration, Mountain View, CA
This paper investigates Wikipedia's article deletion processes, finding that they are heavily populated by specialists.
Talk, Digital Media and Learning (DML), Long Beach, CA
Conference proceedings talk, Hawaii International Conference on System Sciences, Lihue, Hawaii
We detail the methodology of ‘trace ethnography’, which combines the richness of participant-observation with the wealth of data in logs so as to reconstruct patterns and practices of users in distributed sociotechnical systems.
Panelist, Wikimania 2010, Gdańsk, Poland
A panel intended to foster a dialog between academic researchers who study Wikimedia projects and the Wikimedia community.
Talk, Critical Point of View: Wikipedia and the Politics of Open Knowledge, Amsterdam, the Netherlands
Conference proceedings talk, Conference on Computer Supported Cooperative Work, Savannah, Georgia
This paper traces out a heterogeneous network of humans and non-humans involved in the identification and banning of a single vandal in Wikipedia.
Talk, Critical Point of View: Wikipedia and the Politics of Open Knowledge, Bangalore, India
Talk, Annual Meeting of the Society for Social Studies of Science (4S), Arlington, Virginia
Conference proceedings talk, International Symposium on Wikis and Open Collaboration, Orlando, Florida
A short paper showing the recent explosive growth of automated editors (or bots) in Wikipedia, which have taken on many new tasks in administrative spaces.
Talk, the Second Annual Media Sociology Forum, New York, NY
Talk, First Annual Wikiconference NYC, New York, NY
Talk, Media in Transition 6, Cambridge, MA
Talk, Annual Conference on Science and Technology in Society, Washington, DC
Talk, Annual Wikimedia Conference (Wikimania), Alexandria, Egypt
Talk, Exploring New Media Worlds, College Station, TX