Fake AI 

Edited by Frederike Kaltheuner

Meatspace Press (2021)

Book release: 14/12/2021

This book is an intervention - 

Chapter 12

Why automated content moderation won’t save us

By Andrew Strait

“Limiting resolution means limiting power and its abuses.”

When I was a content moderator at Google in the mid-2010s, there were days I wished a machine could do my job. It was usually on the boring days spent scanning through monotonous copyright complaints with thousands of URLs, or the bad days clearing out queues of child sexual abuse content (they would always refill by the next day).

Content moderation is difficult work, often exciting but occasionally damaging to the soul. The platforms I moderated were (and still are) used by billions of people across the world as libraries, news sources, and public squares, but they were not built for these purposes. Rather, these platforms were designed to incentivise user engagement, content sharing, and mass data collection—objectives that often made moderating harmful content feel like throwing cups of water on a raging inferno. When I did this work, a small number of automated tools helped prioritise and flag certain types of content for review, but human moderators like me were ultimately charged with putting out the daily blazes. Over the years I worked at Google, the copyright complaints grew longer and the queues of traumatising content continued to grow. Surely, I wondered, only the increased use of AI could help moderate these platforms at scale?

Nearly a decade later, the heads of large tech firms and global policymakers share the same vision. In the wake of high-profile incidents like the Christchurch mosque shooting and the Rohingya genocide in Myanmar, policymakers across the globe have called for tech firms to remove hate speech, terrorist content, and other forms of harmful speech at increasingly breakneck speed. In response, tech platforms have invested heavily in AI-based moderation tools to identify, predict, and remove harmful content within minutes—if not at the time of upload. Called before the US Congress in 2018 over election misinformation, Mark Zuckerberg declared that “over the long term, building AI tools is going to be the scalable way to identify and root out most of this harmful content.” Facebook now boasts that 94.7% of hate speech removed from their platform in the third quarter of 2020 was proactively identified using their automated tools. Has the dream of automated moderation come true? Or do statistics like the ones above mask deeper problems with platform moderation?

Sadly, automated moderation tools have become yet another example of the misplaced hope in artificial intelligence to solve complex human problems. By framing online safety as simply a matter of moving from “human to automated”, tech firms and policymakers risk exacerbating known accountability and transparency issues with platform moderation policies while distracting themselves from more urgent questions around platform design features and business incentives. The hype around automated moderation tools has largely overlooked their technical limitations, the hidden labour of human moderators, and the increasing opacity of platform decision-making.

The technical limitations of automated moderation tools are well known to research and civil society communities. Notoriously bad at identifying the nuance and context of online speech, these systems routinely fail to identify whether a video constitutes illegal copyright infringement or lawful parody, or whether a post with a racial slur is written by a victim of a hate crime or their assailant. Due to their reliance on historically labelled content, these tools fail to keep pace with the constant evolution of human language, such as the shifting codewords used in QAnon misinformation campaigns. Some systems exhibit serious issues of language bias: for example, researchers have found that Google’s Perspective API tool, which uses machine learning to predict the “toxicity” of certain content, penalises content written in African American Vernacular English.[1] Addressing these issues would require not only a paradigm shift in AI research but a fundamental reconstitution of the labour forces designing these tools to incorporate more diverse perspectives.

The torrent of online misinformation sparked by the Covid-19 pandemic laid these limitations bare like never before. Not trusting contracted moderators to take corporate devices home and work remotely, large platforms like Facebook and YouTube resorted to fully automating many moderation decisions. The results were alarming. Child exploitation and self-harm removals on Facebook fell by at least 40%, pages reporting factual Covid-19 information were misidentified as misinformation, and YouTube appeals of wrongfully removed accounts skyrocketed.[2] Rather than improving outcomes, the results highlighted how automated tools can worsen the already inadequate status quo of platform moderation practices.

The decision to jettison human moderators during Covid-19 also reflects the changing role of the moderator as platforms have grown in size and scale. In my time doing this work, most moderators were full-time employees respected as trusted experts in particular regions or products, encouraged to deliberate on tricky edge cases. Today, tech platforms increasingly treat moderators as an expendable and invisible labour source whose decisions are used as fuel to train automated moderation tools. The vast majority of moderators at large platforms today are temporary contract labourers outsourced from third-party agencies, in countries that include India, the Philippines, and Malaysia. They are low-paid relative to full-time employees of large tech firms, and lack the same career advancement opportunities and access to high-quality mental healthcare that is necessary for the traumatising aspects of this work.[3] These moderators are hired, constantly assessed, and fired on the basis of their ability to render decisions quickly (often within seconds) and consistently with rigid platform-wide policies. They are not encouraged to bring deliberative nuance or contextual expertise to a decision, or to question whether a policy that applies to billions of people in hundreds of countries is adequate for the specific complaint they are viewing. It is their hidden emotional labour that keeps platforms and automated moderation tools afloat. Yet these moderators remain profoundly undervalued, their welfare and expertise pushed to the side in the rush to moderate at scale.

The rush towards automation also risks further obfuscating already opaque moderation processes, exacerbating platform transparency and accountability issues. As public reliance on platforms has grown, so too has their incontestable power to determine what speech is acceptable online. Freedom of expression is increasingly reliant on secretive and unaccountable business practices. The classified nature of moderation policies makes it virtually impossible to assess whether automated tools are effective at keeping users safe. Platforms have been hesitant to make their policies public, and industry transparency reports and experiments like Facebook’s Oversight Board, an independent committee that reviews a tiny portion of Facebook’s moderation decisions (but, importantly, not its content moderation policies or design features), offer only narrow, self-selected forms of transparency.

Automated moderation tools risk making this situation far worse. Their decisions are difficult to audit, assess, and understand even for developers, let alone for regulators or third-party researchers who struggle to gain access to them. These tools are often built to meet the needs of different “governance stakeholders”, which may not align with the interests of users or national laws.[4] To give one example, the Syrian Archive, an open-source initiative to document war crimes in Syria, has repeatedly battled YouTube’s algorithm for disabling terrorist content, which routinely fails to differentiate between videos glorifying violence and those documenting abuses.[5] Leaving the decision of what speech is allowed on the web to black box tools and black box policies, with no independent oversight or assessment, will only further diminish the accountability of large platforms and render the societal cost of automated tools invisible.

Rather than a dream come true, the increasing reliance on automated moderation tools risks becoming another AI nightmare. Platforms and regulators must not be constrained to a narrative of “human vs. automated” moderation. Instead, they must ask a broader set of questions. For example, how can platforms be designed to encourage safe behaviour? Is limiting their size and scale a more promising solution? Rather than following profit-driven metrics like maximising user engagement or increasing content virality, what if platforms redesigned their affordances and features with safety front and centre? Rather than looping moderators in at the last minute to put out fires, what if firms included them in the design of these products from the outset? And rather than exploiting outsourced moderators, further burying their emotional labour, what if tech firms properly compensated and cared for their well-being as full-time employees?

As with many other AI technologies, the hype around automated moderation tools reflects a misplaced belief that complex sociopolitical problems can be adequately resolved through technical solutions. AI is tremendously helpful at addressing well-defined and concrete tasks, but determining what speech is acceptable for billions of people is anything but a well-defined challenge. It is understandable that we wish it were so—how nice would it be to simply let a machine police our speech for us without changing the scale, practices, or affordances of Facebook, Twitter or YouTube?

Andrew Strait is a former Legal Policy Specialist at Google and works on technology policy issues. He holds an MSc in Social Science of the Internet from the Oxford Internet Institute.


1. Sap, M., Card, D., Gabriel, S., Choi, Y. & Smith, N. A. (2019) The Risk of Racial Bias in Hate Speech Detection. Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics. DOI: 10.18653/v1/P19-1163

2. Scott, M. & Kayali, L. (2020, October 21) What happened when humans stopped managing social media content. Politico. https://www.politico.eu/article/facebook-content-moderation-automation/

3. Roberts, S. T. (2019) Behind the Screen: Content Moderation in the Shadows of Social Media. New Haven, CT: Yale University Press.

4. Gorwa, R., Binns, R. & Katzenbach, C. (2020) Algorithmic content moderation: technical and political challenges in the automation of platform governance. Big Data & Society, January–June: 1–15, DOI: 10.1177/2053951719897945

5. O’Flaherty, K. (2018, June 26) YouTube keeps deleting evidence of Syrian chemical weapon attacks. WIRED. https://www.wired.co.uk/article/chemical-weapons-in-syria-youtube-algorithm-delete-video

