November 6th, 2023 was the first deadline for platforms categorised by the EU as “Very Large Online Platforms” (VLOPs) to deliver their transparency reports. The EU’s new Digital Services Act (DSA) requires VLOPs to publish these reports twice a year, containing standardised information about the number of users and aggregated numbers of content moderation decisions taken to mitigate potential “societal risks”. In addition to publishing the numbers of content moderation decisions, the platforms need to describe the mechanisms they use to prevent these “risks”. Probably for the first time in the history of these tech giants, they are disclosing figures about their use of automated moderation systems and human moderators. The current data covers April 2023 to September 2023.
Unsurprisingly, automated content moderation accounts for the largest share of all user-regulation decisions taken in the field of governance by platforms (Katzenbach, 2021; Gorwa et al., 2020). However, for the very first time, one can grasp just how big this share is.
We reviewed the transparency reports submitted by six platforms and analyzed the actions taken to remove content. Because decisions are largely based on each platform’s own policies (in most cases, community guidelines), there are no uniform categories to compare, so for the moment we have compared the aggregated data for the sake of comparability.
YouTube, in their report, only indicates “internal or external detection of content”, without specifying whether detection was automated or manual. Snapchat likewise has not provided separate numbers for automated versus human moderation decisions. Data for Facebook, Instagram, X, TikTok, and LinkedIn is aggregated for the period they reported on. The data for X covers moderation decisions based on its community guidelines categories. Pinterest described most of its content moderation decisions as “hybrid”; these are counted as “automated” in the following tables and graphs, since the process described relies mostly on automated moderation. The data for Pinterest includes only moderation decisions on “graphic violence and threats”, because the platform does not provide aggregated data across all categories.
SOCIAL MEDIA PLATFORMS

| Moderation | Facebook | Instagram | X | TikTok | LinkedIn | Pinterest |
|---|---|---|---|---|---|---|
| Automated | 43,870,765 | 75,113,462 | 1,449,607 | 1,800,826 | 2,413,403 | 34,966 |
| Human | 2,827,041 | 1,184,951 | 507,506 | 2,199,134 | 408 | 80 |
| Total | 46,697,806 | 76,298,413 | 1,957,113 | 3,999,960 | 2,413,811 | 35,046 |

Table 1. Number of automated content moderation decisions and human content moderation decisions declared by platforms in their first DSA transparency reports.
SOCIAL MEDIA PLATFORMS

| Moderation | Facebook | Instagram | X | TikTok | LinkedIn | Pinterest |
|---|---|---|---|---|---|---|
| Automated | 93.95% | 98.45% | 74.07% | 45.02% | 99.98% | 99.77% |
| Human | 6.05% | 1.55% | 25.93% | 54.98% | 0.02% | 0.23% |
| Total | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% | 100.00% |

Table 2. Automated and human content moderation decisions as a share (%) of all decisions declared by platforms in their first DSA transparency reports.
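The shares in Table 2 follow directly from the counts in Table 1. As a minimal Python sketch of the derivation (the dictionary layout and variable names are ours; the figures are the ones reported above):

```python
# Content moderation decisions reported for April-September 2023 (Table 1).
reported = {
    "Facebook":  {"automated": 43_870_765, "human": 2_827_041},
    "Instagram": {"automated": 75_113_462, "human": 1_184_951},
    "X":         {"automated": 1_449_607,  "human": 507_506},
    "TikTok":    {"automated": 1_800_826,  "human": 2_199_134},
    "LinkedIn":  {"automated": 2_413_403,  "human": 408},
    "Pinterest": {"automated": 34_966,     "human": 80},
}

for platform, counts in reported.items():
    total = counts["automated"] + counts["human"]  # the "Total" row of Table 1
    print(f"{platform}: {counts['automated'] / total:.2%} automated, "
          f"{counts['human'] / total:.2%} human of {total:,} decisions")
```

Running it reproduces Table 2, for example 74.07% automated for X and 45.02% for TikTok.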
Figure 1: Automated and human content moderation decisions as a share (%) of all decisions declared by platforms in their first DSA transparency reports.
The analysis shows that TikTok stands out for its high share of reported human content moderation (almost 55%), and X (formerly Twitter), at roughly 26%, also shows a comparatively high human moderation rate, although the volume of moderated posts differs considerably across platforms.
How the platforms report the numbers of their human content moderators also varies. Here, in addition to the six VLOPs presented above, we have also analyzed Snapchat’s data.
For example, Meta reported EU-related numbers of content moderators only for the EU member states’ official languages, as did TikTok and YouTube. X, however, appears to have also reported moderators for some non-EU languages (such as Arabic and Hebrew), and Snapchat reported all the moderators it employs globally.
SOCIAL MEDIA PLATFORMS

| Language | Meta | X | YouTube | TikTok | LinkedIn | Pinterest | Snapchat |
|---|---|---|---|---|---|---|---|
| Bulgarian | 20 | 2 | 9 | 69 | 0 | 0 | 0 |
| Croatian | 19 | 1 | 24 | 20 | 0 | 0 | 0 |
| Czech | 19 | 0 | 31 | 62 | 0 | 0 | 0 |
| Danish | 17 | 0 | 9 | 42 | 0 | 23 | 0 |
| Dutch | 54 | 1 | 24 | 167 | 0 | 24 | 9 |
| English | 109 | 2294 | 15142 | 2131 | 45 | 1011 | N/A |
| Estonian | 3 | 0 | 7 | 6 | 0 | 0 | 0 |
| Finnish | 15 | 0 | 15 | 40 | 0 | 12 | 0 |
| French | 226 | 52 | 176 | 687 | 1 | 250 | 30 |
| German | 242 | 81 | 231 | 869 | 2 | 72 | 20 |
| Greek | 22 | 0 | 28 | 96 | 0 | 0 | 0 |
| Hungarian | 24 | 0 | 25 | 63 | 0 | 0 | 0 |
| Irish | 42 | 0 | 0 | 0 | 0 | 0 | 0 |
| Italian | 179 | 2 | 91 | 439 | 0 | 18 | 13 |
| Latvian | 2 | 1 | 11 | 9 | 1 | 0 | 0 |
| Lithuanian | 6 | 0 | 11 | 6 | 0 | 0 | 0 |
| Maltese | 1 | 0 | 0 | 0 | 0 | 0 | 0 |
| Polish | 65 | 1 | 99 | 208 | 0 | 4 | 10 |
| Portuguese | 58 | 41 | 464 | 75 | 4 | 36 | 33 |
| Romanian | 35 | 0 | 34 | 167 | 0 | 4 | 0 |
| Slovak | 11 | 0 | 5 | 44 | 0 | 0 | 0 |
| Slovenian | 9 | 0 | 15 | 45 | 0 | 0 | 0 |
| Spanish | 163 | 20 | 507 | 468 | 7 | 70 | 31 |
| Swedish | 21 | 0 | 16 | 108 | 1 | 21 | 0 |
| Arabic | | 12 | | | | | 529 |
| Hebrew | | 2 | | | | | 1 |
| Hindi | | | | | | | 29 |
| Indonesian | | | | | | | 7 |
| Japanese | | | | | | | 4 |
| Mandarin | | | | | | | 4 |
| Norwegian | | | | | | | 32 |
| Punjabi | | | | | | | 15 |
| Russian | | | | | | | 10 |
| Tagalog | | | | | | | 5 |
| Tamil | | | | | | | 7 |
| Turkish | | | | | | | 7 |
| Ukrainian | | | | | | | 3 |
Table 3: Language proficiency of human moderators as indicated by social media platforms in their DSA transparency reports (empty cells mark languages a platform did not report on).
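The per-language totals behind the Pareto chart in Figure 2, and the coverage gaps discussed below, can be recomputed from Table 3. A minimal Python sketch over a few illustrative rows (the data layout is ours; cells a platform did not report are omitted rather than set to zero):

```python
# A few rows from Table 3: declared moderators per language and platform.
moderators = {
    "German":  {"Meta": 242, "X": 81, "YouTube": 231, "TikTok": 869,
                "LinkedIn": 2, "Pinterest": 72, "Snapchat": 20},
    "Arabic":  {"X": 12, "Snapchat": 529},  # only X and Snapchat report it
    "Irish":   {"Meta": 42, "X": 0, "YouTube": 0, "TikTok": 0,
                "LinkedIn": 0, "Pinterest": 0, "Snapchat": 0},
    "Maltese": {"Meta": 1, "X": 0, "YouTube": 0, "TikTok": 0,
                "LinkedIn": 0, "Pinterest": 0, "Snapchat": 0},
}

# Rank languages by total declared moderators (the ordering used in Figure 2)
# and flag the platforms that declare no moderators at all for a language.
for lang, counts in sorted(moderators.items(),
                           key=lambda kv: sum(kv[1].values()), reverse=True):
    uncovered = [p for p, n in counts.items() if n == 0]
    print(f"{lang}: {sum(counts.values())} moderators in total; "
          f"zero coverage on: {', '.join(uncovered) or 'none'}")
```

Sorting by the per-language sum puts well-staffed languages such as German at the head of the chart and immediately exposes the near-empty tail (Irish, Maltese) that the conclusion below discusses.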
Figure 2: Pareto chart of the most common languages known by content moderators on major social media platforms, as declared by all included platforms in their DSA transparency reports (October 2023).
Figure 3: Stacked chart of language proficiency of content moderators on major social media platforms (English excluded) as declared in DSA Transparency reports in October 2023.
It is clear from the data that content moderators on social media platforms lack proficiency in certain languages, although the extent varies from one platform to another. Some languages (e.g. Maltese) are absent from almost all of them, and others are underrepresented. In addition, the DSA needs to include provisions for non-official languages, because one cannot currently tell how content in, for example, Russian, Arabic, or Hebrew is moderated on VLOPs in the EU.
Cover Photo Credits: Marvin Meyer / Unsplash