Lab Platform Governance, Media and Technology (PGMT)

Current research access programs to social media data: what they offer, what are the limitations?

Review of YouTube, Meta, and TikTok researcher access.

Photo by Piotr Cichosz on Unsplash

In lieu of the enforcement of the Digital Services Act in the EU, which would supposedly make data access easier for researchers, this blog post reviews current data access programs for the academics. What kind of data do platforms allow to access and by whom?  Read a detailed description of the programs by the YouTube, Meta and TikTok as of May 2023 

The Twitter Academic API as we know it is dead. It was a great program for academic research that lasted roughly two years.  When I was writing my PhD on Twitter discussions around news in Russian authoritarianism (2019), the sales representative of Twitter quoted a 2000 USD cost for the historical data that I needed.  As a postgraduate student I could not pay for it. Only three years later, in the spring of  2022, it was already possible to gather historical data from 2012 till 2022 of 10 million tweets through the Twitter academic access for the project “Imaginaries of AI” at our Platform Governance, Media and Technology Lab (PGMT) at the University of Bremen. 

Now, in May 2023, the Academic Access page leads to an offer to buy a “Basic” Tier Twitter API access for 100 USD per month, which would let you gather 10 000 tweets per month, a bare minimum for teaching how to handle social media data analysis.  Other than that, enterprise access, which academics are now also offered to subscribe to, offers access for a minimum of 42 000 USD per month. If you try to use the “more resources for researchers” button on Twitter website, you would now see the picture of a poodle and a byline: “​​​​Nothing to see here. Looks like this page doesn’t exist.  Here’s a picture of a poodle sitting in a chair for your trouble” 

There is hope that European Union’s Digital Services Act and its new data access regime for researchers will change that in 2024, forcing Twitter even under Musk’s leadership to offer something more substantial than a picture of a poodle for academia. 

At the moment, however, there are three social media platforms that are offering researcher access to their API. The TikTok program for researchers was made available in February 2023 but to this date it is still limited only to US-based researchers. The YouTube research program is available since 2022, and Meta’s CrowdTangle tool and API for the researchers has been functioning since 2019 but only allows research on limited topics. 

It is important to note that none of those programs allows exploring moderated data that had been taken down. Moreover, YouTube and TikTok make provisions for data sets to be mandatorily updated for the deleted data not to be present in a dataset. Which means that even if a researcher has the data deleted by moderation or otherwise in their dataset, they have to delete it, too, once it is deleted from the platform.  In practice, it makes content moderation research impossible.  CrowdTangle does not show the deleted data at all. 

We have explored the three currently  available social media platforms research programs according to the following scheme:

Data access chart, created by PGMT lab
YouTube researcher program
Who is eligible

By application, researchers from accredited academic institutions, from the following countries: Argentina, Australia, Austria, Belgium, Brazil, Bulgaria, Canada, Chile, Colombia, Cyprus, Czech Republic, Denmark, Egypt, Estonia, Finland, France, Germany, Ghana, Greece, Hungary, India, Indonesia, Ireland, Israel, Italy, Japan, Kenya, Korea, Latvia, Lithuania, Luxembourg, Malaysia, Malta, Mexico, Morocco, Netherlands, New Zealand, Nigeria, Norway, Oman, Philippines, Poland, Portugal, Qatar, Romania, Saudi Arabia, Senegal, Singapore, Slovakia, South Africa, Spain, Sweden, Switzerland, Taiwan, Thailand, Tunisia, United Arab Emirates, United Kingdom, United States, Vietnam.

Topics: 

wide range of topics, by application

Sources

What is available: access to standard YouTube API 

Which Data: Meta-data of channels and videos retrieved from YouTube API v.3. Here is an overview of what data is currently available through YouTube API

How Many: By application, downloading quota can be increased (standard developers quota is 10 000 requests per day, there is no limit officially stated in research program, thus, differs according to application)

Which time period: according to application, one research project

Is the documentation about the sources available and easily accessible? Yes, on YouTube API pages

Filtering

creating sub-corpora: yes

how user-friendly and intuitive is the process: needs some coding skills 

Download options: bulk or single downloads: continuous downloads

Download limits: according to quota decided by YouTube for a specific research project

Processing

cloud computing yes

local machines yes

Results 

Can you export and share

derived data: yes

examples and snippets: no snippets, written permission required

complete research articles: yes, open access is preferable; YouTube needs to see publication 7 days prior 

complete datasets for peer review: no

Continuous access:

datasets: for the duration of the research project; renew datasets every 30 days to ensure updating deleted data

Meta CrowdTangle research program 
Who is eligible

By application, university researchers (faculty, PhD students, post-doctoral research fellows) 

Topics: 

Misinformation

Elections

COVID-19

Racial justice

Well-being

Sources

What is available: access to CrowdTangle tool and CrowdTangle API (which includes Facebook, Instagram and Reddit data)

Which Data: Public posts (excluding posts from individuals) (up to 5000 words), including media (images and videos), metrics, and links. Here is an overview of what data is currently available through CrowdTangle API and CrowdTangle Tool (CrowdTangle tracks data from public content across Facebook Pages and Groups, as well as Verified Profiles and public Instagram accounts. API is needed for large datasets or time series)

How Many: By application,  there is no limit officially stated in research program

Which time period: according to application, one research project per application

Is the documentation about the sources available and easily accessible? Yes, on CrowdTangle tool pages 

Filtering

creating sub-corpora: yes, the tool allows various filtering 

How user-friendly and intuitive is the process: 

the tool does not need coding skills, the API does need some coding skills

Download options

bulk or single downloads: continuous downloads

download limits: in bulks of maximum 10 000 posts per request. You can download a maximum of 10,000 Posts in a CSV from a Dashboard. If you export a CSV from the Historical Data tool (accessible via Dashboards), you can get up to 300,000 posts. You can perform a maximum of 100 download requests per day.

Processing

cloud computing yes

local machines yes

Results 

Can you export and share

derived data: yes

examples and snippets: yes

complete articles: yes, citing CrowdTangle is required

complete datasets for peer review: no

Continuous access

datasets: for the duration of the research project

TikTok research program 
Who is eligible

By application, university researchers from the US only. WIll announce changes soon?

Topics: 

various topics

Sources

What is available: access to TikTok API

Which Data: Meta-data on user profiles and content, including metrics, comments, captions and subtitles; key words search ​​data 

How Many:

By application,  there is no limit officially stated in research program, TikTok assigns the quota

Which time period: 

according to application, one research project per application

Is the documentation about the sources available and easily accessible? 

No, there are a lot of terms of service right now

Filtering

how user-friendly and intuitive is the process: the API does need some coding skills 

Download options

bulk or single downloads: not clear

download limits:not clear

Processing

cloud computing yes

local machines yes

Results 

Can you export and share

derived data: yes

examples and snippets: no

complete articles: yes, TikTok needs to see publication 7 days prior 

complete datasets for peer review: no

Continuous access

datasets: for the duration of the research project; needs to be updated every 15 days

Summarized information on access in the table:


Posted

in

by