Digital Scholarship and Digital Humanities: Social Media Analysis

Resources and information for students, researchers and faculty who are incorporating technology into their research, scholarship, and teaching.

Social Media Analysis

Facebook, Twitter, Reddit, Instagram, TikTok and other online platforms have created whole new avenues for research. "Social Media Analysis" can be seen as the intersection of text analysis and network analysis. The "text" is often in the form of discussions, and may include images, emoji, and other media. The "media" is often structured in the form of a network, where the relationships between people and ideas become the focus of study. Information presented here is geared towards helping you collect social media data, and includes tools tailored for analyzing and visualizing data from specific social media platforms. For a more in-depth look at data sources you may want to use for your research, see our Data Services guide.

Related techniques: Text Analysis, Qualitative Analysis, Web Scraping

This technique is part of the Analysis activity.

Twitter Analysis Tools

TAGS: the Twitter Archiving Google Sheet

free | data collection only | web-based, using Google Sheets | easy to use

TAGS is an easy-to-use template, based on Google Sheets, for collecting Twitter data. You can specify any search term you'd like, including Boolean operators, and the utility will gather Twitter records for you (one time or on an ongoing basis). The result is a spreadsheet full of Twitter data you can analyze in any way you choose.


Social Media Lab's Netlytic

free, with paid upgrades available | data collection & basic analysis | web-based | easy to use

This is a community-supported Twitter analytics tool, created by members of the Ryerson University Social Media Lab. The interface is designed to step you through the work of collecting and analyzing Twitter data; there are several built-in visualizations to give you insights. Searches run every 15 minutes and collect up to 1000 tweets at a time. Raw data can be exported in CSV format.


Python package: twarc

free | Python knowledge required | data collection only | output is in JSON format

twarc is a command line tool and Python library created by the Documenting the Now project. The same team has also created a catalog of public tweet ID data sets and a desktop application called Hydrator for turning tweet IDs back into full tweets with metadata.
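twarc writes its output as JSON Lines, one JSON object per line. As a minimal sketch of reading that output back for analysis — assuming the file holds one tweet object per line (as with twarc2's flatten step) with top-level "id" and "text" fields per the Twitter v2 payload; the file name is illustrative:

```python
import json

def tweet_texts(jsonl_path):
    """Yield (id, text) pairs from a twarc JSONL file.

    Assumes one tweet object per line with top-level "id" and
    "text" fields; adjust if your file holds paged API responses.
    """
    with open(jsonl_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:  # skip blank lines
                continue
            tweet = json.loads(line)
            yield tweet["id"], tweet["text"]

# e.g. for tweet_id, text in tweet_texts("tweets.jsonl"): ...
```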

Reddit Analysis Tools

Social Media Lab's Communalytic

free for educational use; "pro" accounts available | data collection & basic analysis | web-based | output in CSV format

Collect data from Reddit's subreddit communities and analyze it directly within this tool, or export it for analysis elsewhere. Free accounts allow three datasets, and can collect up to 7 days of conversation.


Bernhard Rieder's Reddit-Tools

free | data collection only | PHP required | comfort with GitHub, text editor and command-line required | output in CSV format

A collection of PHP scripts (command line and some programming knowledge required) for collecting Reddit conversation data and saving it in CSV format.


Python package: PRAW

free | data collection only | Python knowledge required

PRAW is a Python package (library) designed to facilitate access to the Reddit Application Programming Interface (API). You must be familiar with Python to use it, but it makes it easy to collect and store data from Reddit (and much more).
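Because PRAW hands you submissions as Python objects, exporting them for analysis elsewhere takes only a few lines. A minimal sketch, assuming CSV output is what you want; the subreddit name and credential placeholders in the comment are illustrative, not real values:

```python
import csv

def submissions_to_csv(submissions, out_path):
    """Write Reddit submissions to a CSV file.

    Works with PRAW Submission objects, or any objects exposing
    .id, .title, .score and .num_comments attributes.
    """
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "title", "score", "num_comments"])
        for s in submissions:
            writer.writerow([s.id, s.title, s.score, s.num_comments])

# With PRAW (supply your own API credentials):
# import praw
# reddit = praw.Reddit(client_id="...", client_secret="...",
#                      user_agent="my-research-script")
# submissions_to_csv(reddit.subreddit("AskHistorians").hot(limit=100),
#                    "askhistorians.csv")
```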

YouTube Analysis Tools

Netlytic

free, with paid upgrades available | data collection & analysis | web-based | easy to use

This is a community-supported social media analytics tool, created by members of the Ryerson University Social Media Lab. The interface is designed to step you through the work of collecting and analyzing data. For YouTube, you can harvest the comments from a single YouTube video and then feed that data into Netlytic's analysis tools or export it for use elsewhere. Note that YouTube datasets collected on the Netlytic site will only be kept for 30 days, so be sure to export them.

General Web Scraping Tools

Data Miner

free plan limited to 500 pages/month | data collection only | web-based, using browser extension

Data Miner is a commercial service for web scraping that uses a browser extension as its primary interface. Many popular websites have existing templates you can use without doing any work, and building a customized scraper to pull data from tables is very straightforward. See this video overview and "how it works" page.


Octoparse

free plan limited to 10k records per export | data collection only | Windows only

Octoparse uses desktop software in conjunction with a large set of pre-configured templates to enable web scraping from websites, social media platforms, and more. You can also build a custom web scraper using visually oriented tools. Data can be exported to CSV and Excel formats.


Morph.io

free | data collection only | programming required if not using existing data sets

Morph.io interfaces with GitHub to facilitate the creation and sharing of scripts for data scraping in Python, Node.js, PHP, Ruby, and Perl. The system sets up a basic GitHub project for you, with a template in your chosen language. You then customize it (programming expertise required) to perform a specific data scraping task. Once a scraper has been built for a specific data source, it is added to a searchable directory. The site currently lists more than 10,000 publicly available scrapers and data sets.
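As a sketch of the kind of scraper you might write in one of those languages, the following uses only Python's standard-library `html.parser` to pull the rows out of an HTML table; the markup it expects is invented for illustration:

```python
from html.parser import HTMLParser

class TableScraper(HTMLParser):
    """Collect the text of each <td> cell, grouped by <tr> row."""
    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None
        self._in_td = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a new row
        elif tag == "td":
            self._in_td = True      # start capturing cell text

    def handle_endtag(self, tag):
        if tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td and self._row is not None:
            self._row.append(data.strip())

def scrape_table(html):
    """Return the table's cells as a list of rows of strings."""
    parser = TableScraper()
    parser.feed(html)
    return parser.rows
```

A real scraper would first fetch the page (e.g. with `urllib.request`) and then feed the response body to the parser.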

Data Cleaning Tools

OpenRefine

free | Windows, Mac and Linux downloads | easy to learn

OpenRefine is a very powerful tool for data cleaning and transformation. It also allows you to augment your data with web services (such as looking up addresses to find geolocation data, or reconciling place names and objects with entity references from Wikidata or other linked data sources).
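One of OpenRefine's signature cleaning features is key-collision clustering, which groups cell values that normalize to the same "fingerprint" (so "Café, Paris" and "paris cafe" cluster together). A rough Python approximation of that fingerprint key — a sketch of the documented idea, not OpenRefine's own code:

```python
import re
import unicodedata

def fingerprint(value):
    """Approximate OpenRefine's "fingerprint" clustering key:
    trim, lowercase, strip accents and punctuation, then sort
    the unique whitespace-separated tokens."""
    value = value.strip().lower()
    # Decompose accented characters and drop the combining marks.
    value = unicodedata.normalize("NFKD", value)
    value = "".join(c for c in value if not unicodedata.combining(c))
    # Remove punctuation, keeping word characters and whitespace.
    value = re.sub(r"[^\w\s]", "", value)
    tokens = sorted(set(value.split()))
    return " ".join(tokens)
```

Values whose fingerprints collide are candidates for merging into a single canonical spelling.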
