Digital Scholarship and Digital Humanities: Text Analysis

Resources and information for students, researchers and faculty who are incorporating technology into their research, scholarship, and teaching.

Text Analysis

Text Analysis is a broad term for a wide variety of specific analytic methods including parts-of-speech analysis, topic modeling, word frequency and concordances, distant reading, network analysis, data visualizations of unstructured data, and more. Web-based tools like Voyant provide easy access to a range of these techniques. This page is intended to provide some accessible resources for people who are interested in these methods. A more in-depth directory of tools that can be used for specific techniques is available through TAPoR.
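For readers who want to see what two of these methods involve before trying a tool, here is a minimal Python sketch of word frequency counting and a simple concordance (keyword in context). It is an illustration only, not how Voyant or any tool listed on this page works internally. The function names, the simple tokenizing rule, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes only)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=5):
    """Return the top_n most common words with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def concordance(text, keyword, width=2):
    """Return keyword-in-context lines: up to `width` words on each side of every hit."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Made-up sample text, used only to demonstrate the two functions.
sample = "The whale surfaced. The crew watched the whale dive again."
print(word_frequencies(sample, top_n=2))
print(concordance(sample, "whale"))
```

Real projects usually need more careful tokenizing (hyphenated words, non-English scripts, stop word removal), which is part of what the tools below handle for you.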

Related techniques: Social Media Analysis, Qualitative Analysis

This technique is part of the Analysis activity.

Theory & Methods

General Text Analysis Tools

General Text Analysis

Voyant Tools

free | web-based | easy to learn

A free, web-based tool for general text analysis. Voyant can analyze a single text or a full corpus, and includes a wide variety of document analysis and visualization tools. It encourages exploration, and has a built-in help system to guide understanding of its outputs and visualizations.
See Miriam Posner's workshop for instructors who want to use Voyant in the classroom.

Network Analysis Tools

Network Analysis

WORDij

free for non-commercial use | Windows, Mac, Linux | medium difficulty

WORDij is a collection of data science tools for unstructured text analysis. The core utility is WordLink, which analyzes bigrams (two-word pairs) and generates data on how these bigrams are related to one another (collocation). Output from this tool can be visualized with the built-in VISij tool, or exported to other visualization tools like Gephi.
  • The WORDij download includes a folder called "Documentation," containing basic usage instructions, several tutorials, and some sample data sets
  • Bernhard Rieder created a video tutorial for basic text analysis using WORDij
  • Lianne Lefsrud used WORDij in combination with other tools as part of her analysis of oil sands / tar sands discourse
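To make the idea of bigram analysis more concrete, the following Python sketch counts adjacent word pairs and writes them out as a weighted edge list. This is not WORDij's or WordLink's actual algorithm, just a simplified illustration of the general technique. The function names and sample text are invented for this example, and the `Source`, `Target`, `Weight` column headers are chosen on the assumption that the result will be opened with Gephi's spreadsheet import.

```python
import csv
import io
import re
from collections import Counter


def bigram_counts(text):
    """Count adjacent word pairs. Note: pairs are also formed across sentence boundaries."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(zip(tokens, tokens[1:]))


def to_edge_list_csv(counts, min_count=1):
    """Render bigram counts as CSV text, keeping only pairs seen at least min_count times."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Source", "Target", "Weight"])
    for (first, second), n in sorted(counts.items()):
        if n >= min_count:
            writer.writerow([first, second, n])
    return buffer.getvalue()


# Made-up sample text, used only to demonstrate the functions.
sample = "oil sands debate. tar sands debate. oil sands policy."
counts = bigram_counts(sample)
print(to_edge_list_csv(counts, min_count=2))
```

Saving the returned text to a `.csv` file gives one row per word pair, with the count as the edge weight. Raising `min_count` is a simple way to drop rare pairs before visualizing a large corpus.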

InfraNodus

free if you download it yourself; subscription needed for online use | web-based, but can be installed on Windows, Mac and Linux machines | challenging to install; easy to learn

InfraNodus visualizes words and co-occurrences for many forms of text, and can be used for text mining, topic modeling, sentiment analysis, and more. Graphs can be built "live" as text is added by copy/paste or even by verbal dictation. Resulting images can be embedded in online media, or exported as static images.

The online version of the tool is available on a subscription basis. The software can also be locally installed by downloading source code from GitHub, but this requires some advanced technical knowledge.

Distant Reading Tools

Distant Reading

Distant Reader

free | web-based | easy to learn

Upload a single text or a corpus to the Distant Reader, and it'll create a "study carrel" for you containing word frequencies, topic models, n-grams, parts-of-speech tagging, and word clouds. The carrel can be downloaded as a ZIP file for local use, or for further work.

Data Cleaning Tools

Data Cleaning

OpenRefine

free | Windows, Mac and Linux downloads | easy to learn

OpenRefine is a very powerful tool for data cleaning and transformation. It also allows you to augment your data with web services (such as looking up addresses to find geolocation data, or reconciling place names and objects with entity references from Wikidata or other linked data sources).

Project Examples

Related Techniques

Social Media Analysis, Qualitative Analysis