
Digital Scholarship and Digital Humanities: Web Crawling and Archiving

Resources and information for students, researchers, and faculty who are incorporating technology into their research, scholarship, and teaching.

Web Crawling and Archiving

Information on the web is fragile: links break, pages get taken down, and some content is designed to be ephemeral. Because online content is digital, it can disappear easily, leaving no record behind for anyone else to examine.

The resources in this section of the guide are designed to help with two things:

  1. Understand tools that can be used to store copies of online content that can be viewed later
  2. Use tools for finding these web archives and using them as data sources for all kinds of research

This section of the guide is still being developed, but will be available soon!

Data Sources

Theory & Methods

Web Crawling & Archiving Tools

Webrecorder Desktop

free | Windows, MacOS, and Linux | easy to learn

This application is the desktop counterpart of the web-based Conifer tool. It stores archives locally, can capture some interactive elements and mobile versions of websites, and can even be adapted to capture hidden sites on the Tor network (the dark web). The application can also replay existing web archives (WARC files).
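For readers curious about what a WARC file contains, the following is a minimal sketch in Python, not part of the tool itself. It assumes an uncompressed WARC stream in which each record is a `WARC/…` version line, a set of named headers, a blank line, and then a payload whose size is given by the `Content-Length` header. The helper names `iter_warc_records` and `make_record` are hypothetical, and the sample records are simplified (real records also carry fields such as a record ID and a date). Archives found in practice are usually gzip-compressed, so a dedicated library such as warcio is likely a better choice for real work.

```python
import io


def iter_warc_records(stream):
    """Yield (headers, payload) for each record in an uncompressed WARC stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of the stream
        if not line.strip():
            continue  # blank lines separate one record from the next
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")

        # Read "Name: value" header lines until the blank line that ends them.
        headers = {}
        while True:
            line = stream.readline()
            if line in (b"\r\n", b"\n", b""):
                break
            name, _, value = line.decode("utf-8").partition(":")
            headers[name.strip()] = value.strip()

        # The payload is exactly Content-Length bytes, whatever it contains.
        payload = stream.read(int(headers["Content-Length"]))
        yield headers, payload


def make_record(uri, body):
    """Build a simplified sample record (hypothetical helper for this demo)."""
    header = (
        "WARC/1.0\r\n"
        "WARC-Type: resource\r\n"
        f"WARC-Target-URI: {uri}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    ).encode("utf-8")
    return header + body + b"\r\n\r\n"


# Two in-memory sample records; example.com is a placeholder address.
sample = make_record("http://example.com/", b"<html>hello</html>") + make_record(
    "http://example.com/about", b"line one\r\n\r\nline two"
)

records = list(iter_warc_records(io.BytesIO(sample)))
for headers, payload in records:
    print(headers["WARC-Target-URI"], len(payload), "bytes")
```

Reading the payload by its declared length, rather than scanning for the next blank line, is what lets a record safely contain blank lines of its own, as the second sample record does.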

Analyzing Web Archives

Project Examples

Related Techniques