Text and Data Mining Now Available at HathiTrust Research Center

We recently received news about the HathiTrust Research Center (HTRC), which has been developing services and tools that allow researchers to apply text and data mining methodologies to the HathiTrust collection. To date, these services have been available only for the portion of the collection that is out of copyright. With the development of a landmark HathiTrust policy and an updated release of HTRC Analytics, HTRC now provides access to the text of the complete 16.7-million-item HathiTrust corpus, including items protected by copyright, for non-consumptive research such as data mining and computational analysis.

This extraordinary opportunity to use copyrighted materials for non-consumptive research purposes expands research access to the entire HathiTrust digital collection, which is sustained by HathiTrust’s 140+ member libraries. Your community may access HTRC’s easy-to-use computational tools, ideal for beginners, as well as more complex tools that meet advanced data analysis needs:

HTRC Algorithms: a set of tools for assembling collections of digitized text from the HathiTrust corpus and performing text analysis on them. Includes copyrighted items for ALL USERS.

Extracted Features Dataset: a dataset allowing non-consumptive analysis of specific features extracted from the full text of the HathiTrust corpus. Includes copyrighted items for ALL USERS.

HathiTrust+Bookworm: a tool for visualizing and analyzing word usage trends in the HathiTrust corpus. Includes copyrighted items for ALL USERS.

HTRC Data Capsule: a secure computing environment for researcher-driven text analysis on the HathiTrust corpus. All users may access public domain items. Access to copyrighted items is available ONLY to member-affiliated researchers.

For more information, visit the HathiTrust Research Center wiki.