Text and Data Mining Guide

Introduction

Text and Data Mining refers to the computer-aided harvesting and analysis of a corpus of data. A corpus can include the full text of a book or the entire body of an author's work, journal articles, social media posts, census data, and more. The goals of Text and Data Mining activities are to find patterns, discover relationships, and analyze semantics that suggest new meanings.

UNB Libraries can provide assistance with developing a text or data mining project including:

Negotiating licenses for access to resources;
Developing agreements with providers of texts;
Consulting on project planning and tool selection;
Helping with training.

Text Analysis and Mining Tools

There can be a learning curve to using the following tools effectively. Please contact Erik Moore (ecmoore@unb.ca) or Julie Morris (julie.morris@unb.ca) with questions or for guidance.

Subscribed resources

A Companion to Aesthetics, Second edition
In this extensively revised and updated edition, 168 alphabetically arranged articles provide comprehensive treatment of the main topics and writers in this area of aesthetics. Written by prominent scholars covering a wide range of key topics in aesthetics and the philosophy of art. Features revised and expanded entries from the first edition, as well as new chapters on recent developments in aesthetics and a larger number of essays on non-Western thought about art. Unique to this edition are six overview essays on the history of aesthetics in the West from antiquity to modern times.
Permitted Use | Purchased multi-user unlimited access
A Companion to 20th-Century America
A Companion to 20th-Century America is an authoritative survey of the most important topics and themes of twentieth-century American history and historiography. Written by an expert in the field, each essay assesses the past and current state of American scholarship, covering topics such as foreign policy, religion, labor, ethnicity, law, the military, and the media. Additional essays cover major time periods: from the beginning of the century through the 1930s, 1940s, 1950s, up to the closing of the century. An editorial introduction and further reading lists for each chapter round out this clearly written, exciting overview of twentieth-century American history. Students, scholars, and general readers should find this an indispensable work of reference and source of information.
Permitted Use | Purchased multi-user unlimited access

Free resources

Google Ngram Viewer: https://books.google.com/ngrams
When you enter phrases into the Google Books Ngram Viewer, it displays a graph showing how those phrases have occurred in a corpus of books. Learn more here: https://books.google.com/ngrams/info
OpenRefine: https://openrefine.org
A powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
Voyant Tools: https://voyant-tools.org/
An open-source, web-based application that supports scholarly reading and interpretation of texts or a corpus. Voyant was conceived to enhance reading through lightweight text analytics such as word frequency lists, frequency distribution plots, and KWIC (keyword in context) displays.
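
To give a sense of what lightweight text analytics involve, here is a minimal Python sketch of a word frequency list and a KWIC (keyword in context) display. It is an illustration only, not Voyant's own implementation; the function names (tokenize, word_frequencies, kwic), the simple tokenizing rule, and the sample sentence are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens (an assumed, naive rule)."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text, top_n=5):
    """Return the top_n most frequent tokens with their counts."""
    return Counter(tokenize(text)).most_common(top_n)


def kwic(text, keyword, window=3):
    """Return each occurrence of keyword with up to `window` tokens on either side."""
    tokens = tokenize(text)
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append(f"{left} [{tok}] {right}".strip())
    return lines


# Hypothetical sample text, invented for this example.
sample = "The press was free. A free press informs the public, and the public reads the press."

print(word_frequencies(sample, 2))
for line in kwic(sample, "press", window=2):
    print(line)
```

For a real corpus, the sample string would be replaced with text you have the rights to mine, and a more careful tokenizer would likely be needed for punctuation, hyphenation, and historical spelling.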

Text Corpora with Data Mining Rights

The following resources include text and data mining rights as part of their license agreements, sometimes with conditions. To learn the details of a specific resource's agreement, please contact:

Joanne Smyth, Director, Collections Strategy and Scholarly Communication: jsmyth@unb.ca
Linda Roulston, Electronic Licensing Librarian: lroulsto@unb.ca

Subscribed Resources

17th and 18th century Burney newspapers collection (Gale)
"The newspapers, pamphlets, and books gathered by the Reverend Charles Burney (1757-1817) represent the largest and most comprehensive collection of early English news media. The present digital collection, that helps chart the development of the concept of 'news' and 'newspapers' and the "free press", totals almost 1 million pages and contains approximately 1,270 titles. Many of the Burney newspapers are well known, but many pamphlets and broadsides also included have remained largely hidden. These treasures can now be searched, browsed and discovered again within Gale Digital Collections."
Collection record | Purchased multi-user unlimited access
17th and 18th century Nichols newspapers collection (Gale)
"The 17th and 18th Century Nichols Newspapers Collection features the newspapers, periodicals, pamphlets and broadsheets that form the Nichols newspaper collection held at the Bodleian library in Oxford, UK. All 296 volumes of bound material, covering the period 1672-1737 are presented in digitized format here.

This collection charts the history of the development of the press in England and provides invaluable insight into 17th-18th century England."
Collection record | Purchased multi-user unlimited access
British Library newspapers - Part I, II & IV (Gale)
"Sourced from the extensive holdings of the British Library, British Library Newspapers delivers a wide range of irreplaceable local and regional voices to reflect the social, political, and cultural events of the eighteenth, nineteenth, and twentieth centuries. With more than 160 newspaper titles, the series is comprised of approximately 5.5 million pages of historic content, from articles to advertisements.

UNB Libraries provides access to:
Part I: 1800-1900
Ranging from early tabloids like the Illustrated Police News to radical papers like the Chartist Northern Star, publications in Part I span a vast range of national, regional, and local interests.

Part II: 1800-1900
Part II further expands the range of English regional newspapers and the political views represented in the programme. Researchers can find the newspapers of a number of significant towns and regions included in this collection: Nottingham, Bradford, Leicester, Sheffield, and York, as well as North Wales. The addition of two major London newspapers, The Standard and the Morning Post, helps capture conservative opinion in the nineteenth century, balancing the progressive, more liberal views of the newspapers that appear in Part I.

Part IV: 1732-1950
From key early newspaper titles like the Stamford Mercury to what is possibly the oldest magazine in the world still in publication, the Scots Magazine, Part IV offers key local and regional perspectives."

Purchased multi-user unlimited access
Cambridge Core (eBooks & eJournals)
Cambridge Core provides full text for eJournals in the sciences, social sciences, and humanities, as well as access to selected eBooks purchased by UNB Libraries.
eBooks: Purchased multi-user unlimited access | eJournals: Subscribed multi-user unlimited access
Early English Books Online (EEBO via ProQuest)

EEBO is based on the microfilm collections curated by the Ann Arbor publisher Eugene B. Power (1905-1993). The founder of what became University Microfilms International or UMI, Power’s first foreign project established the microfilming operation at the British Museum in 1942 and, since then, more than 200 libraries worldwide have contributed to the microfilm collection.

Following its digital launch in 1998, Early English Books Online now contains page images of virtually every work printed in England, Ireland, Scotland, Wales and British North America, as well as works in English printed elsewhere between 1473 and 1700.

Purchased multi-user unlimited access
Eighteenth Century Collections Online (ECCO)
A comprehensive digital edition of The Eighteenth Century microfilm set, which has aimed to include every significant English-language and foreign-language title printed in the United Kingdom, along with thousands of important works from the Americas, between 1701 and 1800. Consists of books, pamphlets, broadsides, ephemera. Subject categories include history and geography; fine arts and social sciences; medicine, science, and technology; literature and language; religion and philosophy; law; general reference. Also included are significant collections of women writers of the eighteenth century, collections on the French Revolution, and numerous eighteenth-century editions of the works of Shakespeare. Where they add scholarly value or contain important differences, multiple editions of each individual work are offered. Allows searching Early English Books Online as an option.
English poetry database
The English Poetry Database "contains poems in English from Anglo-Saxon times to the end of the nineteenth century by writers from the British Isles. The database covers the works of 1,257 named poets and many items by different anonymous hands."
Institute of Physics Publishing (IOP)
The Institute of Physics Publishing promotes research and the advancement of knowledge in Physics and Physics-related fields. This resource provides access to eBooks, as well as current and archival journal content.
JSTOR Archival Collections
JSTOR provides access to back issues of a variety of scholarly journals. UNB Libraries currently subscribes to the Arts & Sciences (I through X) collections, along with the Life Sciences and Ireland collections.
Permitted Use | Subscribed multi-user unlimited access
JSTOR eBooks
Books at JSTOR offers more than 35,000 ebooks from renowned scholarly publishers, integrated with journals and primary sources on JSTOR's easy-to-use platform. UNB subscribes to selected eBook titles.

eBook Collection title lists:
- Books at JSTOR Open Access
- JSTOR eBooks - Selected Titles (purchased)
JSTOR Open Access (eBooks & Archival eJournals)
JSTOR Open Access offers more than 2,000 ebook titles now available from publishers such as University of California Press, Cornell University Press, NYU Press, and University of Michigan Press, and JSTOR will continue to add new titles. In addition, all journal content in JSTOR published prior to 1923 in the United States and prior to 1870 elsewhere is freely available to anyone, anywhere in the world. These open access books and archival journals are freely available for anyone in the world to use.
Permitted Use | Open Access
Literature online : the home of literature and criticism
Literature Online offers full-text access to rare and inaccessible works and up-to-date reference resources, in addition to the full text of poetry, drama, and prose fiction from the seventh century to the present day. Materials are included from almost every period and genre of English literature as well as many works by 20th century authors. Contemporary criticism is available through the Annual Bibliography of English Language and Literature (ABELL).
Making of the Modern World: Part I & Part II (Gale)
"The Making of the Modern World is an extraordinary series which covers the history of Western trade, encompassing the coal, iron, and steel industries, the railway industry, the cotton industry, banking and finance, and the emergence of the modern corporation."

UNB Libraries provides access to:
Part I, The Goldsmiths'-Kress Collection, 1450-1850
Offers ways of understanding the expansion of world trade, the Industrial Revolution, and the development of modern capitalism, supporting research in a variety of disciplines. Users have access to an abundance of rare books and primary source materials, many of which are the only known copy of the work.

Part II, 1851-1914
Takes The Making of the Modern World series to the end of the nineteenth century. Comprised mainly of primary source documents such as monographs, reports, correspondence, speeches, and surveys, this collection broadens Gale’s international coverage of social, economic, and business history, as well as political science, technology, industrialisation, and the birth of the modern corporation.
Purchased multi-user unlimited access
Market share reporter (Gale)
Market Share Reporter (MSR) is a compilation of published market share data about companies, brands, products, commodities, services and facilities in U.S. and international markets. The 2016 edition and every second year's edition thereafter are available online through Gale Virtual Reference Library. Data is compiled from periodical sources (newspapers, magazines, newsletters, government reports, etc.) over the previous three to four years. Entries feature a descriptive title, data and market description, and a list of producers/products; original sources are also provided. Entries in MSR are organized mainly by name of the report; reports can be found by keyword or by using the Advanced Search feature.
2016- [Every 2nd year's edition]
Nineteenth century collections online (NCCO)
Nineteenth Century Collections Online is a digitization and publishing program focusing on primary source collections of the long nineteenth century. The program includes a variety of content types--monographs, newspapers, pamphlets, manuscripts, ephemera, maps, statistics, and more--and unites them in one central, cross-searchable location. 12 collections are now available:

Individual titles in these collections are available for discovery in our eBooks search or in UNBWorldCat:
• Asia and the West: Diplomacy and Cultural Exchange
• British Politics and Society
• British Theatre, Music, and Literature: High and Popular Culture
• Children's Literature and Childhood
• European Literature, 1790-1840: The Corvey Collection
• Mapping the World: Maps and Travel Literature
• Religion, Society, Spirituality, and Reform
• Science, Technology, and Medicine: 1780-1925, Part II

Individual titles in these collections can only be discovered in the NCCO site:
• Europe and Africa: Commerce, Christianity, Civilization, and Conquest
• Photography: The World through the Lens
• Science, Technology, and Medicine: 1780-1925, Part I
• Women: Transnational Networks
Oxford University Press Journals
Oxford Journals is a division of Oxford University Press, which is a department of Oxford University. We publish well over 230 academic and research journals covering a broad range of subject areas, two-thirds of which are published in collaboration with learned societies and other international organizations.
Past Masters (Intelex)
InteLex Past Masters is comprised of 100+ full-text humanities and sciences databases that make available cohesive collections of editions, in both original language and in English translation, of seminal figures in the humanities and sciences.
ProQuest historical newspapers
ProQuest Historical Newspapers offers full-text and full-image articles for newspapers dating back to the 19th century. As part of the ProQuest Historical Newspapers program, every issue of each title includes the complete paper, cover-to-cover, with full-page and article images in downloadable PDF. Includes The New York Times (1851-2007), The Wall Street Journal (1889-1993), and Washington Post (1877-1994).
ProQuest Historical Newspapers: The Globe and Mail
Canada's Heritage from 1844 contains complete coverage of The Globe and Mail newspaper from 1844 through 2011. Coverage includes major events in Canadian history, images, advertisements, classifieds, cartoons, birth/death notices and the full content of the Report on Business section first published in 1962.
SAGE Journals Online
"SAGE Publications is an independent international publisher of journals, books, and electronic media. Since its inception in 1965, SAGE Publications has been a leader in publishing high-caliber titles for academic researchers in the social sciences."
Permitted Use | Subscribed multi-user unlimited access
Science Direct (Elsevier)
Science Direct offers comprehensive coverage of literature across all fields of science, medicine and technology. All previous ScienceDirect journal collections have been merged into this single collection, along with select purchased eBook titles.
Permitted Use | Subscribed multi-user unlimited access
Scopus
Scopus, a multidisciplinary online resource, will be invaluable to students and faculty in various fields of study within the sciences, health sciences and the social sciences. Scopus offers full-text linking, abstracting-and-indexing information including peer-reviewed titles from international publishers, Open Access journals, conference proceedings, trade publications, quality web sources.
SpringerLink
SpringerLINK service provides access to electronic journals in a variety of subjects, including "life sciences, chemical sciences, geosciences, computer science, mathematics, medicine, physics & astronomy, engineering, environmental sciences, law, and economics."
[NOTE: pre-1996 Archival content now accessible when available]
Taylor & Francis Online - eJournals

"Taylor & Francis Group collaborates with researchers, scholarly societies, universities and libraries worldwide to bring knowledge to life. Our journals program encompasses over 1,600 titles and as one of the world’s leading publishers of scholarly journals our content spans all areas of Humanities, Social Sciences, Science and Technology."

Purchased & Subscribed multi-user unlimited access (varies by title)
Times Digital Archives (Gale)
The Times Digital Archive allows users to search and view online The Times (London) newspaper from 1785-1985.
NOTE: The Times is not published on Sunday, and The Sunday Times, a distinct newspaper, is not included in this database.
War on Poverty and Office of Economic Opportunity: Part III Administration of Antipoverty Programs & Civil Rights, 1964-67 (Gale)
"This collection brings together a series of Office of Economic Opportunity (OEO) collections that highlight efforts to meld the issue of civil rights and antipoverty initiatives:
1) Alphabetical File of Samuel Yette, 1964-1966: Yette was the Special Assistant to the Director of Civil Rights. Among his records are correspondence, reports, antipoverty program analyses, minutes of meetings, transcripts of testimonies, and other material.
2) Program Files, 1964-1967: These records consist of correspondence, weekly reports on civil rights matters, reports by civil rights coordinators, equal employment opportunity guidelines, and more.
3) Records Relating to the Administration of the Civil Rights Program in the Regions, 1965-1966: These records arranged by region > state > local areas and cities consist of correspondence between regional coordinators, various civil rights groups, labor organizations, members of Congress, and community groups regarding the activities of the OEO."

Original Microform Title: The War on Poverty and the Office of Economic Opportunity; Part 3: Administration of Antipoverty Programs and Civil Rights, 1964-1967
Wiley Online Library
Wiley Online Library hosts the world's broadest and deepest multidisciplinary collection of online resources covering life, health and physical sciences, social science, and the humanities. It delivers seamless integrated access to over 4 million articles from 1500 journals. UNB also subscribes to select eBook titles.

Digitized Newspapers

Likewise, UNB Libraries makes available for text and data mining, with some conditions, digital back files of New Brunswick's big three daily newspapers, The Telegraph Journal, The Moncton Times-Transcript, and The Daily Gleaner. For more information, please contact James MacKenzie, Director, Advanced Digital Research and Scholarship (jmackenz@unb.ca).

Free Resources

Copyright Considerations

Frequently Asked Questions

What is the status of Text and Data Mining activities under the current Canadian Copyright regime?

The Canadian Copyright Act does not address text and data mining. The federal government has signalled that it intends to consider changes to the Act for this type of research, but there is no clear regulation at this point.
For more information see ‘A Consultation on a Modern Copyright Framework for Artificial Intelligence and the Internet of Things’: https://www.ic.gc.ca/eic/site/693.nsf/eng/00316.html

What are the applications and limits of the fair dealing doctrine?

Canadian courts have not provided sufficient direction for researchers to confidently use the fair dealing exception to allow for broad copying of entire protected works for TDM. It should be noted that this contrasts with the U.S., where fair use has been successfully used as a defence for copying for TDM by researchers in that jurisdiction. Other jurisdictions, such as Japan and the EU, have introduced exceptions specific to TDM. Canadians should expect direction from the government on this issue, but in the meantime should seek the permission of the copyright owners for any mass copying for TDM research.

Can I harvest publicly available data (Twitter feeds, Facebook comments, news sites, etc.) for analysis?

There is an important distinction between facts and data and copyright protected works. Copyright rules do not apply to raw facts and data, but do apply to the original expression of the data in, for example, the form of written discussion, charts, graphs, etc. However, publicly available data and works are generally protected by a license or ‘terms of use’ that will stipulate how information on a website can be used. Unless there is language in the license that permits the type of copying necessary to harvest the information, permission from the owner is required.

How do I obtain permissions to harvest a corpus of texts, and are there different licensing and access models?

Looking into the rights or permissions needed to harvest a corpus of texts, or to use data, should be the first step in any research project that depends on them. You are taking an avoidable risk if you don’t request rights and permissions until after the research is complete, and you may be disappointed if permissions are denied. It is important to have as much information as possible on what material is necessary, how it will be used, whether it will be used in partnered research, whether you will transform or build upon it, where it will be stored, etc. Permission can be as simple as an email or as complex as a licensing agreement. If ‘terms of use’ or a license (such as a Creative Commons license or the Community Data License Agreement) are specified, you will need to make sure your intended use aligns with what is allowed under them. If no license is specified, you cannot use, share, distribute or change the material without obtaining permission or a license from the owner. Likewise, if your intended use is not allowed under the specified license agreement, you will need to ask the owner’s permission.

Can anyone at UNB Libraries help me navigate the legal landscape on this and reach out to implicated publishers?

For more information, contact Josh Dickison, Copyright Officer and Manager of Digital Delivery at UNB Libraries: copyright@unb.ca
UNB Libraries may forward you to, or ask you to contact, the Office of Research Services at ors@unb.ca if a research license agreement is needed, or for assistance on different licenses and their allowed uses.

Erik Moore (He/him)
I am on sabbatical until July 2026.
UNB Fredericton
ecmoore@unb.ca

Julie Morris (They/Them)
Collections Analysis/Bibliometrics Lib
UNB Fredericton
julie.morris@unb.ca
(506)-447-3220

Marc Bragdon
Head, Harriet Irving Research Commons
UNB Fredericton
mbragdon@unb.ca
(506)-458-7741

Last modified on November 6, 2023 15:42