Designer: Wiktor Sadowski (Polish, 1956-). Warszawska Jesien Poezji (Warsaw Autumn of Poetry). Unknown. 97.16 x 66.04 cm (38.25 x 26 inches). Repository: RISD Graphic Design and Illustration Archive (Providence, Rhode Island, USA).

Text analysis is behind the auto-suggest on your phone, the spam filter on your email, and the recommendations in your streaming services. It has quietly shaped our everyday lives for many years, but recently there has been serious buzz around text analysis and generative artificial intelligence, including their potential effects on human society. If you aren’t sure what text analysis is or how it can benefit your academic and professional life, keep reading.

If you’ve never used textual data in your research, it might not be obvious why text analysis matters. What can text analysis offer that traditional research methods do not?

In this post, we explore that question and explain some fundamental concepts of text analysis. We also introduce resources for growing your research toolkit, including Constellate and the Text Analysis Pedagogy (TAP) Institute.

What is text analysis?

Text analysis (also called text analytics or text mining) is the process of using technology to help analyze unstructured and semi-structured text data for valuable insights, trends, and patterns. It is particularly valuable when there is a need to process large volumes of text-based data that would otherwise be too resource- and time-intensive to analyze manually.
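To make the idea concrete, here is a minimal sketch in Python of one of the simplest text analysis tasks: counting how often words appear in a collection of documents. The two-document corpus, the document names, and the helper functions (tokenize, term_counts, document_frequency) are hypothetical, invented only for illustration. A real project would load thousands of texts and would likely use a dedicated library, but the underlying idea of turning text into countable features is the same.

```python
import re
from collections import Counter

# Hypothetical mini-corpus; in practice this could be thousands of documents.
documents = {
    "doc1": "The whale surfaced. The crew watched the whale.",
    "doc2": "A storm scattered the crew across the deck.",
}


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_counts(docs):
    """Count how many times each word occurs in each document."""
    return {name: Counter(tokenize(text)) for name, text in docs.items()}


def document_frequency(counts):
    """Count how many documents contain each word at least once."""
    df = Counter()
    for counter in counts.values():
        df.update(counter.keys())
    return df


counts = term_counts(documents)
df = document_frequency(counts)

# The two most frequent words in the first document.
print(counts["doc1"].most_common(2))
# The number of documents that mention "crew".
print(df["crew"])
```

Counting words is only a starting point, but the same pattern (split text into tokens, count them, compare across documents) sits underneath many larger text analysis methods.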

For researchers, the primary advantage that text analysis offers is the ability to consider knowledge at non-human scales (both very big and very small). Text analysis can enable us to consider, for example, hundreds of different features within a million books, revealing patterns inaccessible to a human reader, whether those patterns are imperceptibly small, diffused across centuries, or simply buried in records that have never been read.

For businesses that have significant data, the primary advantage that text analysis offers is the ability to drive (and in some cases, automate) key business decisions based on insights from large-scale, unstructured text. Businesses have long relied on insights from numerical data for decision-making, yet they often overlook the bulk of their data: text that sits dormant and unanalyzed, in part because no one on staff possesses the skills to unlock its potential.

Why learn text analysis?

Today it is clear that digital text has become the primary publishing and archival format for humankind. The ability to search and manipulate our digital records is now essential for researchers to find important information. Historians of the future looking at our present moment will need to be able to analyze emails, text messages, websites, social media, and other digital files. How well can we really understand history and society if we are not prepared to search, read, and interpret digital records in the form of primary sources?

Employers are establishing data literacy as an essential competency. Only 43 percent of today’s learners consider themselves data literate, and more than half lack familiarity with the concept altogether. Universities everywhere are looking to hire and up-skill their staff to meet the educational need.

All scholars, including humanities scholars, need a flexible skill set that prepares them for working inside and outside the academy. Even if you don’t use text analysis for your own research, it is important to understand a little about how it works because text analysis already drives the way decisions are made in research, in business, and in government. Text analysis and algorithms help decide what webpages you see, who gets a loan from the bank, and how politicians make policy decisions. The issues surrounding text analysis are humanist issues: not merely technical, but also social, ethical, and legal.

Additionally, the humanities have a valuable research role to play in the era of data science, big data, and machine learning, especially confronting social issues like algorithmic oppression, data privacy, and social media manipulation.

— Amy Kirchhoff, Senior Manager, Constellate

How do you learn text analysis?

To become truly proficient, you have to learn a programming language like Python or R. The good news is that programming and text analysis, like any skill, can be learned. That’s where the Text Analysis Pedagogy (TAP) Institute comes in. First offered in 2021, the annual TAP Institute seeks to address the need for greater community support, technical infrastructure, and open educational resources by offering a free series of events and classes for anyone interested in teaching text analysis.

TAP Institute works in partnership with Constellate, part of ITHAKA’s portfolio of nonprofit services aligned around a shared mission to improve access to knowledge. Constellate is the only text analysis platform that integrates access to scholarly content and open educational resources into a cloud-based lab to help faculty more easily and effectively teach text analysis and data skills. With Constellate, learners across all disciplines can apply text analysis methods to datasets, and hone their skills with support from on-demand tutorials, live classes taught by experts, and engagement with an inspiring user community.

TAP Institute courses are taught using Constellate and are designed to be progressive, so you will benefit from taking a single course or the entire series, no matter your skill level. Taught by leading text-analysis experts, these free courses are designed as open educational resources that you can use, remix, and tailor for teaching at your own institution.

Constellate also offers Python intensives every quarter, and asynchronous video courses on demand. Visit our site and our classes and events page for details. Please reach out to sign up for a Constellate trial or contact us with any questions you may have about text analysis, Constellate, or the TAP Institute.