Department of Computer Science

Text Analytics

Text analytics refers to a set of linguistic, statistical and machine learning techniques that extract usable knowledge from unstructured text by identifying core concepts, sentiments and trends, and that make this knowledge available to support decision-making. Text analytics is not the same as search. Unlike search, it is a “bottom-up” approach that does not require users to know particular search terms in advance. Instead, it reveals the concepts and themes contained in a body of documents and then maps the relationships between them. Wikipedia describes text analytics as “a set of linguistic, statistical, and machine learning techniques that model and structure the information content of textual sources for business intelligence, exploratory data analysis, research, or investigation”.


It is no secret that the world has seen an explosion of information in the past 20 years, an explosion that experts predict will continue as the billions of people who already use online resources expand their usage and the billions who do not yet have access to the Internet gain it. The new participative Web also allows users to become co-creators of content rather than mere consumers, and text constitutes the largest part of Web content. Beyond the Web, information and communication technology has brought about another major transformation: the way business is carried out. Business organizations now have access to more and more data. However, a widely cited industry estimate holds that only about 20% of the data available to companies is structured and usable for decision-making. The major part (about 80%) is unstructured and hence not directly amenable to decision-making. The structured data (which can be referred to simply as data) can only tell what happened, such as the poor sales performance of a product, but not why it happened. The why can be obtained only by content analysis of the unstructured data (which can be referred to as text). Business organizations are now realizing that the what (data) needs to be associated with the why (textual content) in order to be productive and competitive. ‘Voice of customer’, ‘social media monitoring’, ‘survey analysis’, ‘voice of employee’ and ‘e-discovery’ are some of the popular text analytics formulations used by business organizations. Text analytics is being deployed in the areas of public policy and national security as well.


A text document might be a scholarly journal article, an eBook chapter, an email, a blog post written by a user, the contents of a news feed, a free-text response to a market research survey, a movie review, a policy document of a business organization or even a crime scene report. There is a growing recognition that these and many other forms of text documents contain extremely valuable information that is useful for a variety of purposes. This motivation has drawn the attention of researchers and professionals to the interesting and challenging area of text analytics. Analyzing text has become essential in various types of scientific research, business applications and other areas. It adds significant value to other forms of data analysis, particularly when used to predict how people may act in certain situations. For example, in obtaining a well-rounded view of customer behavior, text analytics is critical because it provides insight into the nuances of attitudes and opinions that influence behavior. With the exponential growth of text in online formats, ways must be found to structure this information and make it available to researchers and decision-makers.

The essence of text analytics is to take very large collections of unstructured text documents and extract useful intelligence from them. Text documents are structured for reading by people, but they are unstructured as far as data extraction or reading by a machine is concerned. Therefore, a sophisticated set of algorithms and techniques, which draw building blocks from traditional machine learning, language processing and statistical methods, is required. Since text documents are ‘the data’ of today, we need to develop techniques and formulations that can meet the desired objectives. Some of the popular sub-tasks in text analytics are information extraction, semantic annotation, classification and clustering, summarization, sentiment analysis and event analysis. The ability to handle multilingual texts is another important requirement of text analytics formulations. Our focus is on contributing to some of these goals in text analytics and on designing useful applications based on that work.
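
As a rough illustration of what it means to turn text that is structured for people into data that a machine can work with, the following minimal Python sketch counts term frequencies over a small collection of documents. It is only an assumed, simplified example and not a description of the group's methods: the stop-word list, the function names and the sample reviews are all hypothetical, and practical systems rely on far richer linguistic and statistical processing.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "but", "of", "to", "it", "in"}


def tokenize(text):
    """Lowercase the text, keep alphabetic word tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def term_counts(documents):
    """Return one Counter of term frequencies per document."""
    return [Counter(tokenize(doc)) for doc in documents]


def top_terms(documents, n=3):
    """Return the n most frequent terms across the whole collection."""
    total = Counter()
    for counts in term_counts(documents):
        total.update(counts)
    return total.most_common(n)


# Made-up sample data standing in for free-text customer feedback.
reviews = [
    "The battery life is great and the screen is great.",
    "Battery drains fast, but the screen is sharp.",
    "Great screen, poor battery.",
]

if __name__ == "__main__":
    for term, count in top_terms(reviews):
        print(term, count)
```

Even this simple count surfaces the recurring themes of the sample reviews without the reader having to supply any search terms, which is the "bottom-up" behavior described above. Sub-tasks such as classification, clustering and sentiment analysis typically build on structured representations of this general kind.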


More information about the text analytics research is available at:

Area contact:

Vivek Singh
