Text mining can be broadly defined as a knowledge-intensive process in which a user interacts with a document collection over time by using a suite of analysis tools. More specifically, and in a manner similar to data mining, it seeks to extract useful information from a collection of documents through the identification and exploration of interesting patterns. It involves preprocessing the document collection (text categorization, information extraction, term extraction), storing the resulting intermediate representations, analyzing those representations with techniques such as distribution analysis, clustering, trend analysis, and association rules, and visualizing the results.
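The stages named above (preprocessing, an intermediate representation, and analysis) can be illustrated with a minimal sketch. It assumes a toy three-document collection, a tiny stopword list, and a bag-of-words representation; the names `DOCS`, `STOPWORDS`, `extract_terms`, `build_representation`, `document_frequency`, and `cooccurrence` are hypothetical and chosen only for illustration. The co-occurrence count stands in for a much simpler relative of association-rule mining, not a full implementation of it.

```python
import re
from collections import Counter
from itertools import combinations

# Hypothetical toy collection, invented for this illustration.
DOCS = [
    "Text mining extracts patterns from text documents.",
    "Data mining extracts patterns from structured data.",
    "Clustering groups similar text documents.",
]

# Assumed minimal stopword list; real systems use far larger ones.
STOPWORDS = {"from", "the", "a", "of"}


def extract_terms(doc):
    """Preprocessing: lowercase, keep alphabetic tokens, drop stopwords."""
    return [t for t in re.findall(r"[a-z]+", doc.lower()) if t not in STOPWORDS]


def build_representation(docs):
    """Intermediate representation: one bag of term counts per document."""
    return [Counter(extract_terms(d)) for d in docs]


def document_frequency(bags):
    """Distribution analysis: number of documents that contain each term."""
    df = Counter()
    for bag in bags:
        df.update(bag.keys())
    return df


def cooccurrence(bags, min_support=2):
    """Association-style step: term pairs appearing together in at least
    min_support documents."""
    pairs = Counter()
    for bag in bags:
        pairs.update(combinations(sorted(bag), 2))
    return {pair: n for pair, n in pairs.items() if n >= min_support}


bags = build_representation(DOCS)
df = document_frequency(bags)
pairs = cooccurrence(bags)

print(df)
print(pairs)
```

Keeping the bags of term counts as a separate stored step mirrors the text's point that analysis operates on intermediate representations rather than on the raw documents; clustering, trend analysis, or visualization could consume the same `bags` structure.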
The objective of text mining is to discover valuable knowledge in text documents by exploiting, in various ways, the hidden information they contain.
Text mining is a new and exciting research area that tries to solve the information overload problem by using techniques from data mining, machine learning, natural language processing (NLP), information extraction, and knowledge management.