Cybernetic Musings: Text mining

Sunday, October 10, 2004

Text mining

Text mining, also known as intelligent text analysis, text data mining, or knowledge discovery in text (KDT), generally refers to the process of extracting interesting, non-trivial information and knowledge from unstructured text. It is a young interdisciplinary field that draws on information retrieval, data mining, machine learning, statistics, and computational linguistics. Because most information (by a commonly quoted estimate, over 80%) is stored as text, text mining is believed to have high commercial potential.

One application of text mining is in bioinformatics, where details of experimental results can be automatically extracted from a large corpus of text and then processed computationally. For example, it has been reported that a support vector machine (SVM), given appropriate training, can extract details of protein-protein interactions from the literature with greater than 90 percent accuracy.
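To give a rough feel for the SVM approach, here is a minimal sketch in plain Python. It is not the system behind the reported 90 percent figure. It assumes the task is simplified to deciding whether a sentence mentions an interaction at all, uses bag-of-words counts as features, and trains a linear SVM (without a bias term) by sub-gradient descent on the regularized hinge loss. The training sentences, the function names, and the parameter values are all made up for illustration.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def featurize(text):
    """Bag-of-words features: token -> number of occurrences."""
    return Counter(tokenize(text))


def score(weights, feats):
    """Linear decision value w . x (no bias term in this sketch)."""
    return sum(weights.get(tok, 0.0) * n for tok, n in feats.items())


def train_linear_svm(examples, epochs=20, lr=0.1, lam=0.01):
    """Sub-gradient descent on lam/2 * |w|^2 + hinge loss.

    examples is a list of (sentence, label) pairs with label +1 or -1.
    epochs, lr and lam are arbitrary illustrative values.
    """
    weights = {}
    for _ in range(epochs):
        for text, label in examples:
            feats = featurize(text)
            margin = label * score(weights, feats)
            # Regularization step: shrink every weight slightly.
            for tok in weights:
                weights[tok] *= 1.0 - lr * lam
            # Hinge-loss step: only examples inside the margin push back.
            if margin < 1.0:
                for tok, n in feats.items():
                    weights[tok] = weights.get(tok, 0.0) + lr * label * n
    return weights


def predict(weights, text):
    """Return +1 (mentions an interaction) or -1 (does not)."""
    return 1 if score(weights, featurize(text)) > 0 else -1


# Invented toy corpus: +1 = sentence describes a protein-protein interaction.
TRAINING = [
    ("RAD51 interacts with BRCA2", 1),
    ("MDM2 binds p53 and inhibits it", 1),
    ("Grb2 forms a complex with Sos1", 1),
    ("Samples were stored at low temperature", -1),
    ("The study was funded by two grants", -1),
    ("Cells were grown in culture medium", -1),
]

model = train_linear_svm(TRAINING)

for sentence in ("Cdc25 binds Cdk1", "Plates were stored overnight"):
    print(sentence, "->", predict(model, sentence))
```

A real system would need thousands of labelled sentences, richer features than raw word counts, and a further step to pull out which proteins interact. The sketch only shows the core idea: the classifier learns which words tend to signal an interaction.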

Some bioinformaticians call this body of literature the textome, following the naming convention that gave us the genome, although the term is far from universal.

One of the largest text mining applications in existence is probably the classified ECHELON surveillance system.

--g//
