
TM4IP

Text Mining for Intellectual Property


The TM4IP project is a collaboration between Radboud University Nijmegen (the Netherlands) and Matrixware (Austria). The goals of the TM4IP project are twofold:
  • to develop linguistic resources for fast and accurate dependency analysis of complicated English-language texts, to be made available in the public domain for large-scale Information Retrieval applications as part of the AGFL project
  • to develop new professional tools for Text Mining in the Intellectual Property (patent search) area:
    • the PHASAR professional search engine
    • the Linguistic Classification System LCS.

About Text Mining

According to Marti Hearst, Text Mining is "the discovery by computer of new, previously unknown information, by automatically extracting information from different written resources. A key element is the linking together of the extracted information together to form new facts or new hypotheses to be explored further by more conventional means of experimentation."

Typically, Text Mining consists of a search phase, in which documents pertaining to a certain topic are sought in a very large collection, and an analysis phase, in which (parts of) those documents are presented in a way that makes it easy for the human user to interpret them and obtain knowledge.
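The two phases can be illustrated with a deliberately simple sketch. It assumes plain keyword search: the search phase is a boolean AND query over an inverted index, and the analysis phase shows, for each hit, only the sentences that mention a query term. The function names (`build_index`, `search`, `passages`) and the sample documents are hypothetical and do not describe how PHASAR itself works.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Lower-case the text and keep only alphanumeric word tokens.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Inverted index: term -> set of ids of the documents containing it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Search phase: return the ids of documents containing ALL query terms.
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


def passages(text, query):
    # Analysis phase: keep only the sentences that mention a query term.
    terms = set(tokenize(query))
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [s for s in sentences if terms & set(tokenize(s))]


if __name__ == "__main__":
    # Hypothetical toy collection standing in for patent documents.
    docs = {
        "D1": "A pump comprises a rotor and a motor. The housing is made of steel. The rotor is balanced.",
        "D2": "A valve controls the flow. It is made of steel.",
        "D3": "The motor drives a fan. Cooling is improved.",
    }
    index = build_index(docs)
    for doc_id in sorted(search(index, "rotor motor")):
        print(doc_id, passages(docs[doc_id], "rotor motor"))
```

Only D1 contains both terms, and only two of its three sentences are shown; the sentence about the housing is filtered out, which is the point of presenting parts of documents rather than whole documents.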

PHASAR and IP

The PHASAR system provides its users with a wholly new way of searching: it uses linguistically motivated search terms, gives the user tight control over precision and recall (avoiding long lists of spurious hits), and supports the search process with information from the index and the thesauri. These properties make it well suited to both exploratory and exhaustive searches in large collections of patent documents.
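Why linguistically motivated search terms can improve precision is easiest to see with a toy contrast between keyword matching and matching on dependency relations. The sketch below assumes that sentences have already been analysed into (head, relation, dependent) triples; the triples are written by hand rather than produced by an AGFL grammar, and the `Triple` type, the `SUBJ`/`OBJ` labels and the helper functions are hypothetical, not PHASAR's query language.

```python
import re
from typing import NamedTuple


class Triple(NamedTuple):
    # Hypothetical (head, relation, dependent) format, not actual AGFL output.
    head: str
    relation: str
    dependent: str


# Hand-written analyses standing in for the output of a dependency parser.
PARSED = {
    "S1": ("The motor drives the pump.",
           [Triple("drive", "SUBJ", "motor"), Triple("drive", "OBJ", "pump")]),
    "S2": ("The pump drives the motor.",
           [Triple("drive", "SUBJ", "pump"), Triple("drive", "OBJ", "motor")]),
}


def keyword_match(query_words):
    # Bag-of-words matching: every query word must occur in the sentence.
    return sorted(
        sid for sid, (text, _) in PARSED.items()
        if set(query_words) <= set(re.findall(r"[a-z]+", text.lower()))
    )


def triple_match(query_triples):
    # Relational matching: every query triple must occur in the analysis.
    return sorted(
        sid for sid, (_, triples) in PARSED.items()
        if set(query_triples) <= set(triples)
    )


if __name__ == "__main__":
    print(keyword_match({"motor", "drives", "pump"}))
    print(triple_match([Triple("drive", "SUBJ", "motor"),
                        Triple("drive", "OBJ", "pump")]))
```

The keyword query matches both sentences, although only S1 says that the motor drives the pump; the triple query distinguishes the two and returns S1 alone.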

For the analysis phase, PHASAR provides passage retrieval (focusing on the relevant sentences) and reusable search profiles.