tc

MODNLP/TC: an API and tools for text categorisation

modnlp/tc is an API and a set of tools for text categorisation, including functionality for XML parsing, term set reduction (and basic keyword extraction), probabilistic classifier induction, two sample classification tools, and evaluation modules. The software is distributed under the GNU General Public License and is fully compatible with GNU Classpath. It has been tested on a number of JVMs, including kaffe (v1.1.5), sablevm (v1.1.6), jamvm (v1.3) and JDK 1.4+. The functionality supported by the API includes:

See the API documentation for more details.
Download the latest version of modnlp/tc, and have “fun”.

See also the developer’s web page at SourceForge.net for the Git repository.