The corpus analysis software under development for this project is available for anyone to use. This software connects a modnlp-based concordance browser to the most recent update of the Genealogies corpus.
To download and install a copy of the tool, please click one of the following links:
Windows users: To complete the installation, unzip the downloaded folder and run the installer by double-clicking the .exe file.
Once the tool has been installed, it should be possible to launch the application in subsequent sessions simply by finding modNLP among the list of programmes installed on your machine.
Mac users: To install and run ModNLP on Mac you will first need to modify your security permissions. Please note that this may not be possible without admin permissions on your computer.
1) Click on the button above to download the installer and extract the .app file.
2) Open a Terminal window. To do this, press cmd+space to open Spotlight, type Terminal and press enter.
3) Copy or type out the following command and hit enter. You may be required to type in your password.
sudo spctl --master-disable
4) Next, copy or type out the following command. DO NOT HIT ENTER YET. Drag the modNLP installer icon into the Terminal window (alternatively, type out the path to the installer at the end of the command).
sudo chmod -R 755
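The completed command should look similar to the following example (your path may differ). Once it does, hit enter:
sudo chmod -R 755 /Users/me/Downloads/modNLP.app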
5) The software should now install when you double-click the modNLP.app file.
6) Pin this to your launcher for easy access in subsequent sessions.
Additionally, the software code and plugins are available for download at: https://sourceforge.net/projects/modnlp/
Should you encounter any software bugs or other technical problems when using these tools, please create a ticket detailing the nature of the issue on our SourceForge project page: https://sourceforge.net/p/modnlp/tickets/
MODNLP: Modular Suite of NLP Tools
modnlp aims to provide a modular architecture and tools for natural language processing written (mainly) in Java. These tools are being developed in connection with the Genealogies of Knowledge project.
The following modnlp modules are currently available:
- idx: an API and tools for (inverted) indexing, storage and retrieval of large amounts of text, with (XML-based) handling of meta-data.
- tc: an API and tools for text categorisation, including functionality for XML parsing, term set reduction (and basic keyword extraction), probabilistic classifier induction, two sample classification tools, and evaluation modules.
- tec-tools (v2): consists of tec-server, a corpus indexer and server for corpus access and analysis over the web, and tec-client, a corpus analysis client. Unlike the (now obsolete) version 1 of these tools, which was originally developed for the TEC project and written in Perl, C (server side) and Java, the version on this site (v2) is written entirely in Java.
This new version of the tools forms the basis of software support for text analysis and visualisation in the Genealogies of Knowledge project.
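For readers unfamiliar with the idea behind the idx module and the concordance browser, the following is a minimal sketch of how an inverted index can support keyword-in-context (concordance) lookups. It is an illustration only and does not use or reproduce the modnlp API: the class name ConcordanceSketch and its methods addText, positions and concordance are hypothetical, tokenisation is plain whitespace splitting, and punctuation and meta-data handling are ignored.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical illustration of inverted indexing; NOT part of the modnlp API.
class ConcordanceSketch {
    // All tokens in the order they were added; a position is an index into this list.
    private final List<String> tokens = new ArrayList<>();
    // Inverted index: lower-cased term -> positions at which the term occurs.
    private final Map<String, List<Integer>> postings = new HashMap<>();

    // Tokenise on whitespace (an assumption made for brevity) and index every token.
    void addText(String text) {
        for (String raw : text.split("\\s+")) {
            if (raw.isEmpty()) {
                continue; // leading whitespace produces one empty token
            }
            String term = raw.toLowerCase(Locale.ROOT);
            postings.computeIfAbsent(term, k -> new ArrayList<>()).add(tokens.size());
            tokens.add(raw);
        }
    }

    // Look up all positions of a keyword, ignoring case.
    List<Integer> positions(String keyword) {
        return postings.getOrDefault(keyword.toLowerCase(Locale.ROOT), List.of());
    }

    // Build one concordance line per occurrence, with up to `window` tokens on each side.
    List<String> concordance(String keyword, int window) {
        List<String> lines = new ArrayList<>();
        for (int pos : positions(keyword)) {
            int from = Math.max(0, pos - window);
            int to = Math.min(tokens.size(), pos + window + 1);
            String left = String.join(" ", tokens.subList(from, pos));
            String right = String.join(" ", tokens.subList(pos + 1, to));
            lines.add(left + " [" + tokens.get(pos) + "] " + right);
        }
        return lines;
    }

    public static void main(String[] args) {
        ConcordanceSketch index = new ConcordanceSketch();
        index.addText("knowledge is power and knowledge travels through translation");
        for (String line : index.concordance("knowledge", 2)) {
            System.out.println(line);
        }
    }
}
```

The index maps each term directly to its positions, so a lookup does not need to scan the whole text. That is the general motivation for inverted indexing over large corpora. A real indexer would also persist the index to disk and attach document meta-data to each occurrence.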
The modnlp/tec tools have also been used by the European Parliamentary Comparable and Parallel Corpora project (ECPC), coordinated by Dr. Calzada Pérez (Universitat Jaume I, Spain), and by the Translational English Corpus. The latter has been collected and maintained under Prof Mona Baker’s supervision at the University of Manchester and is made available on the Internet through the Genealogies of Knowledge project website, in a collaboration between The University of Edinburgh and The University of Manchester.
Also available is the documentation of the modnlp suite (for developers).