The corpus analysis software under development for this project is available for anyone to use. This software connects a modnlp-based concordance browser to the most recent update of the Genealogies corpus.
To download and run a copy of the tool, please follow these instructions (see the note below):
- Click on the following link to start the download from our SourceForge page: https://sourceforge.net/projects/modnlp/files/modnlp-teccli-0.8.5-bin-gok.zip/download
- Unzip the downloaded file: Windows users should right-click on it and select Extract All; Mac users may find this has been done automatically
- Open the unzipped folder (it should be called modnlp-teccli-0.8.5-bin-gok)
- Windows users should then right-click on the file named teccli.jar, select ‘Open with…’, choose the default option (usually ‘Jar Launcher’), and then click Open
- Mac users should instead hold the CTRL key whilst clicking on the file named teccli.jar, select ‘Open with…’, choose the default option (usually ‘Java Platform SE Binary’ or ‘Jar Launcher’), and then click Open
- In subsequent sessions, you should be able to launch the application simply by double-clicking on teccli.jar from within this folder
If you have not already done so, we recommend installing the latest version of Java, which is needed to run teccli.jar.
Additionally, the software source code and plugins are available for download at: https://sourceforge.net/projects/modnlp/
Because modnlp and its plugins are still under development and are updated frequently, we recommend that you regularly delete all existing copies of the software from your computer and download the browser again from this page. This ensures that you are always working with the latest version of the software as the project evolves.
Should you encounter any software bugs or other technical problems when using these tools, please create a ticket detailing the nature of the issue on our SourceForge project page: https://sourceforge.net/p/modnlp/tickets/
MODNLP: Modular Suite of NLP Tools
modnlp aims to provide a modular architecture and a set of tools for natural language processing, written mainly in Java. These tools are being developed in connection with the Genealogies of Knowledge project.
The following modnlp modules are currently available:
- idx: an API and tools for (inverted) indexing, storage and retrieval of large amounts of text, with XML-based handling of metadata.
- tc: an API and tools for text categorisation, including functionality for XML parsing, term set reduction (and basic keyword extraction), probabilistic classifier induction, two sample classification tools, and evaluation modules.
- tec-tools (v2): consisting of tec-server, a corpus indexer and server for corpus access and analysis over the web, and tec-client, a corpus analysis client. Unlike the now-obsolete version 1 of these tools, which was originally developed for the TEC project and written in Perl, C (server side) and Java, the version on this site (v2) is written entirely in Java.
This new version of the tools forms the basis of software support for text analysis and visualisation in the Genealogies of Knowledge project.
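The idx module is described here only at a high level. As a language-neutral illustration of what inverted indexing and concordance retrieval involve, the following sketch builds a positional inverted index over a few in-memory texts and uses it to produce keyword-in-context (KWIC) lines, the kind of display a concordance browser shows. It is written in Python for brevity and is not modnlp's Java API: the function names `tokenize`, `build_index` and `concordance`, the regular-expression tokenizer and the default context width are all assumptions made for this example.

```python
import re
from collections import defaultdict


def tokenize(text):
    # Hypothetical, simplistic tokenizer: lowercased runs of word characters.
    return re.findall(r"\w+", text.lower())


def build_index(docs):
    """Build a positional inverted index mapping each token to (doc_id, position) postings."""
    index = defaultdict(list)
    tokens_by_doc = {}
    for doc_id, text in docs.items():
        tokens = tokenize(text)
        tokens_by_doc[doc_id] = tokens
        for pos, tok in enumerate(tokens):
            index[tok].append((doc_id, pos))
    return index, tokens_by_doc


def concordance(word, index, tokens_by_doc, width=3):
    """Return KWIC lines for `word`, using the index to locate every occurrence."""
    lines = []
    # dict.get does not insert missing keys into the defaultdict.
    for doc_id, pos in index.get(word.lower(), []):
        tokens = tokens_by_doc[doc_id]
        left = " ".join(tokens[max(0, pos - width):pos])
        right = " ".join(tokens[pos + 1:pos + 1 + width])
        # Join only the non-empty parts so hits at a text boundary have no stray spaces.
        context = " ".join(part for part in (left, f"[{tokens[pos]}]", right) if part)
        lines.append(f"{doc_id}: {context}")
    return lines
```

For example, with `docs = {"d1": "Knowledge is power.", "d2": "The genealogy of knowledge matters"}`, calling `concordance("knowledge", *build_index(docs))` returns `["d1: [knowledge] is power", "d2: the genealogy of [knowledge] matters"]`. Looking words up in the index avoids rescanning every text for each query; a real indexer for large corpora would also keep postings on disk and attach metadata to each text.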
The modnlp/tec tools have also been used by the European Parliamentary Comparable and Parallel Corpora project (ECPC), coordinated by Dr. Calzada Pérez (Universitat Jaume I, Spain), and for the Translational English Corpus (TEC). The TEC has been collected and maintained under Prof Mona Baker’s supervision at the University of Manchester, and is made available on the Internet through the Genealogies of Knowledge project website, in a collaboration between the University of Edinburgh and the University of Manchester.
Developer documentation for the modnlp suite is also available.
Regular users of the GoK corpora will have noticed that the way the GoK corpus browser is started has changed. Since May 2020, users must download the tool and run it directly on their own computer, rather than launching it from the website via Java Web Start. We made this change to avoid having to purchase a “code signing certificate”, which would create a dependency between the project and an external certificate authority. Furthermore, because the most widely used implementation of Java Web Start is non-free (non-open-source) software and now blocks self-signed applications from running, we felt that this new mode of delivery would be more consistent with the Free/Libre Software ethos of the Genealogies project, while giving us greater flexibility in developing and deploying our software.