modnlp-tec: on-line Translational English Corpus

Start the TEC concordancing tool by clicking on the icon below.



The TEC corpus browser uses Java™ Web Start technology. If the tool fails to start when you click on the icon above, you may have to download or upgrade Java on your machine. See the Java website for details.

N.B.: If the link above fails to start the tool, try downloading the latest version of modnlp-teccli (this link), uncompressing it, and running teccli.jar (by clicking on it, for instance). Once the application starts, you will see a dialogue similar to the one shown on the right. Select 'Choose new remote corpus' and enter genealogiesofknowledge.net:1240 into the window that pops up. After a few seconds, the concordancer should appear. If you are behind a firewall, set the proxy server by selecting "Options" on the menu bar.
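If the concordancer does not appear, it can help to check whether the corpus server is reachable from your machine at all. The following is a minimal Python sketch, not part of modnlp: the function name can_reach is made up for illustration, and the only values taken from this page are the host and port mentioned above. It attempts a direct TCP connection, so a failure may simply mean that your network requires the proxy setting under "Options".

```python
import socket


def can_reach(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a direct TCP connection to host:port can be opened.

    Hypothetical helper for troubleshooting; it does not speak the
    concordancer's protocol, it only tests that the port accepts connections.
    """
    try:
        # create_connection resolves the host name and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

For example, can_reach("genealogiesofknowledge.net", 1240) returning False suggests a firewall or proxy issue rather than a problem with teccli.jar itself.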

The design, structure and motivations for the TEC/ECPC tools are described in the following paper:

If you use modnlp or the TEC/ECPC tools in your research, please consider citing it.

Quick tutorial

In addition to the standard concordancer, the browser implements a few other tools. It allows you to restrict your search to sub-corpora defined according to certain features, to display a summary of the files contained in the TEC corpus, and to display frequency lists for the various selectable sub-corpora.

Selecting sub-corpora

The sub-corpus selection tool allows you to restrict the results of concordancing queries and the contents of frequency tables to sections of files matching certain selection criteria. These criteria include author, translator, translator gender, source language and translator nationality, among others. To select a sub-corpus, choose "Options->Select sub-corpus...".

A window similar to the one shown below should appear.

[Screenshot: sub-corpus selection window]

The menu boxes allow you to select one or more items describing the texts to be included in the desired sub-corpus. The menu boxes can be connected to form the logical expressions which ultimately determine what gets included or excluded. The 'exclude' checkbox below each menu box causes the items selected in the box above it to be excluded.
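The effect of combining menu boxes can be pictured as a filter over the descriptive headers of the texts. The Python sketch below is only an illustration under stated assumptions: it is not the tool's code, the TextHeader fields and the sample values are invented, and it assumes the boxes are joined with a logical AND (the tool may offer other connectives).

```python
from dataclasses import dataclass
from typing import Iterable, List, Set, Tuple


@dataclass(frozen=True)
class TextHeader:
    """Hypothetical metadata record for one text in the corpus."""
    filename: str
    author: str
    translator_gender: str
    source_language: str


# One "menu box": (attribute name, selected items, exclude checkbox ticked?)
Box = Tuple[str, Set[str], bool]


def box_matches(header: TextHeader, box: Box) -> bool:
    """True if the header satisfies one menu box, honouring 'exclude'."""
    attribute, selected, exclude = box
    hit = getattr(header, attribute) in selected
    return (not hit) if exclude else hit


def select_subcorpus(headers: Iterable[TextHeader], boxes: List[Box]) -> List[TextHeader]:
    """Keep the texts that satisfy every box (assumption: boxes are ANDed)."""
    return [h for h in headers if all(box_matches(h, b) for b in boxes)]


# Invented sample data, for illustration only.
HEADERS = [
    TextHeader("text_a.xml", "Author A", "female", "Arabic"),
    TextHeader("text_b.xml", "Author B", "male", "Arabic"),
    TextHeader("text_c.xml", "Author C", "female", "Spanish"),
]

boxes = [
    ("source_language", {"Arabic", "Spanish"}, False),  # include these languages
    ("translator_gender", {"male"}, True),              # 'exclude' is ticked
]
selected = select_subcorpus(HEADERS, boxes)
print([h.filename for h in selected])  # ['text_a.xml', 'text_c.xml']
```

Here the second box removes text_b.xml because its 'exclude' flag inverts the match on translator gender.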

Clicking 'OK' activates the sub-corpus selection. To de-activate it (that is, to search the full corpus again), choose "Options" and de-select "Activate sub-corpus".

Displaying a frequency list

Select "Plugins->Word Frequency List". The following window will appear.

[Screenshot: frequency list window]

Select the range of ranked items to display (the default is to display the 500 most common terms) and click on "Get List" to retrieve their frequency table. This table can be saved to a CSV file which you can, if you like, manipulate in spreadsheet software.
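A saved frequency table can also be processed with a short script instead of a spreadsheet. The Python sketch below assumes a CSV layout with a header row and columns named rank, word and frequency; the actual layout of the exported file is not documented here, so adjust the column names to match your file. The sample rows are invented.

```python
import csv
import io
from typing import Dict, List, TextIO, Tuple

# Invented sample standing in for an exported file (column names are assumed).
SAMPLE_CSV = "rank,word,frequency\n1,the,120\n2,of,60\n3,and,20\n"


def load_frequency_table(handle: TextIO) -> List[Tuple[str, int]]:
    """Read (word, frequency) pairs from an open CSV file or file-like object."""
    reader = csv.DictReader(handle)
    return [(row["word"], int(row["frequency"])) for row in reader]


def relative_frequencies(table: List[Tuple[str, int]]) -> Dict[str, float]:
    """Share of each word among the rows in the table (not the whole corpus)."""
    total = sum(count for _, count in table)
    return {word: count / total for word, count in table}


table = load_frequency_table(io.StringIO(SAMPLE_CSV))
shares = relative_frequencies(table)
for word, count in table:
    print(f"{word}\t{count}\t{shares[word]:.2f}")
```

To read a real export, replace io.StringIO(SAMPLE_CSV) with open("your_file.csv", newline="", encoding="utf-8"), where the file name is a placeholder.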

Displaying general corpus information

Select "Plugins->Corpus Description Browser". A window will appear listing each file in the corpus, the major sub-corpus it belongs to (i.e. fiction, newspapers, biography or in-flight magazines), the number of tokens it contains and its type-token ratio. At the bottom of the window you will see the total number of tokens in the corpus and the overall type-token ratio.
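For readers unfamiliar with the measure, a type-token ratio is conventionally the number of distinct word forms (types) divided by the number of running words (tokens). The Python sketch below illustrates that definition only; the tokenizer is a simplifying assumption (lower-cased runs of letters, digits and apostrophes), and the figures shown by the tool may differ because its tokenization rules are not described here.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    """Naive tokenizer (assumption): lower-cased runs of letters, digits, apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def type_token_ratio(tokens: List[str]) -> float:
    """Distinct tokens divided by total tokens; 0.0 for an empty text."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


tokens = tokenize("The cat sat on the mat")
print(len(tokens))                         # 6 tokens
print(len(set(tokens)))                    # 5 types ("the" occurs twice)
print(round(type_token_ratio(tokens), 3))  # 0.833
```

Because longer texts repeat words more often, ratios computed this way tend to fall as text length grows, which is worth keeping in mind when comparing files of very different sizes.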