Translational English Corpus (TEC)

The Translational English Corpus (TEC) is a corpus of contemporary translational English: it consists of written texts translated into English from a variety of source languages, European and non-European. It was set up and is currently managed by Professor Mona Baker at the Centre for Translation and Intercultural Studies. The custom-made software for processing the corpus, which is downloadable from the web, is designed by Dr. Saturnino Luz, University of Edinburgh, who is also in charge of maintaining the corpus.

What type of research does TEC support?

TEC has supported a broad range of studies in two main areas: the way in which the patterning of translated text might be different from that of non-translated text in the same language, and stylistic variation across individual translators.

What does TEC consist of?

TEC consists of four subcorpora: fiction, biography, news and inflight magazines. The overall size of the corpus is currently around ten million words. It can be accessed freely via the web, using a custom-built concordancer designed by Dr. Saturnino Luz.

TEC – Contents

TEC is meticulously documented in terms of extralinguistic features such as gender, nationality and occupation of the translator, direction of translation, source language, publisher of the translated text, etc. This information is held in a separate header file for each text.

TEC – Sample Header File

The concordancing software is designed to make the information in the header file available to the researcher at a glance.

TEC tree - democracy

Software tools

To access the new and improved version of the TEC concordancing tool, open the following URL in your browser to start the download: https://sourceforge.net/projects/modnlp/files/modnlp-teccli-0.8.5-bin-tec.zip/download

Unzip the files you downloaded (do not skip this step). Windows users should right-click on the folder and select ‘Extract All’; Mac users may find this has been done automatically.

Open the unzipped folder (it should be called modnlp-teccli-0.8.5-bin-tec). Windows users should then right-click on the file named teccli.jar and select ‘Open with…’, choosing the default option (usually ‘Jar Launcher’). Finally, click Open.

Mac users should hold the CTRL key whilst clicking on the file named teccli.jar, select ‘Open with…’, and choose the default option (usually ‘Java Platform SE Binary’ or ‘Jar Launcher’). Finally, click Open.

In subsequent sessions it should be possible to launch the application simply by double-clicking on teccli.jar from within this folder.
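For readers who prefer to script the unzip-and-launch steps above, the following is a minimal Python sketch, not part of the TEC distribution. It assumes that the archive has already been downloaded, that a Java runtime is installed and available as `java` on the PATH, and that the tool starts with the standard `java -jar` invocation. The function names (`extract_archive`, `launch_command`, `launch`) are hypothetical and chosen only for illustration.

```python
import shutil
import subprocess
import zipfile
from pathlib import Path

JAR_NAME = "teccli.jar"  # file name given in the instructions above


def extract_archive(zip_path, dest_dir):
    """Unzip the downloaded archive into dest_dir and return the path to teccli.jar."""
    dest = Path(dest_dir)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
    # The jar normally sits inside the unzipped folder, so search recursively.
    matches = sorted(dest.rglob(JAR_NAME))
    if not matches:
        raise FileNotFoundError(f"{JAR_NAME} not found under {dest}")
    return matches[0]


def launch_command(jar_path):
    """Build the command line for the standard Java launcher (an assumption)."""
    return ["java", "-jar", Path(jar_path).name]


def launch(jar_path):
    """Start the concordancer from within its own folder."""
    if shutil.which("java") is None:
        raise RuntimeError("No 'java' executable found on PATH")
    jar = Path(jar_path)
    # Run from the jar's folder, mirroring a double-click inside that folder.
    return subprocess.Popen(launch_command(jar), cwd=jar.parent)
```

A typical call would be `launch(extract_archive("modnlp-teccli-0.8.5-bin-tec.zip", "."))`, run from the directory that holds the downloaded archive.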

Alternatively, TEC can be accessed via the Genealogies project interface. Having downloaded and launched the corpus browser, go to ‘File’ -> ‘New remote corpus…’ and enter genealogies.mvm.ed.ac.uk:1240 as the IP address of the new corpus server.
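If the corpus browser cannot connect, it may help to check first whether the server address is reachable from your network. The sketch below is an illustration only: it assumes that the corpus server accepts ordinary TCP connections on the stated port, and the helper names (`parse_server_address`, `is_reachable`) are hypothetical rather than part of the TEC software.

```python
import socket


def parse_server_address(address):
    """Split a 'host:port' string into a (host, port) pair."""
    host, sep, port = address.strip().rpartition(":")
    if not sep or not host or not port.isdigit():
        raise ValueError(f"expected 'host:port', got {address!r}")
    return host, int(port)


def is_reachable(address, timeout=5.0):
    """Return True if a TCP connection to the server can be opened."""
    host, port = parse_server_address(address)
    try:
        # Assumption: the server listens for plain TCP connections on this port.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_reachable` with the server address given above, in the same `host:port` form, returns `False` when the connection is blocked or the server is down. A result of `True` shows only that the port is open, not that the corpus itself is available.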

If you find any bugs in the software, please report them at https://sourceforge.net/p/modnlp/tickets/ (click on ‘Create ticket’ on the menu on the left).

The new functionality of the TEC tool is described in our recent paper: Luz, S., Sheehan, S. (2020) ‘Methods and visualization tools for the analysis of medical, political and scientific concepts in Genealogies of Knowledge’, Palgrave Communications 6(49). https://doi.org/10.1057/s41599-020-0423-6

You may also find it helpful to consult the Genealogies of Knowledge software manual at http://genealogiesofknowledge.net/software/manual/