Genealogies of Knowledge Corpus

The Genealogies of Knowledge corpus is designed to enable researchers to trace the trajectory of key concepts as they enter different cultural and temporal spaces, predominantly but not exclusively through the mediation of various forms of translation.

The current focus of the project is on three historical lingua francas (Arabic, Latin and English) and on concepts relating to the body politic and to scientific, expert discourse. However, the research team is developing a set of resources and a range of methodologies that can support future studies involving different lingua francas (French being an obvious choice in the European context), different historical moments (perhaps the Enlightenment), and different constellations of concepts. To this end, both the corpora we are building and the software being developed to interrogate them and visualise the findings are being made accessible to the research community, to support other types of study.

Because of legal constraints pertaining to copyright law, we offer restricted access to the corpora: we aim to allow visitors to the site to run searches, expand individual concordance lines within the limits of fair use (300 words), and download the findings. We are, however, unable to offer full access to individual texts.

All software tools developed for this project are available for free download, together with relevant documentation, under a Free Software license, the GNU General Public License.


Content of the corpus

Temporally, the corpus is designed to allow the research team to examine the following processes, within specific historical and spatial locations:

  • the mediation of Greek thought through translations into and commentaries in Arabic, from the eighth through to the tenth century;
  • its renegotiation via translations into and commentaries in Latin, either directly or via Arabic, in the eleventh, twelfth and thirteenth centuries;
  • the renegotiation of the key concepts under study in translations of key texts into English in the late nineteenth century and throughout the twentieth and early twenty-first centuries;
  • the ongoing renegotiation of the concepts in question by civil society organisations and actors in the twenty-first century, particularly on the Internet.

The corpus is broadly divided into three sections, or subcorpora:

Premodern subcorpus:

  • a corpus of Greek source texts;
  • translations into medieval Arabic;
  • medieval Arabic commentaries on Greek texts;
  • translations and retranslations into Latin, from both Arabic and Greek;
  • Latin commentaries on Greek texts.

Modern subcorpus:

  • translations and retranslations of relevant texts into English in the nineteenth and throughout the twentieth and twenty-first centuries, primarily from Greek, Latin, French and German.

Internet subcorpus:

  • Internet discourse in English produced by alternative media and news outlets, such as Indymedia, Inter Press Service, Open Democracy and ROAR (Reflections on a Revolution) Magazine, as well as civil society organisations such as the World Social Forum. This subcorpus draws on discourses generated and disseminated by communities that advocate and practise alternative forms of political participation and provide new platforms for the collective revision and construction of knowledge.

Supported by the powerful search and visualisation software tools developed specifically for this project, the corpus is designed to allow the research team, and the research community at large, to trace the development and mutation of key concepts that have become a core part of our academic and public life, and their contestation and renegotiation by civil society today.


Using the corpus

Guidance on how to use the corpus can be accessed here:

Information about the content of our corpora can be accessed here:


Corpus text preparation

Documentation on the process of preparing texts for uploading to the corpus can be downloaded by clicking on the links below:

To view the latest version of the .dtd files used to annotate the corpus texts, please click on the following links: