Thematic Corpus Construction, Representativeness, and Discursive Sustainability

New open access publication by Jan Buts in Traducción y Sostenibilidad Cultural II: Retos y nuevos escenarios

DOI: https://doi.org/10.14201/0AQ0373493500

Abstract

This paper discusses two corpora, the Genealogies of Knowledge Corpus and the Sustainability and Health Corpus. Both are constructed on the basis of topics and concepts, rather than factors such as genre or register. Selection criteria influence the kind of analyses one can apply to a set of language data, and the sort of conclusions one can draw from those analyses. Particularly relevant, in this respect, is the question of representativeness: how do we know that research results are meaningful beyond the data at hand? The contrast between ideal and pragmatic answers to this question is addressed in the paper’s second section. The final part offers reflections on a recent complication. The production of text is increasingly relegated to automated systems. What does this mean for research principles based upon the assumption that the analysis of discourse can tell us something about the social world beyond the text?