Corpus-linguistic Approaches to Scientific Discourse


Date: October 10, 2024
Time: 3:00 PM – 5:45 PM

Venue: Online

Science, text, and corpora

The progress of science has for centuries depended on the textual transmission of knowledge. Indeed, knowledge needs to be communicated in order to be retained and refined, and it is quite unthinkable to fully separate the science from the way it is contextualized, narrated, documented, or translated. Yet the exact manner in which language both influences and reflects scientific practice remains understudied.

In recent years, corpus-linguistic approaches have started to provide systematic insights into the communicative aspects of knowledge formation and contestation, but much work remains to be done. In a collaboration between the Genealogies of Knowledge Research Network and the Centre for Sustainable Healthcare Education (University of Oslo), we are working on a thematic corpus focused on the discourse of sustainability and healthcare (SHE corpus). We are organizing this event to exchange ideas and experiences regarding the construction and analysis of scientific corpora, and we invite everyone with similar interests to join us.

The online event will consist of four invited talks, divided into two sessions. Each session consists of two presentations followed by a moderated discussion between speakers, during which questions from the audience can be addressed.

Programme

Time (Oslo, CEST, UTC+2) | Content | Speaker
15:00 | Welcome and introduction | Jan Buts
15:10 | Disinformation in Academia: a corpus-based exploration of linguistic patterns in COVID-19 treatment research | Tony Berber Sardinha and Paula Tavares Pinto
15:30 | Unexpected discourse in the Coruña Corpus of English Scientific Writing: Life Sciences as a case in point | Isabel Moskowich
15:50 | Questions and discussion
16:10 | Break
16:30 | Studying medical discourse in online discussion forums: methodological opportunities and challenges for corpus-based approaches | Henry Jones
16:50 | Late Modern English microbiology texts in the Royal Society Corpus | Katrin Menzel
17:10 | Questions and discussion
17:30 | Closing remarks | Gabriela Saldanha

Abstracts

Disinformation in Academia: a corpus-based exploration of linguistic patterns in COVID-19 treatment research

Tony Berber Sardinha (São Paulo Catholic University), Paula Tavares Pinto (São Paulo State University), Maria Claudia Nunes Delfino (FATEC), Deise Prina Dutra (UFMG), Ana Elisa Bocorny (UFRGS), & Simone Sarmento (UFRGS)

The COVID-19 infodemic has often been associated with social media. However, scholarly discourse has also served as a key avenue for denialist groups to spread misinformation. These groups frequently rely on contentious treatments to support their claims, rejecting scientifically supported guidelines from reputable health organizations. Despite the importance of this issue, few studies have explored how the infodemic manifests within academic contexts. Using Lexical Multidimensional (LMD) Analysis, we show that while academic terms are used for validation, the vocabulary of Controversial Treatments (CT) significantly diverges from that of Endorsed Treatments (ET). Both systems depend on distinct lexical collocations. This difference reflects how CT and ET are distinct meaning-making systems shaped by unique ideologies, histories, and motivations.

Unexpected Discourse in the Coruña Corpus of English Scientific Writing: Life Sciences as a case in point

Isabel Moskowich (University of A Coruña)

At present, scientific discourse is considered to be detached and objective, impersonal and direct, thanks to a conscious effort starting after the Scientific Revolution of the eighteenth century. The Coruña Corpus of English Scientific Writing compiles late Modern English texts from several disciplines precisely from that revolution onwards, from 1700 to 1900. My aim in this talk is to present one of its subcorpora, CELiST, the Corpus of English Life Sciences Texts, and to focus on the ways in which it can be used to study the development of scientific discourse across time. To contextualize this study of scientific prose, I will address the gradual standardisation of scientific publication procedures. Scientific journals both create explicit guidelines for authors and construct unspoken conventions, such as the expectation of predetermined sections in papers (IMRD: Introduction, Method, Results and Discussion). I will also focus on the relationship between theoretical and applied knowledge and its implications for the “know-how-to” determinants that govern developments in applied linguistics. Against this background, I will demonstrate that common ideas about the nature of scientific prose are often mistaken, and that the discourse of science tends to take unexpected turns.

Studying Medical Discourse in Online Discussion Forums: methodological opportunities and challenges for corpus-based approaches

Henry Jones (University of Manchester)

Online discussion forums offer a rich source of potential data for corpus-based analyses of contemporary medical discourse. Facebook, X/Twitter, Reddit and Wikipedia, for example, have all become prominent sites of intense public debate on controversial medical topics – from abortion to alternative medicine, face-masks to vaccination – often featuring contributors with a diversity of professional backgrounds, values, opinions and beliefs. Such computer-mediated conversations are typically readily accessible to the researcher, and the application of corpus tools and methods would seem an obvious step given the sheer scale of the data available. There remain, however, a number of significant methodological challenges that must be addressed when constructing and analysing a large thematic corpus of online discussions. In this talk, I will share my experiences of developing and interrogating a corpus of medicine-related Wikipedia ‘Talk Page’ content, focusing in particular on methodological concerns deriving from the interactive nature of this forum data.

Late Modern English microbiology texts in the Royal Society Corpus

Katrin Menzel (University of Mannheim)

This talk presents some insights from a corpus-linguistic study of early microbiology articles in the Royal Society Corpus (RSC) from the 19th and early 20th centuries. The current full version of the RSC contains ca. 48,000 digitised scientific articles from journals such as the Philosophical Transactions and the Proceedings of the Royal Society of London from 1665 onwards. Large parts of the corpus can be downloaded for free or queried online. The corpus texts cover a wide range of scientific topics and disciplines, particularly from the physical and mathematical sciences as well as from the biological and life sciences. The analysis of the microbiology texts in the RSC from the 19th and early 20th centuries aims to shed light on how the research developments of the nascent community of British microbiologists are reflected in specific linguistic patterns in these texts. Additionally, the talk examines whether there are gendered linguistic differences in these research articles. A comparison of gendered subcorpora of microbiology research articles in the RSC shows, for instance, that texts written exclusively by male authors in the analysed time span generally contain more and longer adjectives and more terminological patterns than texts with female contributors. This reflects slightly different activities and methods in the work of early female and male microbiologists in Britain.