CORPUS LINGUISTICS IN DISCOURSE ANALYSIS: A SYSTEMATIC LITERATURE REVIEW OF METHODOLOGICAL INNOVATIONS AND EMPIRICAL APPLICATIONS (2015-2025)
Keywords:
corpus linguistics; discourse analysis; corpus-assisted discourse studies; critical discourse analysis; systematic literature review; methodological innovation.
Abstract
This systematic literature review investigates methodological innovations and empirical applications of corpus linguistics in discourse analysis from 2015 to 2025. Drawing on 45 empirical studies retrieved from major academic databases (Scopus, Web of Science, and Google Scholar), the review identifies emerging methodological patterns, technological advancements, and ongoing theoretical challenges within Corpus-Assisted Discourse Studies (CADS). The findings demonstrate that integrating corpus linguistics with critical discourse analysis has produced substantial methodological synergy, enabling systematic, evidence-based interpretation of linguistic patterns across large-scale textual corpora. The review delineates five principal domains of application: media and political discourse, social group representation, health and environmental communication, multimodal discourse analysis, and the integration of artificial intelligence technologies. Despite these advances, methodological constraints persist, including researcher bias, limited corpus representativeness, and scarce resources for non-English language data. The study’s theoretical contribution lies in providing a comprehensive mapping of CADS as a transdisciplinary framework that fuses quantitative corpus methodologies with qualitative discourse interpretation. Practically, the review underscores the need for greater methodological transparency, the development of corpus tools for under-resourced languages, and the ethically informed adoption of AI-driven methods in discourse research. Ultimately, this review offers a systematic conceptual foundation for scholars employing corpus-based approaches in discourse studies and highlights future research trajectories involving multimodal analysis, diachronic corpora, and the expansion of CADS in Global South contexts.