Corpus-Assisted Discourse Studies and Learner Corpus Research: Methodological Innovations and Pedagogical Applications
Keywords:
corpus-assisted discourse studies, learner corpus research, topic modeling, metadata, second language acquisition
Abstract
Corpus-assisted discourse studies (CADS) and learner corpus research (LCR) are two rapidly evolving domains within corpus linguistics, each making distinct methodological contributions to the study of language use and second language acquisition (SLA). This systematic review examines developments in both fields from 2023 to 2025, focusing on methodological innovations, theoretical frameworks, and pedagogical applications. Through analysis of 10 key publications, this study identifies three primary trends: increased automation through the integration of artificial intelligence, greater interdisciplinarity that connects corpus linguistics with discourse analysis and SLA theory, and growing diversity in corpus design and data collection methods. The findings indicate that CADS has expanded beyond traditional concordancing techniques to incorporate topic modeling and large language models for discourse analysis, while LCR has moved toward more sophisticated metadata systems and multifactorial study designs. Both fields have clear pedagogical applications, though challenges persist in balancing quantitative rigor with qualitative depth and in ensuring the reproducibility of findings. This review proposes future research directions that emphasize transparent methodologies, collaborative corpus development, and practical tools accessible to language educators and researchers across diverse linguistic contexts.
References
Bednarek, M. (2025). Topic modelling in corpus-based discourse analysis: Uses and critiques. Discourse Studies. https://doi.org/10.1177/14614456241293075
Bednarek, M., Schweinberger, M., & Lee, K. K. H. (2024). Corpus-based discourse analysis: From meta-reflection to accountability. Corpus Linguistics and Linguistic Theory, 20(3), 539–566. https://doi.org/10.1515/cllt-2023-0104
Gao, Q., & Feng, D. (2025). Deploying large language models for discourse studies: An exploration of automated analysis of media attitudes. PLOS ONE, 20(1), e0313932. https://doi.org/10.1371/journal.pone.0313932
Gillings, M., Learmonth, M., & Mautner, G. (2024). Taking the road less travelled: How corpus-assisted discourse studies can enrich qualitative explorations of large textual datasets. British Journal of Management, 35(3), 1467–1485. https://doi.org/10.1111/1467-8551.12816
Götz, S., & Granger, S. (2024). Learner corpus research for pedagogical purposes: An overview and some research perspectives. International Journal of Learner Corpus Research, 10(1), 1–38.
Granger, S. (2024). From early to future learner corpus research. International Journal of Learner Corpus Research, 10(2), 247–279. https://doi.org/10.1075/ijlcr.00050.gra
Larsson, T., & Biber, D. (2025). Encouraging cumulative knowledge building as normal practice in (learner) corpus research. International Journal of Learner Corpus Research, 11(1), 1–16.
Paquot, M. (2024). Learner corpus research: A critical appraisal and roadmap for contributing (more) to SLA research agendas. Corpus Linguistics and Linguistic Theory, 20(3), 567–590. https://doi.org/10.1515/cllt-2024-0014
Paquot, M., König, A., Stemle, E. W., & Frey, J.-C. (2024). The Core Metadata Schema for Learner Corpora (LC-meta): Collaborative efforts to advance data discoverability, metadata quality and study comparability in L2 research. International Journal of Learner Corpus Research, 10(2), 280–300.
Schweinberger, M. (2024). Seeded topic modeling as a more appropriate alternative to unsupervised standard topic models. Corpus Linguistics and Linguistic Theory.




