THE UTILIZATION OF THE LINGUISTIC CORPUS IN THE ANALYSIS OF CONTEMPORARY POETRY: AN EMPIRICAL STUDY USING POETS.ORG DATABASES
Keywords: linguistic corpus, poetry analysis, Poets.org, computational stylistics, digital humanities
Abstract
This study investigates the application of corpus linguistics as an empirical approach to analyzing
linguistic and stylistic characteristics in contemporary English-language poetry. Addressing the
limitations of traditional literary criticism, which often relies on impressionistic interpretation, this research
demonstrates how quantitative linguistic evidence can illuminate systematic lexical, grammatical, and
metaphorical patterns in poetic discourse. Drawing on a corpus compiled from the Poets.org database,
which includes works by modern and contemporary poets, the study employs a descriptive-quantitative
design grounded in corpus stylistics. Analytical procedures involve measuring lexical frequency,
collocational tendencies, syntactic complexity, and lexical diversity to uncover linguistic variation
across poetic periods. The findings reveal significant distinctions in lexical selection, metaphor density,
and structural complexity: modern poetry exhibits higher lexical diversity, with an average type–token
ratio of 0.72, whereas contemporary poetry tends toward syntactic simplification and a greater reliance
on concrete imagery. These results indicate a stylistic shift from linguistic elaboration to experiential
immediacy, reflecting broader changes in poetic expression and ideology. The study contributes to the
methodological advancement of corpus stylistics in literary analysis by establishing an empirically
grounded framework for exploring data-driven interpretations of poetic language.
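The measures named in the abstract (lexical frequency, collocational tendencies, and the type-token ratio as an index of lexical diversity) can be illustrated with a minimal Python sketch. This is not the study's actual pipeline: the tokenizer, the function names, and the sample line are all assumptions introduced here for illustration, and raw bigram counts stand in for a proper collocation statistic.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return word tokens (letters, with internal apostrophes)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def type_token_ratio(tokens):
    """Number of distinct word forms (types) divided by total words (tokens)."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def top_words(tokens, n=5):
    """The n most frequent word forms with their counts."""
    return Counter(tokens).most_common(n)


def bigram_counts(tokens):
    """Counts of adjacent word pairs, a crude proxy for collocational tendencies."""
    return Counter(zip(tokens, tokens[1:]))


# Invented sample line, not taken from the Poets.org corpus.
sample = "the sea is the sea and the sky is the sky"
tokens = tokenize(sample)
ttr = type_token_ratio(tokens)

print(f"tokens={len(tokens)} types={len(set(tokens))} TTR={ttr:.2f}")
print(top_words(tokens, 3))
print(bigram_counts(tokens).most_common(3))
```

Because the type-token ratio tends to fall as texts get longer, comparisons across poems or periods are usually made on samples of equal length or with a standardized variant; the sketch above does not do this.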
References
AELINCO (Spanish Association of Corpus Linguistics). (2024). 15th International Corpus Linguistics
Conference (CILC2024): Corpus Linguistics, (digital) discourse, and AI. University of Las Palmas
de Gran Canaria, Spain. May 22-24, 2024.
Anthony, L. (2022). AntConc (Version 4.2.0) [Computer Software]. Tokyo, Japan: Waseda University.
Available from https://www.laurenceanthony.net/software
Biber, D. (1993). Representativeness in corpus design. Literary and Linguistic Computing, 8(4),
243-257.
Chakraborty, R., & Blanco, E. (2024). Understanding poetry using natural language processing tools:
A survey. Digital Scholarship in the Humanities, 39(2), 456-478.
CLARIN ERIC. (2024). Literary corpora. Common Language Resources and Technology
Infrastructure. Retrieved from https://www.clarin.eu/resource-families/literary-corpora
Gries, S. Th. (2009). What is corpus linguistics? Language and Linguistics Compass, 3(5), 1225-1241.
https://doi.org/10.1111/j.1749-818X.2009.00149.x
Jacobs, A. M. (2018). The Gutenberg English Poetry Corpus: Exemplary quantitative narrative
analyses. Frontiers in Digital Humanities, 5, Article 5. https://doi.org/10.3389/fdigh.2018.00005
Le Thanh Thao, & Nguyen Thi Thuy Linh. (2024). Corpus-based analysis of film criticism: Linguistic
nuances and thematic patterns. Forum for Linguistic Studies, 6(1), 420-445.
https://doi.org/10.59400/fls.v6i1.2103
Leiden University. (2024). Corpus linguistics 2024-2025. Module description. Retrieved from
https://studiegids.universiteitleiden.nl/en/courses/130327/corpus-linguistics
McEnery, T., & Hardie, A. (2011). Corpus linguistics: Method, theory and practice. Cambridge:
Cambridge University Press. https://doi.org/10.1017/CBO9780511981395
McIntyre, D., & Walker, B. (2022). Using corpus linguistics to explore the language of poetry: A
stylometric approach to Yeats' poems. In A. O'Keeffe & M. J. McCarthy (Eds.), The Routledge
handbook of corpus linguistics (2nd ed., pp. 499-516). Routledge.
https://doi.org/10.4324/9780367076399-35
Poets.org. (2024). Poetry database and literary resources. Academy of American Poets. Retrieved
from https://poets.org/
Römer, U. (2006). Where the computer meets language, literature, and pedagogy: Corpus analysis in
English studies. In A. Gerbig & A. Müller-Wood (Eds.), How globalization affects the teaching
of English: Studying culture through texts (pp. 81-109). Lampeter: Edwin Mellen Press.
Schmitt, N. (2004). Formulaic sequences: Acquisition, processing and use. Amsterdam: John
Benjamins Publishing.
Stubbs, M. (2005). Conrad in the computer: Examples of quantitative stylistic methods. Language and
Literature, 14(1), 5-24. https://doi.org/10.1177/0963947005048873
Toivanen, J. M., Toivonen, H., Valitutti, A., & Gross, O. (2012). Corpus-based generation of content
and form in poetry. Proceedings of the Third International Conference on Computational
Creativity, 175-179.
University of York. (2025). English corpus linguistics (LAN00032H) 2025-26. Catalog module.
Retrieved from
https://www.york.ac.uk/students/studying/manage/programmes/module-catalogue/




