THE UTILIZATION OF THE LINGUISTIC CORPUS IN THE ANALYSIS OF CONTEMPORARY POETRY: AN EMPIRICAL STUDY USING POETS.ORG DATABASES

Authors

  • Yunita Dida, Universitas Nusa Cendana

Keywords: linguistic corpus, poetry analysis, Poets.org, computational stylistics, digital humanities

Abstract

This study investigates the application of corpus linguistics as an empirical approach to analyzing 
linguistic and stylistic characteristics in contemporary English-language poetry. Addressing the 
limitations of traditional literary criticism that often relies on impressionistic interpretation, this research 
demonstrates how quantitative linguistic evidence can illuminate systematic lexical, grammatical, and 
metaphorical patterns in poetic discourse. Drawing on a corpus compiled from the Poets.org database, 
which includes works by modern and contemporary poets, the study employs a descriptive-quantitative 
design grounded in corpus stylistics. Analytical procedures involve measuring lexical frequency, 
collocational tendencies, syntactic complexity, and lexical diversity to uncover linguistic variation 
across poetic periods. The findings reveal significant distinctions in lexical selection, metaphor density, 
and structural complexity: modern poetry exhibits higher lexical diversity, with an average type–token 
ratio of 0.72, whereas contemporary poetry tends toward syntactic simplification and a greater reliance 
on concrete imagery. These results indicate a stylistic shift from linguistic elaboration to experiential 
immediacy, reflecting broader changes in poetic expression and ideology. The study contributes to the 
methodological advancement of corpus stylistics in literary analysis by establishing an empirically 
grounded framework for exploring data-driven interpretations of poetic language.
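
As a rough illustration of two of the measures named in the abstract (lexical frequency and the type–token ratio), the following Python sketch computes both for a short text. The abstract does not state how the study tokenized its corpus or normalized for text length, so the tokenizer, the function names, and the sample sentence here are assumptions made for illustration only, not the author's procedure.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and extract word tokens.

    Assumption: a token is a run of ASCII letters, optionally followed by
    an apostrophe and more letters (e.g. "don't"). The study's actual
    tokenization rules are not given in the abstract.
    """
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens):
    """Return the number of distinct types divided by the number of tokens.

    Returns 0.0 for an empty token list to avoid division by zero.
    """
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)


def top_frequencies(tokens, n=3):
    """Return the n most frequent tokens with their raw counts."""
    return Counter(tokens).most_common(n)


if __name__ == "__main__":
    # Hypothetical sample line, not taken from the Poets.org corpus.
    sample = "the sea is the sea and the sky is the sky"
    tokens = tokenize(sample)
    print(f"tokens: {len(tokens)}, types: {len(set(tokens))}")
    print(f"type-token ratio: {type_token_ratio(tokens):.2f}")
    print(f"most frequent: {top_frequencies(tokens, 1)}")
```

Raw type–token ratios fall as texts get longer, so a comparison across poetic periods would normally be made on samples of equal size or with a length-standardized variant; whether the reported average of 0.72 was standardized in this way is not stated in the abstract.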


Published

2025-11-28

Section

Articles