COMPLEX TEXTS
https://doi.org/10.5281/zenodo.20377855
Keywords
linguistic complexity, neural networks, complexity prediction, interpretability, text classification, genre analysis, noun phrases, English, Russian, cross-linguistic variation, discourse features, syntactic complexity
Abstract
Linguistic complexity constitutes a multi-level construct, manifesting across textual, clausal, lexical, and sublexical domains, and intersecting with various linguistic features (e.g., genre, syntax, semantics) as well as tasks such as language acquisition, translation, and instruction. Cross-linguistic variation in complexity measurements further arises from typological differences, culturally embedded genre conventions, and dataset-specific properties. In this study, we employ artificial neural networks both to predict linguistic complexity and to interpret those predictions. Although neural models optimize millions of parameters to achieve high empirical performance, they typically function as black boxes, offering no explicit account of which linguistic cues inform their decisions. We demonstrate how to associate neural complexity estimates with transparent, interpretable features—including the frequency of conjunctions, discourse particles, and subordinate clauses. Using English and Russian texts drawn from multiple genres, we train neural models to discriminate between less complex and more complex texts. Our findings indicate that noun frequency and the structural complexity of noun phrases are significant predictors of textual complexity. Finally, we examine the relationship between complexity and genre, revealing that certain feature–complexity associations are driven by genre differences rather than by intrinsic linguistic difficulty.
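As a rough illustration of how a model's complexity estimates might be related to one transparent feature, the sketch below computes a per-text conjunction rate and its Pearson correlation with a list of complexity scores. The conjunction list, the function names, and the choice of Pearson correlation are assumptions made for illustration, not the pipeline used in the study. The scores stand in for the outputs of a trained neural model.

```python
import math
import re

# Hypothetical, deliberately tiny English conjunction list (an assumption);
# a real analysis would rely on a POS tagger and cover Russian as well.
CONJUNCTIONS = {"and", "but", "or", "because", "although", "while", "if"}


def conjunction_rate(text):
    """Return the share of tokens that are conjunctions (0.0 for empty text)."""
    # Simple ASCII tokenizer: adequate for an English toy example only.
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    return sum(tok in CONJUNCTIONS for tok in tokens) / len(tokens)


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant sequence carries no correlation signal
    return cov / (sd_x * sd_y)


def feature_score_correlation(texts, scores):
    """Correlate the conjunction rate of each text with its complexity score."""
    return pearson([conjunction_rate(t) for t in texts], scores)
```

A strong correlation in such a check would only suggest that the feature co-varies with the model's output. As the abstract notes, an association of this kind may reflect genre differences rather than intrinsic difficulty, so a comparison within each genre would be a sensible next step.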
