Analysis of language traits with natural language processing techniques in early detection of depression

Authors

  • María José Garciarena Ucelay, Universidad Nacional de San Luis
  • Leticia Cecilia Cagnina, Universidad Nacional de San Luis
  • Marcelo Luis Errecalde, Universidad Nacional de San Luis

Keywords:

early depression detection, document representations, word embeddings, ERDE metric

Abstract

The development of computational methods that use information from the Web for early risk detection is a socially relevant, scientifically attractive, and growing area of research. Depression is one of the most frequent mental disorders in the world and is associated with a high incidence of suicide in the most severe cases. Early detection of this illness could therefore lead to timely treatment and save lives. This paper analyzes the relationship between computational models for the automatic detection of depression and the linguistic properties of text written by people who experience the disease. State-of-the-art text representations for document classification are used, covering linguistic, syntactic, and semantic aspects. The results obtained with standard classifiers indicate that word embeddings capture the information needed to detect signs of depression quickly and reliably.

Published

21-12-2021

How to Cite

Garciarena Ucelay, M. J., Cagnina, L. C., & Errecalde, M. L. (2021). Analysis of language traits with natural language processing techniques in early detection of depression. Anales De Lingüística, 2(7), 89–116. Retrieved from https://revistas.uncu.edu.ar/ojs3/index.php/analeslinguistica/article/view/5522