Analysis of language traits with natural language processing techniques in early detection of depression
Keywords:
early depression detection, document representations, word embeddings, ERDE metric
Abstract
The development of computational methods that use information from the Web for early risk detection is a socially relevant, scientifically attractive, and growing area of research. Depression is one of the most frequent mental disorders worldwide, and its most severe cases carry a high incidence of suicide. Early detection of this illness could therefore lead to timely treatment and save lives. This paper analyzes the relationship between computational models for the automatic detection of depression and the linguistic properties of texts written by people who experience the disorder. State-of-the-art text representations for document classification are used, covering linguistic, syntactic, and semantic aspects. The results obtained with standard classifiers indicate that word embeddings capture precise information for detecting signs of depression quickly and reliably.
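The "ERDE metric" named in the keywords is the Early Risk Detection Error used in the eRisk evaluations; it penalizes correct positive decisions that arrive late. The following is a minimal Python sketch of the per-user error, assuming the commonly cited definition: cost c_fp for a false positive, c_fn for a false negative, a latency-weighted c_tp for a true positive, and zero for a true negative. The function names and default cost values are illustrative assumptions and are not taken from this paper.

```python
import math


def latency_cost(k: int, o: int) -> float:
    """Latency factor lc_o(k) = 1 - 1 / (1 + exp(k - o)).

    k is the number of writings read before deciding; o is the point at
    which the penalty reaches 0.5. Both branches compute the same value;
    they are split only to avoid overflow in math.exp for large |k - o|.
    """
    x = k - o
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def erde(decision: bool, truth: bool, k: int, o: int,
         c_fp: float, c_fn: float = 1.0, c_tp: float = 1.0) -> float:
    """Per-user ERDE_o under the assumed cost scheme.

    c_fp is often set to the proportion of positive users in the
    collection; that choice is an assumption here, so it is left as a
    required argument.
    """
    if decision and not truth:       # false positive
        return c_fp
    if not decision and truth:       # false negative
        return c_fn
    if decision and truth:           # true positive, penalized by delay
        return latency_cost(k, o) * c_tp
    return 0.0                       # true negative


def mean_erde(results, o: int, c_fp: float) -> float:
    """Average ERDE_o over (decision, truth, k) triples, one per user."""
    return sum(erde(d, t, k, o, c_fp) for d, t, k in results) / len(results)
```

Under this sketch, a correct positive decision issued after reading exactly o writings costs half of c_tp, and the cost approaches c_tp as the decision is delayed further, so a late true positive is scored almost like a miss.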
License
Copyright (c) 2021 Anales de Lingüística
This work is licensed under a Creative Commons Attribution 2.5 Argentina License.
Authors who publish in this journal agree to the following terms:
1. Authors retain copyright and grant the journal the right of first publication of the work under a Creative Commons Attribution 2.5 Argentina license (CC BY 2.5 AR). They may therefore share the work with explicit reference to its original publication in this journal.
2. Anales de Lingüística permits and encourages authors to disseminate the published work electronically, through its link and/or the postprint version of the file downloaded independently.
3. You are free to:
Share: copy and redistribute the material in any medium or format.
Adapt: remix, transform, and build upon the material for any purpose, even commercially.