Deep Learning-based Natural Language Understanding Models and a Prototype GPT-2 Deployment Fine-Tuned for a Specific Natural Language Generation Task

Authors

  • Fernando Balbachan Universidad de Buenos Aires, Argentina, Universidade Tuiuti do Paraná, Brazil and Natural Tech, Argentina
  • Natalia Flechas Universidad de Buenos Aires, Argentina
  • Ignacio Maltagliatti Universidad Tecnológica Nacional, Argentina and Natural Tech, Argentina
  • Francisco Pensa Natural Tech, Argentina
  • Lucas Ramírez Natural Tech, Argentina

Keywords

Deep Learning, ELMo, BERT, GPT-2, Natural Language Understanding (NLU), Natural Language Generation (NLG)

Abstract

Since 2013, the connectionist paradigm in Natural Language Processing (NLP) has resurged in academic circles through new architectures that were later adopted by the software industry, leveraging massive computing power. This truly algorithmic revolution is known as Deep Learning. Numerous models have been proposed in rapid succession to improve state-of-the-art metrics on general-domain NLP tasks, as measured by the most widely used benchmarks (BLEU, GLUE, SuperGLUE). From 2018 onwards, Deep Learning models have attracted even more attention through the so-called Transformers revolution (ELMo, BERT and GPT-2). In this paper, we propose a brief yet exhaustive survey of the models that have evolved over the last decade. We also describe in detail a complete from-scratch implementation of the most recent open-source model, GPT-2, fine-tuned for a specific NLG task: slogan generation for commercial products.
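
As an illustration of the kind of fine-tuning pipeline described in the abstract, the following is a minimal sketch (not the authors' actual code), assuming the Hugging Face transformers and datasets libraries. The corpus file slogans.txt, the prompt and all hyperparameters are hypothetical placeholders rather than the settings used in the paper.

    # Minimal GPT-2 fine-tuning sketch for slogan generation (illustrative only).
    # Assumes the Hugging Face transformers/datasets libraries; "slogans.txt" is a
    # hypothetical plain-text corpus with one slogan per line.
    from datasets import load_dataset
    from transformers import (DataCollatorForLanguageModeling, GPT2LMHeadModel,
                              GPT2TokenizerFast, Trainer, TrainingArguments)

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    tokenizer.pad_token = tokenizer.eos_token  # GPT-2 has no padding token by default
    model = GPT2LMHeadModel.from_pretrained("gpt2")

    # Load and tokenize the slogan corpus.
    dataset = load_dataset("text", data_files={"train": "slogans.txt"})
    tokenized = dataset["train"].map(
        lambda batch: tokenizer(batch["text"], truncation=True, max_length=64),
        batched=True, remove_columns=["text"])

    # Causal language-modelling objective (mlm=False), as used for GPT-2.
    collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)

    trainer = Trainer(
        model=model,
        args=TrainingArguments(output_dir="gpt2-slogans",
                               num_train_epochs=3,
                               per_device_train_batch_size=8),
        train_dataset=tokenized,
        data_collator=collator)
    trainer.train()

    # Generate a candidate slogan from a (hypothetical) product-name prompt.
    prompt = tokenizer("EcoBottle:", return_tensors="pt").to(model.device)
    output = model.generate(**prompt, max_length=20, do_sample=True, top_k=50,
                            pad_token_id=tokenizer.eos_token_id)
    print(tokenizer.decode(output[0], skip_special_tokens=True))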

References

Bahdanau, D., Cho, K., & Bengio, Y. (2015). Neural Machine Translation by Jointly Learning to Align and Translate. In ICLR 2015. Retrieved from https://arxiv.org/abs/1409.0473.

Bastings, J., Titov, I., Aziz, W., Marcheggiani, D., & Sima’an, K. (2017). Graph Convolutional Encoders for Syntax-aware Neural Machine Translation. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, (pp. 1957-1967). Retrieved from https://www.aclweb.org/anthology/D17-1209.pdf.

Bradbury, J., Merity, S., Xiong, C., & Socher, R. (2017). Quasi-Recurrent Neural Networks. In ICLR 2017. Retrieved from http://arxiv.org/abs/1611.01576.

Dai, A. M., & Le, Q. V. (2015). Semi-supervised Sequence Learning. In Advances in Neural Information Processing Systems (NIPS ’15), (pp. 1-9). Retrieved from https://papers.nips.cc/paper/5949-semi-supervised-sequence-learning.pdf.

Devlin, J., Chang, M., Lee, K., & Toutanova, K. (2018). BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Human Language Technologies, Volume 1 (Long and Short Papers), (pp. 4171-4186). Retrieved from https://www.aclweb.org/anthology/N19-1423.pdf.

Dyer, C., Kuncoro, A., Ballesteros, M., & Smith, N. A. (2016). Recurrent Neural Network Grammars. In NAACL 2016. Retrieved from http://arxiv.org/abs/1602.07776.

Elman, J. L. (1990). Finding structure in time. Cognitive Science, 14(2), 179-211.

Gatt, A., & Krahmer, E. (2018). Survey of the State of the Art in Natural Language Generation: Core tasks, applications, and evaluation. Journal of Artificial Intelligence Research, 61, 65-170.

Graves, A., Wayne, G., Reynolds, M., Harley, T., Danihelka, I., Grabska-Barwińska, A., Gómez Colmenarejo, S., Grefenstette, E., Ramalho, T., Agapiou, J., Puigdomènech Badia, A., Moritz Hermann, K., Zwols, Y., Ostrovski, G., Cain, A., King, H., Summerfield, C., Blunsom, P., Kavukcuoglu, K., & Hassabis, D. (2016). Hybrid computing using a neural network with dynamic external memory. Nature, 538, 471-476.

Henaff, M., Weston, J., Szlam, A., Bordes, A., & LeCun, Y. (2017). Tracking the World State with Recurrent Entity Networks. In Proceedings of ICLR 2017.

Hochreiter, S., & Schmidhuber, J. (1997). Long short-term memory. Neural Computation, 9(8), 1735-1780.

Howard, J., & Ruder, S. (2018). Universal Language Model Fine-tuning for Text Classification. In Proceedings of ACL 2018, (pp. 328-339). Retrieved from https://www.aclweb.org/anthology/P18-1031.pdf.

Kalchbrenner, N., Grefenstette, E., & Blunsom, P. (2014). A Convolutional Neural Network for Modelling Sentences. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, (pp. 655–665). Retrieved from http://arxiv.org/abs/1404.2188.

Kalchbrenner, N., Espeholt, L., Simonyan, K., Oord, A. van den, Graves, A., & Kavukcuoglu, K. (2016). Neural Machine Translation in Linear Time. arXiv preprint arXiv:1610.10099. Retrieved from http://arxiv.org/abs/1610.10099.

Kim, Y. (2014). Convolutional Neural Networks for Sentence Classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, (pp. 1746–1751). Retrieved from http://arxiv.org/abs/1408.5882.

Kumar, A., Ondruska, P., Iyyer, M., Bradbury, J., Gulrajani, I., Zhong, V., Paulus, R., & Socher, R. (2016). Ask me anything: Dynamic memory networks for natural language processing. In International Conference on Machine Learning, (pp. 1378-1387). Retrieved from https://arxiv.org/pdf/1506.07285.pdf.

Levy, O., & Goldberg, Y. (2014). Dependency-Based Word Embeddings. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers), (pp. 302–308). Retrieved from https://doi.org/10.3115/v1/P14-2050.

Merity, S., Shirish Keskar, N., & Socher, R. (2017). Regularizing and Optimizing LSTM Language Models. arXiv preprint arXiv:1708.02182. Retrieved from https://arxiv.org/pdf/1708.02182.pdf.

Mikolov, T., Sutskever, I., Chen, K., Corrado, G., & Dean, J. (2013). Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems (NIPS 2013), (pp. 3111–3119). Retrieved from https://arxiv.org/abs/1310.4546.

Pennington, J., Socher, R., & Manning, C. (2014). GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), (pp. 1532–1543). Retrieved from https://www.aclweb.org/anthology/D14-1162.

Peters, M. E., Neumann, M., Iyyer, M., Gardner, M., Clark, C., Lee, K., & Zettlemoyer, L. (2018). Deep contextualized word representations. In Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics (NAACL): Human Language Technologies, Volume 1 (Long Papers), (pp. 2227-2237). Retrieved from https://www.aclweb.org/anthology/N18-1202.

Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language Models Are Unsupervised Multitask Learners [blog]. OpenAI Blog, 1, 8.

Ramachandran, P., Liu, P. J., & Le, Q. V. (2017). Unsupervised Pretraining for Sequence to Sequence Learning. In Proceedings of EMNLP 2017.

Ruder, S. (2018). A review of the recent history of NLP [blog]. Retrieved from https://ruder.io/a-review-of-the-recent-history-of-nlp/.

Socher, R., Perelygin, A., & Wu, J. (2013). Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, (pp. 1631–1642).

Sutskever, I., Vinyals, O., & Le, Q. V. (2014). Sequence to sequence learning with neural networks. In Advances in Neural Information Processing Systems (NIPS ’14). Retrieved from https://papers.nips.cc/paper/5346-sequence-to-sequence-learning-with-neural-networks.pdf.

Sukhbaatar, S., Szlam, A., Weston, J., & Fergus, R. (2015). End-To-End Memory Networks. In Proceedings of NIPS 2015. Retrieved from http://arxiv.org/abs/1503.08895.

Subramanian, S., Trischler, A., Bengio, Y., & Pal, C. J. (2018). Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning. In Proceedings of ICLR 2018.

Tai, K. S., Socher, R., & Manning, C. D. (2015). Improved Semantic Representations From Tree-Structured Long Short-Term Memory Networks. In ACL 2015, (pp. 1556–1566).

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need. Advances in Neural Information Processing Systems (NIPS), 1-11. Retrieved from https://papers.nips.cc/paper/7181-attention-is-all-you-need.pdf.

Wang, J., Yu, L., Lai, K. R., & Zhang, X. (2016). Dimensional Sentiment Analysis Using a Regional CNN-LSTM Model. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL 2016), (pp. 225–230).

Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. (2019a). GLUE: A multi-task benchmark and analysis platform for natural language understanding. In International Conference on Learning Representations. Retrieved from https://openreview.net/forum?id=rJ4km2R5t7.

Wang, A., Pruksachatkun, Y., Nangia, N., Singh, A., Michael, J., Hill, F., Levy, O., & Bowman, S. (2019b). SuperGLUE: A Stickier Benchmark for General-Purpose Language Understanding Systems. In 33rd Conference on Neural Information Processing Systems (NeurIPS 2019). Retrieved from https://w4ngatang.github.io/static/papers/superglue.pdf.

Published

21-12-2021

How to Cite

Balbachan, F., Flechas, N., Maltagliatti, I., Pensa, F., & Ramírez, L. (2021). Deep Learning-based Natural Language Understanding Models and a Prototype GPT-2 Deployment Fine-Tuned for a Specific Natural Language Generation Task. Anales de Lingüística, 2(7), 145–174. Retrieved from https://revistas.uncu.edu.ar/ojs3/index.php/analeslinguistica/article/view/5524