[7] T5 - Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Title: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Authors & Year: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu, 2019

Link: https://arxiv.org/abs/1910.10683

Objective: Develop a unified framework that converts all text-based language problems into a text-to-text format and explore the limits of transfer learning with a large-scale pre-trained model.

Context: Previous transfer-learning approaches in NLP used a variety of architectures, pre-training objectives, and unlabeled datasets, which made systematic comparison difficult and limited their generality and scalability.

Key Contributions:

  • Introducing the Text-to-Text Transfer Transformer (T5), which uses a single encoder-decoder architecture and a unified text-to-text format for all NLP tasks.

  • Proposing a new pre-training objective called span corruption, which replaces randomly selected contiguous spans of text with sentinel tokens and trains the model to reconstruct the dropped spans (see the sketch after this list).

  • Introducing C4 (the Colossal Clean Crawled Corpus), a large cleaned web-text corpus, and demonstrating that performance keeps improving as model size and pre-training data grow, up to an 11-billion-parameter model.
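
To make span corruption concrete, here is a minimal, pure-Python sketch of the idea (not the authors' implementation): it replaces randomly chosen contiguous spans with sentinel tokens and builds the matching target sequence. The ~15% corruption rate, mean span length of 3, and `<extra_id_N>` sentinel naming follow the paper and the released T5 vocabulary; the simple Gaussian span sampler and the helper name `span_corrupt` are illustrative assumptions.

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    """Toy T5-style span corruption on a list of tokens.

    Returns (input_tokens, target_tokens): the input keeps the unmasked text
    with one sentinel per dropped span; the target lists each sentinel followed
    by the tokens it replaced, so the decoder learns to reconstruct the spans.
    """
    rng = random.Random(seed)
    n_to_mask = max(1, int(len(tokens) * corruption_rate))
    masked = set()
    # Pick random spans until roughly the target fraction of tokens is covered.
    while len(masked) < n_to_mask:
        span_len = max(1, round(rng.gauss(mean_span_len, 1)))
        start = rng.randrange(0, max(1, len(tokens) - span_len))
        masked.update(range(start, start + span_len))

    inputs, targets, sentinel_id, i = [], [], 0, 0
    while i < len(tokens):
        if i in masked:
            sentinel = f"<extra_id_{sentinel_id}>"
            inputs.append(sentinel)
            targets.append(sentinel)
            # Consume the whole contiguous masked span into the target.
            while i < len(tokens) and i in masked:
                targets.append(tokens[i])
                i += 1
            sentinel_id += 1
        else:
            inputs.append(tokens[i])
            i += 1
    return inputs, targets

# Example sentence from the paper's schematic figure.
tokens = "Thank you for inviting me to your party last week".split()
inp, tgt = span_corrupt(tokens)
print(" ".join(inp))  # e.g. "Thank you <extra_id_0> to your party last week"
print(" ".join(tgt))  # e.g. "<extra_id_0> for inviting me"
```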

Methodology:

  • Pre-training: T5 is an encoder-decoder Transformer pre-trained on C4 with the span-corruption denoising objective, in which dropped-out spans are replaced by sentinel tokens and the decoder learns to reconstruct them.

  • Fine-tuning: The pre-trained model is fine-tuned on each downstream task with supervised learning. Every task is cast into the text-to-text format by prepending a task-specific text prefix to the input and expressing the label or answer as the target text, as sketched below.
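
As a concrete illustration of the text-to-text conversion, below is a small, hypothetical helper (the function name and example-dict keys are assumptions, not the paper's code) that casts examples from a few tasks into (input, target) string pairs. The task prefixes and the translation / CoLA / STS-B / summarization patterns follow the examples in the paper, including rounding STS-B regression scores to the nearest 0.2 and emitting them as text; the exact class-label strings are illustrative.

```python
def to_text_to_text(task, example):
    """Cast a labeled example into an (input_text, target_text) pair, T5-style."""
    if task == "translation_en_de":
        return (f"translate English to German: {example['en']}", example["de"])
    if task == "cola":  # grammatical acceptability: the class label is emitted as text
        return (f"cola sentence: {example['sentence']}",
                "acceptable" if example["label"] == 1 else "not acceptable")
    if task == "stsb":  # regression: round to the nearest 0.2 and print as a string
        return (f"stsb sentence1: {example['sentence1']} sentence2: {example['sentence2']}",
                f"{round(example['score'] * 5) / 5:.1f}")
    if task == "summarization":
        return (f"summarize: {example['article']}", example["summary"])
    raise ValueError(f"unknown task: {task}")

print(to_text_to_text("translation_en_de", {"en": "That is good.", "de": "Das ist gut."}))
# ('translate English to German: That is good.', 'Das ist gut.')
print(to_text_to_text("stsb", {"sentence1": "A man is playing a guitar.",
                               "sentence2": "A person plays guitar.", "score": 3.79}))
# ('stsb sentence1: A man is playing a guitar. sentence2: A person plays guitar.', '3.8')
```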

Results: T5 achieved state-of-the-art performance on many NLP benchmarks, including GLUE, SuperGLUE, SQuAD question answering, and CNN/Daily Mail abstractive summarization. On the WMT English-German, English-French, and English-Romanian translation tasks it was competitive but did not surpass the previous state of the art, which the authors attribute in part to the English-only pre-training data. The paper also reports extensive ablations comparing architectures, pre-training objectives, datasets, and scaling strategies.

Impact: T5 introduced a simple and effective framework that unifies diverse NLP tasks and demonstrated transfer learning at scale. It inspired substantial follow-up work in NLP, including multilingual and instruction-tuned successors such as mT5 and Flan-T5.

Takeaways: T5 is a pre-trained encoder-decoder model that casts every NLP task into the same text-to-text format. It achieved state-of-the-art performance on several NLP benchmarks, and the text-to-text framework has since become a widely adopted approach that has driven significant advances in the field.
