[7] T5 - Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer


Title: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer

Authors & Year: Colin Raffel, Noam Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu, 2019

Link: https://arxiv.org/abs/1910.10683

Objective: Develop a unified framework that converts all text-based language problems into a text-to-text format and explore the limits of transfer learning with a large-scale pre-trained model.

Context: Previous transfer learning methods for NLP used different architectures, pre-training objectives, and data sets for different tasks, limiting their generality and scalability.

Key Contributions:

  • Introducing the Text-to-Text Transfer Transformer (T5), which uses a single encoder-decoder architecture and a text-to-text format for every NLP task (a rough illustration follows this list).

  • Proposing a new pre-training objective called span corruption, which replaces randomly chosen contiguous spans of text with sentinel tokens and trains the model to reconstruct them.

  • Demonstrating the scalability of the model by pre-training on a new large-scale corpus called C4 (Colossal Clean Crawled Corpus).
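As a rough illustration (not code from the paper), the snippet below casts a few tasks as plain input/target strings. The prefixes mirror the convention described in the paper, but the exact strings and examples here are illustrative.

```python
# Illustrative only: every task becomes "prefix + input text" -> "target text".
# The prefixes follow the convention described in the T5 paper; exact strings
# depend on the checkpoint and task mixture you use.
examples = [
    # (task, model input, model target)
    ("translation",
     "translate English to German: That is good.",
     "Das ist gut."),
    ("acceptability (CoLA)",
     "cola sentence: The course is jumping well.",
     "not acceptable"),
    ("similarity as text (STS-B)",
     "stsb sentence1: The rhino grazed on the grass. "
     "sentence2: A rhino is grazing in a field.",
     "3.8"),
    ("summarization",
     "summarize: state authorities dispatched emergency crews tuesday to "
     "survey the damage after an onslaught of severe weather in mississippi...",
     "six people hospitalized after a storm in attala county."),
]

for task, source, target in examples:
    print(f"{task:27s} input:  {source}")
    print(f"{'':27s} target: {target}\n")
```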

Methodology:

  • Pre-training: T5 is an encoder-decoder Transformer. The model is pre-trained on C4 with the span-corruption denoising objective (a toy version is sketched after this list).

  • Fine-tuning: The pre-trained model is fine-tuned on individual NLP tasks with supervised learning. Each task is converted into the text-to-text format by prepending a task-specific natural-language prefix to the input and expressing the target as plain text.
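Here is a toy, self-contained sketch of span corruption under simplifying assumptions (whitespace tokens instead of SentencePiece pieces, and a crude span sampler of my own): contiguous spans are replaced in the encoder input by sentinel tokens, and the decoder target spells out what each sentinel hid.

```python
import random

def span_corrupt(tokens, corruption_rate=0.15, mean_span_len=3, seed=0):
    """Toy span corruption on a whitespace-tokenized sentence.

    Contiguous spans are replaced in the encoder input by sentinel tokens
    (<extra_id_0>, <extra_id_1>, ...); the decoder target lists each
    sentinel followed by the tokens it replaced.
    """
    rng = random.Random(seed)
    n_to_mask = max(1, int(len(tokens) * corruption_rate))
    masked = set()
    # Greedily pick random contiguous spans until enough tokens are covered.
    while len(masked) < n_to_mask:
        span_len = max(1, int(rng.expovariate(1 / mean_span_len)))
        start = rng.randrange(len(tokens))
        masked.update(range(start, min(start + span_len, len(tokens))))

    inputs, targets, sentinel = [], [], 0
    i = 0
    while i < len(tokens):
        if i in masked:
            inputs.append(f"<extra_id_{sentinel}>")
            targets.append(f"<extra_id_{sentinel}>")
            while i in masked:          # consume the whole masked span
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")  # closing sentinel ends the target
    return " ".join(inputs), " ".join(targets)

src = "Thank you for inviting me to your party last week".split()
corrupted_input, target = span_corrupt(src)
print(corrupted_input)
print(target)
```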

Results: T5 achieved state-of-the-art performance at the time on several NLP benchmarks, including GLUE, SuperGLUE, SQuAD, and CNN/Daily Mail summarization, and delivered competitive results on WMT 2014 English-German translation. The model also showed strong performance in zero-shot and few-shot settings.

Impact: T5 introduced a simple and effective framework that unifies diverse NLP tasks and leverages transfer learning at scale. It inspired further research in NLP, leading to innovations such as BART and mT5.

Takeaways: T5 is a pre-trained encoder-decoder model that uses a text-to-text format for all NLP tasks and achieved state-of-the-art results on many of them. The text-to-text framework has since become a popular approach in NLP and has led to significant advancements in the field. A minimal inference sketch using a public checkpoint follows.
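As a minimal usage sketch, assuming the Hugging Face transformers library and the public t5-small checkpoint (neither is part of the paper itself), the same model can be steered to different tasks purely by changing the input prefix:

```python
# Requires: pip install transformers sentencepiece torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The task is selected purely by the natural-language prefix in the input.
for prompt in [
    "translate English to German: The house is wonderful.",
    "summarize: The T5 paper casts every NLP problem as text-to-text, "
    "pre-trains an encoder-decoder Transformer on C4 with span corruption, "
    "and fine-tunes it on downstream tasks.",
]:
    inputs = tokenizer(prompt, return_tensors="pt")
    output_ids = model.generate(**inputs, max_new_tokens=40)
    print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```

Because the prefix is ordinary text, adding a new task at fine-tuning time requires no architectural change, which is the core convenience of the text-to-text framing.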
