The Evolution of Language Models: From Word2Vec to GPT-4

Understanding key papers that led to the invention of LLMs like GPT-4

Over the past decade, natural language processing (NLP) has undergone remarkable advancements, thanks to groundbreaking research and innovative techniques. From the initial development of Word2Vec to the emergence of large-scale pre-trained models like GPT and BERT, each step has significantly impacted the capabilities and applications of NLP systems. In this post, we explore the key research papers and ideas that have shaped the field, tracing the evolution from Word2Vec to GPT-4.

  1. Word2Vec: Introduced the concept of learning word embeddings that capture semantic meaning by predicting surrounding words in a sentence (see the skip-gram sketch after this list). 📃

  2. Seq2Seq: Built on word embeddings to develop the encoder-decoder architecture using RNNs for mapping input sequences to output sequences. 📃

  3. Attention Mechanism: Improved Seq2Seq models by enabling networks to focus on the relevant parts of the input when generating each output token. 📃

  4. Transformers: Introduced a novel NLP architecture that relied solely on attention mechanisms, discarding RNNs and CNNs (see the scaled dot-product attention sketch after this list). 📃

  5. GPT: Applied unsupervised pre-training and task-specific fine-tuning using the Transformer architecture to achieve impressive performance. 📃

  6. BERT: Extended pre-training with masked language modeling, enabling bidirectional context learning and achieving state-of-the-art performance (a masking sketch follows this list). 📃

  7. T5: Cast every NLP problem as a text-to-text task, demonstrating that a single model and training objective can handle a wide range of NLP problems. 📃

  8. GPT-2: Increased model size and training data, demonstrating remarkable text generation abilities and raising ethical concerns. 📃

  9. GPT-3: Made a major leap forward with a larger model and more diverse training data, showcasing impressive few-shot learning capabilities. 📃

  10. LoRA: Addressed the cost of fine-tuning large-scale language models by introducing a low-rank adaptation technique that trains only small update matrices, enabling efficient and effective fine-tuning (see the sketch after this list). 📃

  11. InstructGPT: Extended GPT-3 by fine-tuning it with human feedback to follow instructions, producing outputs that human evaluators preferred even over those of the much larger base model. 📃

  12. GPT-4: The latest iteration, building on the successes of predecessors with further refinements and improvements, achieving state-of-the-art performance. 📃
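
To make the Word2Vec entry concrete, here is a minimal skip-gram sketch using the gensim library. The toy corpus and hyperparameter values are illustrative assumptions, not settings from the paper.

```python
# Minimal skip-gram sketch (assumes gensim is installed: pip install gensim).
# The corpus and hyperparameters below are toy values for illustration.
from gensim.models import Word2Vec

sentences = [
    ["the", "king", "rules", "the", "kingdom"],
    ["the", "queen", "rules", "the", "kingdom"],
]

model = Word2Vec(
    sentences,
    vector_size=50,  # dimensionality of the word embeddings
    window=2,        # context words considered on each side
    sg=1,            # 1 = skip-gram: predict surrounding words from the centre word
    min_count=1,     # keep even rare words in this tiny corpus
)

print(model.wv["king"].shape)                # (50,) embedding vector
print(model.wv.similarity("king", "queen"))  # cosine similarity of two embeddings
```

Words that occur in similar contexts ("king" and "queen" above) end up with nearby vectors, which is the semantic property the paper highlights.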
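
The scaled dot-product attention referenced in the Transformers entry can be written in a few lines of NumPy. This is a bare sketch: real Transformer layers add masking, multiple heads, and learned query/key/value projections.

```python
# Minimal scaled dot-product attention sketch in NumPy.
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                 # query-key similarities
    scores -= scores.max(axis=-1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over key positions
    return weights @ V                              # weighted sum of values

Q = np.random.randn(4, 8)  # 4 query positions, dimension d_k = 8
K = np.random.randn(6, 8)  # 6 key/value positions
V = np.random.randn(6, 8)

print(scaled_dot_product_attention(Q, K, V).shape)  # (4, 8)
```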
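
For the BERT entry, a toy version of masked language modeling looks like this. It masks tokens at a flat 15% rate and, for brevity, omits the paper's 80/10/10 replace/keep/randomize rule.

```python
# Toy BERT-style masking: hide ~15% of tokens and keep the originals as labels.
import random

def mask_tokens(tokens, mask_rate=0.15, mask_token="[MASK]"):
    masked, labels = [], []
    for tok in tokens:
        if random.random() < mask_rate:
            masked.append(mask_token)
            labels.append(tok)    # the model is trained to recover this token
        else:
            masked.append(tok)
            labels.append(None)   # no loss computed at unmasked positions
    return masked, labels

random.seed(0)
print(mask_tokens("the cat sat on the mat".split()))
```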
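
And for the LoRA entry, a minimal PyTorch sketch of a low-rank adapted linear layer. The class name, rank r, and scaling factor are illustrative choices, not taken from the official implementation.

```python
# Minimal LoRA sketch: freeze the pretrained weight W and learn a low-rank
# update B @ A. Class name, rank r, and alpha are illustrative choices.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, in_features, out_features, r=4, alpha=8):
        super().__init__()
        # Stands in for a pretrained weight; frozen during fine-tuning.
        self.weight = nn.Parameter(torch.randn(out_features, in_features),
                                   requires_grad=False)
        self.A = nn.Parameter(torch.randn(r, in_features) * 0.01)  # trainable
        self.B = nn.Parameter(torch.zeros(out_features, r))  # starts at zero,
        self.scale = alpha / r                                # so the update begins as a no-op

    def forward(self, x):
        return x @ (self.weight + self.scale * (self.B @ self.A)).T

layer = LoRALinear(768, 768, r=4)
trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
print(trainable)  # 2 * 4 * 768 = 6,144 trainable parameters vs. 589,824 in W
```

Only A and B receive gradients, so fine-tuning stores and updates a small fraction of the parameters in the full weight matrix.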

Papers

  1. Efficient Estimation of Word Representations in Vector Space (Word2Vec)
  2. Sequence to Sequence Learning with Neural Networks (Seq2Seq)
  3. Neural Machine Translation by Jointly Learning to Align and Translate (attention mechanism)
  4. Attention Is All You Need (Transformers)
  5. Improving Language Understanding by Generative Pre-Training (GPT)
  6. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
  7. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer (T5)
  8. Language Models are Unsupervised Multitask Learners (GPT-2)
  9. Language Models are Few-Shot Learners (GPT-3)
  10. LoRA: Low-Rank Adaptation of Large Language Models
  11. Training Language Models to Follow Instructions with Human Feedback (InstructGPT)
  12. GPT-4 Technical Report