[2] Seq2Seq - Sequence to Sequence Learning with Neural Networks

Title: Sequence to Sequence Learning with Neural Networks

Authors & Year: Ilya Sutskever, Oriol Vinyals, and Quoc V. Le, 2014

Link: https://arxiv.org/abs/1409.3215

Objective: Develop a general-purpose approach to learn mappings from input sequences to output sequences using neural networks, with a focus on machine translation.

Context: Before sequence-to-sequence learning, tasks that map an input sequence to an output sequence, such as machine translation, relied on phrase-based statistical systems and hand-engineered pipelines that were difficult to maintain and improve.

Key Contributions:

  • Introduced an encoder-decoder architecture built from multilayer LSTM recurrent neural networks (RNNs) for general sequence-to-sequence learning.

  • Demonstrated the effectiveness of the model on English-to-French translation tasks.

Methodology:

  • The encoder reads the input sequence and encodes it into a fixed-length vector representation.

  • The decoder is initialized with the encoded vector and generates the output sequence one token at a time, feeding each emitted token back in as the next input (a minimal sketch follows this list).
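
Below is a minimal PyTorch sketch of this encoder-decoder loop. It is illustrative only, not the paper's configuration (the original used deep, four-layer LSTMs with large vocabularies and reversed source word order); the class names, dimensions, vocabulary sizes, and the random toy input are placeholders.

```python
# Minimal encoder-decoder sketch (illustrative; dimensions and vocabularies are toy values).
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) token ids
        _, (h, c) = self.lstm(self.embed(src))
        # (h, c) is the fixed-length summary of the whole input sequence
        return h, c

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)
        self.out = nn.Linear(hidden_dim, vocab_size)

    def forward(self, token, state):
        # token: (batch, 1) the previously emitted (or gold) token
        output, state = self.lstm(self.embed(token), state)
        return self.out(output), state  # logits over the target vocabulary

def greedy_translate(encoder, decoder, src, bos_id, eos_id, max_len=20):
    """Encode src once, then emit target tokens one at a time."""
    state = encoder(src)
    token = torch.full((src.size(0), 1), bos_id, dtype=torch.long)
    result = []
    for _ in range(max_len):
        logits, state = decoder(token, state)
        token = logits.argmax(dim=-1)          # pick the most likely next token
        result.append(token)
        if (token == eos_id).all():
            break
    return torch.cat(result, dim=1)

if __name__ == "__main__":
    SRC_VOCAB, TGT_VOCAB = 100, 120            # toy vocabulary sizes
    enc, dec = Encoder(SRC_VOCAB), Decoder(TGT_VOCAB)
    src = torch.randint(0, SRC_VOCAB, (1, 7))  # one random "sentence" of 7 tokens
    print(greedy_translate(enc, dec, src, bos_id=1, eos_id=2))
```

Training such a model would add teacher forcing and a cross-entropy loss over the decoder's logits; the greedy loop above only shows how generation conditions each step on the fixed-length encoder state and the previously emitted token.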

Results:

  • The sequence-to-sequence model achieved a BLEU score on the WMT'14 English-to-French translation task that was competitive with the strong phrase-based baseline, and improved that baseline further when used to rescore its candidate translations.

  • The model learned long-range dependencies and complex linguistic structures; notably, reversing the word order of the source sentences made optimization easier and markedly improved performance on long sentences.

Impact:

  • Established the encoder-decoder framework as a foundation for neural approaches to machine translation, text summarization, and dialogue systems.

  • Inspired further research in NLP, including the development of attention mechanisms and transformer architectures.

Takeaways:

  • The sequence-to-sequence model maps input sequences to output sequences using an encoder-decoder architecture with RNNs.

  • The model proved effective in machine translation tasks, learning complex linguistic structures and dependencies.

  • The encoder-decoder approach has had a significant impact on NLP research and the development of more advanced models.
