MPT-7B: A Revolutionary Leap in Language Models

MosaicML has unveiled MPT-7B, a groundbreaking foundation Large Language Model (LLM) that is open-source and licensed for commercial use.

Comparable open models before it were largely built on Meta's LLaMA weights, whose license does not permit commercial use.

Release note: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

Fine-Tuned Variants (a short loading sketch follows this list):

  1. MPT-7B-StoryWriter-65k:

    • Context length of 65,000 tokens, extrapolating up to 84,000 tokens.

    • Demonstrated by crafting an epilogue for The Great Gatsby.

    • Fine-tuned on a dataset of fiction books.

    • Available for commercial use.

  2. MPT-7B-Instruct:

    • Fine-tuned for following short-form instructions.

    • Trained on a larger instruction dataset than the one used for Databricks' Dolly.

    • Cleared for commercial applications.

  3. MPT-7B-Chat:

    • Similar to ChatGPT, designed for engaging in dialogues.

    • Not licensed for commercial use, because some of its training data carries non-commercial restrictions.
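
The variants above are published as standard Hugging Face checkpoints. Below is a minimal loading sketch, assuming the mosaicml/mpt-7b-instruct and mosaicml/mpt-7b-storywriter Hub IDs and the transformers AutoModel API; the dtype, prompt, and the max_seq_len override are illustrative choices, not MosaicML's official recipe.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

# Load one of the MPT-7B variants (Instruct chosen here; the other Hub IDs work the same way).
model_id = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # illustrative; pick a dtype your hardware supports
    trust_remote_code=True,       # the MPT repos ship custom modeling code on the Hub
)

prompt = "Summarize why a commercially usable open-source LLM matters."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# StoryWriter's long context: MPT uses ALiBi for positions, so the sequence-length
# ceiling can be raised at load time by overriding the config (here to ~84k tokens,
# beyond the 65k it was fine-tuned with); memory, not the architecture, is the limit.
config = AutoConfig.from_pretrained("mosaicml/mpt-7b-storywriter", trust_remote_code=True)
config.max_seq_len = 83968
storywriter = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter",
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```

The trust_remote_code flag is needed because, at release time, MPT's architecture was not built into the transformers library itself.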

Key Takeaways:

  • Pivotal moment in LLM development.

  • Training a foundation model from scratch is expensive, but fine-tuning an existing one is far more affordable.

  • Costs:

    • Base MPT-7B model training cost: over $200,000.

    • MosaicML commendably open-sourced the model.

    • Fine-tuning the base model into an instruction-tuned variant can cost under $50 (MosaicML did it for $37).

  • Benefits:

    • Powerful testament to MosaicML platform capabilities.

    • Advantageous for businesses that want smaller, specialized ChatGPT-like models running on their own servers.

    • Addresses potential data privacy concerns.
