MPT-7B: A Revolutionary Leap in Language Models

In May 2023, MosaicML unveiled MPT-7B, a groundbreaking open-source foundational Large Language Model (LLM) licensed for commercial use.

Until then, teams wanting an open model were constrained by the non-commercial license of Meta's LLaMA; MPT-7B removes that barrier.

Release note: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

Fine-Tuned Variants:

  1. MPT-7B-StoryWriter-65k+:

    • Trained with a 65,000-token context length and able to extrapolate to roughly 84,000 tokens at inference via ALiBi position encoding (see the loading sketch after this list).

    • Demonstrated by crafting an epilogue for The Great Gatsby.

    • Fine-tuned on a dataset of fiction books.

    • Available for commercial use.

  2. MPT-7B-Instruct:

    • Fine-tuned to follow short-form instructions (see the prompting sketch after this list).

    • Trained on a dataset derived from Databricks' Dolly-15k and augmented with Anthropic's Helpful and Harmless data, making it more extensive than Dolly's own training set.

    • Cleared for commercial applications.

  3. MPT-7B-Chat:

    • A conversational model in the style of ChatGPT, designed for multi-turn dialogue.

    • Not cleared for commercial use because some of its fine-tuning datasets carry non-commercial license terms.
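
The long-context variant can be loaded with Hugging Face Transformers. Below is a minimal sketch, assuming the mosaicml/mpt-7b-storywriter checkpoint on the Hub and its documented max_seq_len config field; raising the window toward the ~84k figure relies on MPT's ALiBi position encoding and follows the model card's notes, so treat the exact value as an assumption.

```python
import transformers

name = "mosaicml/mpt-7b-storywriter"

# Pull the model's config so the context window can be raised before loading.
config = transformers.AutoConfig.from_pretrained(name, trust_remote_code=True)
# ALiBi lets MPT extrapolate past its 65k-token training window; 84000 mirrors
# the ~84k figure MosaicML reported (assumption based on the model card).
config.max_seq_len = 84000

model = transformers.AutoModelForCausalLM.from_pretrained(
    name,
    config=config,
    trust_remote_code=True,  # MPT ships custom modeling code on the Hub
    torch_dtype="auto",
)
# MPT models reuse the EleutherAI GPT-NeoX-20B tokenizer.
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")
```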
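
MPT-7B-Instruct, in turn, expects a Dolly-style prompt template. The sketch below assumes the template shown on the mosaicml/mpt-7b-instruct model card; the instruction text itself is a made-up example.

```python
import transformers

name = "mosaicml/mpt-7b-instruct"
model = transformers.AutoModelForCausalLM.from_pretrained(
    name, trust_remote_code=True, torch_dtype="auto"
)
tokenizer = transformers.AutoTokenizer.from_pretrained("EleutherAI/gpt-neox-20b")

# Dolly-style template from the model card (assumption: verify against the
# current card); the instruction is a hypothetical example.
template = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n"
    "### Instruction:\n{instruction}\n### Response:\n"
)
prompt = template.format(instruction="Summarize the plot of The Great Gatsby.")

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))
```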

Key Takeaways:

  • A pivotal moment for open-source, commercially usable LLMs.

  • Training a foundational model from scratch is expensive; fine-tuning an existing one is far more affordable.

  • Costs:

    • Training the base MPT-7B model cost MosaicML roughly $200,000.

    • Despite that cost, MosaicML commendably open-sourced the model.

    • Fine-tuning an instruction-following variant can cost under $50 (MosaicML reportedly did it for $37).

  • Benefits:

    • A powerful demonstration of the MosaicML platform's training capabilities.

    • Advantageous for businesses that want smaller, specialized, ChatGPT-like models running on their own servers.

    • Self-hosting also addresses data privacy concerns, since prompts and data never leave the company's infrastructure.
