MPT-7B: A Revolutionary Leap in Language Models

MosaicML has unveiled MPT-7B, a groundbreaking foundation Large Language Model (LLM) that is open-source and licensed for commercial use.

Comparable open models before it were largely built on Meta's LLaMA weights, whose license does not permit commercial use.

Release note: Introducing MPT-7B: A New Standard for Open-Source, Commercially Usable LLMs

Fine-Tuned Variants (a short loading sketch follows this list):

  1. MPT-7B-StoryWriter-65k:

    • Context length of 65,000 tokens, extrapolating up to 84,000 tokens.

    • Demonstrated by crafting an epilogue for The Great Gatsby.

    • Fine-tuned on a dataset of fiction books.

    • Available for commercial use.

  2. MPT-7B-Instruct:

    • Fine-tuned for following short-form instructions.

    • Trained on a larger instruction dataset than the one used for Databricks' Dolly.

    • Cleared for commercial applications.

  3. MPT-7B-Chat:

    • Similar to ChatGPT, designed for engaging in dialogues.

    • Not licensed for commercial use, because some of its training data carries non-commercial restrictions.
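
The variants above are published as standard Hugging Face checkpoints. Below is a minimal loading sketch, assuming the mosaicml/mpt-7b-instruct and mosaicml/mpt-7b-storywriter Hub IDs and the transformers AutoModel API; the dtype, prompt, and the max_seq_len override are illustrative choices, not MosaicML's official recipe.

```python
from transformers import AutoConfig, AutoModelForCausalLM, AutoTokenizer
import torch

# Load one of the MPT-7B variants (Instruct chosen here; the other Hub IDs work the same way).
model_id = "mosaicml/mpt-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # illustrative; pick a dtype your hardware supports
    trust_remote_code=True,       # the MPT repos ship custom modeling code on the Hub
)

prompt = "Summarize why a commercially usable open-source LLM matters."
inputs = tokenizer(prompt, return_tensors="pt")
output = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(output[0], skip_special_tokens=True))

# StoryWriter's long context: MPT uses ALiBi for positions, so the sequence-length
# ceiling can be raised at load time by overriding the config (here to ~84k tokens,
# beyond the 65k it was fine-tuned with); memory, not the architecture, is the limit.
config = AutoConfig.from_pretrained("mosaicml/mpt-7b-storywriter", trust_remote_code=True)
config.max_seq_len = 83968
storywriter = AutoModelForCausalLM.from_pretrained(
    "mosaicml/mpt-7b-storywriter",
    config=config,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)
```

The trust_remote_code flag is needed because, at release time, MPT's architecture was not built into the transformers library itself.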

Key Takeaways:

  • Pivotal moment in LLM development.

  • Training a foundation model from scratch is expensive, but fine-tuning an existing one is far more affordable.

  • Costs:

    • Base MPT-7B model training cost: over $200,000.

    • MosaicML commendably open-sourced the model.

    • Fine-tuning the base model into an instruction-tuned variant can cost under $50 (MosaicML did it for $37).

  • Benefits:

    • Powerful testament to MosaicML platform capabilities.

    • Advantageous for businesses that want smaller, specialized ChatGPT-like models running on their own servers.

    • Addresses potential data privacy concerns.
