7B, 13B, 70B: Discover the Range of Llama 2 Models
Training Configuration
All Llama 2 models were trained with a global batch size of 4M tokens. Notably, the largest model, 70B, uses Grouped-Query Attention (GQA) to improve inference scalability.
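To make the GQA mention concrete, here is a minimal sketch of grouped-query attention in NumPy. It is illustrative only, not the Llama 2 implementation: the idea is that several query heads share a single key/value head, shrinking the KV cache at inference time. All shapes and names below are assumptions for the example.

```python
import numpy as np

def gqa_attention(q, k, v):
    """Grouped-query attention sketch.

    q: (n_q_heads, seq, d) -- many query heads
    k, v: (n_kv_heads, seq, d) -- fewer key/value heads, shared across groups
    """
    n_q_heads, n_kv_heads = q.shape[0], k.shape[0]
    group = n_q_heads // n_kv_heads  # query heads per shared KV head

    # Broadcast each KV head to its group of query heads.
    k = np.repeat(k, group, axis=0)
    v = np.repeat(v, group, axis=0)

    d = q.shape[-1]
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d)

    # Numerically stable softmax over the key dimension.
    scores -= scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# Example: 8 query heads sharing 2 KV heads (groups of 4).
rng = np.random.default_rng(0)
q = rng.standard_normal((8, 5, 16))
k = rng.standard_normal((2, 5, 16))
v = rng.standard_normal((2, 5, 16))
out = gqa_attention(q, k, v)
```

With 8 query heads and 2 KV heads, the KV cache is a quarter of the size it would be under standard multi-head attention, at the cost of each KV head serving four query heads.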
Model Range and Capabilities
The Llama 2 family spans three model sizes: 7B, 13B, and 70B parameters. Each model was trained on 2 trillion tokens and supports a 4,096-token context window, double that of Llama 1, which improves language understanding and generation over longer inputs.
Unlocking Potential
The release of Llama 2 empowers individuals, creators, researchers, and businesses to explore the transformative potential of large language models. Experimentation with these models enables innovation and advancement in various fields.
Benchmarking the 70B Model
The Llama 2 70B benchmark invites submissions in two categories. To be eligible, a submission must exceed a reference score of 999.