Common Sense: On The Llama 3 Herd of Models

Thursday, June 12, 2025

On The Llama 3 Herd of Models

Highly recommended! I finally read this very detailed and comprehensive 92-page paper on Meta's family of foundation models.

The authors describe in great detail, for example, the many hardware issues involved in developing and training such huge models.

Their chapter on the enormous effort to measure and mitigate safety issues is also well worth reading: how to cope with false refusals, and much more.

From the abstract:

"Modern artificial intelligence (AI) systems are powered by foundation models. This paper presents a new set of foundation models, called Llama 3. It is a herd of language models that natively support multilinguality, coding, reasoning, and tool usage. Our largest model is a dense Transformer with 405B parameters and a context window of up to 128K tokens. This paper presents an extensive empirical evaluation of Llama 3. We find that Llama 3 delivers comparable quality to leading language models such as GPT-4 on a plethora of tasks. We publicly release Llama 3, including pre-trained and post-trained versions of the 405B parameter language model and our Llama Guard 3 model for input and output safety. The paper also presents the results of experiments in which we integrate image, video, and speech capabilities into Llama 3 via a compositional approach. We observe this approach performs competitively with the state-of-the-art on image, video, and speech recognition tasks. The resulting models are not yet being broadly released as they are still under development."

[2407.21783] The Llama 3 Herd of Models
