
Llama 2 emerges as a formidable contender in the realm of language models, crafted by Meta AI with an ethos of open-source collaboration.
Building on the foundation laid by its predecessor Llama 1, Llama 2 was introduced only a few months after the first iteration, showcasing a remarkable pace of development and adoption.
The development of Llama 2 was rooted in a fast-moving research project within Meta’s FAIR team that originally focused on formal mathematics but quickly pivoted once the broader capabilities of LLMs became apparent.
Llama 2 has been embraced by cloud platforms such as AWS, Google Cloud, and Microsoft Azure; AWS was announced as a managed API partner for Llama 2, significantly improving the model’s accessibility.
Remarkably, the open-source community has fine-tuned and released over 7,000 derivatives of Llama on Hugging Face, with some showing performance improvements of up to 46% on benchmark datasets.
Moreover, the model has received hardware support from major companies like AMD, Intel, Nvidia, and Google, which have optimized Llama 2’s performance through their platforms.
Architecture and data: Llama 2 is built on a standard Transformer-based architecture, similar to its predecessor. However, the new model comes in several parameter sizes: 7B, 13B, and 70B (a 34B version was trained but not publicly released).
The pre-training dataset was significantly larger, at 2 trillion tokens (predominantly in English), up from the 1.4 trillion used for Llama 1.
Training methodology: Training involved pre-training, supervised fine-tuning to turn the base model into a chat model, and a human feedback loop (RLHF) used to train separate reward models for helpfulness and harmlessness.
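To make the reward-modeling step concrete: the Llama 2 paper describes a binary ranking loss that pushes the score of the preferred response above the rejected one by a rating-dependent margin. Below is a minimal PyTorch sketch of that loss; the function name and the toy numbers are illustrative, not Meta’s actual code:

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(chosen_scores: torch.Tensor,
                        rejected_scores: torch.Tensor,
                        margin: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss in the spirit of the Llama 2 paper:
    -log(sigmoid(r(chosen) - r(rejected) - margin)), averaged over pairs."""
    return -F.logsigmoid(chosen_scores - rejected_scores - margin).mean()

# Toy scores a reward model might assign to three preference pairs.
chosen = torch.tensor([1.8, 0.4, 2.1])    # scores of annotator-preferred responses
rejected = torch.tensor([0.9, 0.7, 1.5])  # scores of the rejected responses
margin = torch.tensor([1.0, 0.3, 1.0])    # larger when annotators were more confident

print(reward_ranking_loss(chosen, rejected, margin))
```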
The team behind Llama 2 conducted both automatic and human evaluations.
However, it’s important to keep in mind that human evaluation can be subjective when a prompt admits many valid responses.
Llama’s first release was in February 2023. Some of the main non-GPT competitors in the LLM landscape at the time included:
Google’s LaMDA/PaLM: LaMDA attracted significant attention for its conversational capabilities despite having fewer parameters (137 billion) than OpenAI’s GPT-3.5, which reportedly had 175 billion.
Later, Google upgraded its chatbot, Bard, to run on PaLM 2, a more advanced model reportedly with over 340 billion parameters.
AI21’s Jurassic-1/Jurassic-2: An Israel-based startup, AI21 Labs, developed the Jurassic series of LLMs, with the second version focusing on optimized performance rather than size, and boasting a grammatical correction API and text segmentation capabilities.
Jurassic-2 was also notable for its customization capabilities, allowing users to train their own versions of the LLM with a minimal number of training examples.
Check the Jurassic-2 documentation to see how you can train models on their platform.
Open Source Accessibility: Unlike many of its counterparts, LLaMA 2 is open source, making it freely available for research and commercial purposes.
Within the AI community, LLaMA 2 is considered more open than models such as ChatGPT because Meta released the model weights: researchers and developers can download them and run the model on their own systems.
Access to ChatGPT is provided through an API, which means users can interact with the model via a web interface or within applications, but they have no access to the underlying code or weights, so they cannot run the model independently or modify it.
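In practice, “having the weights” looks like this: a minimal sketch of loading the gated Llama 2 checkpoint with the Hugging Face transformers library, assuming Meta has granted you access to the meta-llama repository and you are logged in to Hugging Face:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated repository: requires accepting Meta's license on Hugging Face first.
model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",          # place layers on available devices automatically
)
```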
Llama 2-Chat: Optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF).
Across a wide range of helpfulness and safety benchmarks, the Llama 2-Chat models perform better than most open models and achieve comparable performance to ChatGPT according to human evaluations.
You can try Llama 2-Chat at llama2.ai or on HuggingChat; you can also request access to the model on Meta’s official page.
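Continuing the loading sketch above, Llama 2-Chat expects Meta’s instruction format, with [INST] tags and an optional <<SYS>> system prompt. The prompt text here is just an example:

```python
# Llama 2-Chat's documented prompt template: [INST] ... [/INST],
# optionally wrapping a system message in <<SYS>> ... <</SYS>>.
prompt = (
    "[INST] <<SYS>>\n"
    "You are a helpful, honest assistant.\n"
    "<</SYS>>\n\n"
    "Explain RLHF in one sentence. [/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```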
Considering all the above, the largest “member” of the Llama 2 family, at 70 billion parameters, is roughly 60% smaller than GPT-3.5 (reported at 175 billion) and ~96% smaller than GPT-4, going by the widely circulated but unconfirmed estimates of GPT-4’s size.
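The arithmetic behind those percentages, keeping in mind that the GPT-4 figure is a circulated rumor rather than a confirmed number:

```python
llama2_70b = 70e9   # largest released Llama 2 model
gpt_35 = 175e9      # commonly reported GPT-3.5 size
gpt_4 = 1.8e12      # assumption: rumored, never confirmed by OpenAI

print(f"vs GPT-3.5: {1 - llama2_70b / gpt_35:.0%} smaller")  # -> 60% smaller
print(f"vs GPT-4:   {1 - llama2_70b / gpt_4:.0%} smaller")   # -> 96% smaller
```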
Does this mean Llama is the “worst” of all three models?
Not necessarily! Although one might think Llama 2’s size makes it less accurate than GPT, several benchmarks show impressive performance.
Let’s take a look at Meta’s benchmarks:
Although Llama 2 has far fewer parameters, it performs similarly to GPT-3.5 on many benchmarks, with spectacular results on tasks such as MMLU (5-shot) and GSM8K (8-shot).
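If you want to reproduce this kind of few-shot number yourself, one common route is EleutherAI’s open-source lm-evaluation-harness. A rough sketch using its 0.4.x Python API (argument names may differ in your installed version, so treat this as a pointer rather than a recipe):

```python
# pip install lm-eval  -- EleutherAI's lm-evaluation-harness
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                        # Hugging Face backend
    model_args="pretrained=meta-llama/Llama-2-7b-hf",  # gated repo, access required
    tasks=["gsm8k"],                                   # grade-school math benchmark
    num_fewshot=8,                                     # the 8-shot setting in Meta's table
)
print(results["results"]["gsm8k"])
```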
For a more detailed statistical comparison you can check the following article or the model paper.
For those of you who want to dive deeper into the LLaMA architecture, you can take a look at the following concepts: