
Llama 2 emerges as a formidable contender in the realm of language models, crafted by Meta AI with an ethos of open-source collaboration.
Building on the foundation laid by its predecessor Llama 1, Llama 2 was introduced only a few months after the first iteration, showcasing a remarkable pace of development and adoption.
The development of Llama 2 was rooted in a fast-moving research project within Meta’s FAIR team that originally focused on formal mathematics but quickly pivoted once the broader capabilities of LLMs became apparent.
Llama 2 has been embraced by cloud platforms such as AWS, Google Cloud, and Microsoft Azure; AWS was announced as a managed API partner for Llama 2, significantly improving the model’s accessibility.
Remarkably, the open-source community has fine-tuned and released over 7,000 derivatives of Llama on Hugging Face, with some showing performance improvements of up to 46% on benchmark datasets.
Moreover, the model has received hardware support from major companies like AMD, Intel, Nvidia, and Google, which have optimized Llama 2’s performance through their platforms.
Architecture and data: Llama 2 is built on a standard Transformer-based architecture, similar to its predecessor. However, the new model comes in several parameter sizes: 7B, 13B, and 70B (a 34B version was trained but not publicly released).
The pre-training dataset was significantly larger, at 2 trillion tokens (predominantly in English), up from the 1.4 trillion used for Llama 1.
Training methodology: Training involved pre-training, supervised fine-tuning to turn the base model into a chat model, and a human feedback loop (RLHF) used to train separate reward models for helpfulness and harmlessness.
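To make the reward-modeling step concrete: the Llama 2 paper describes a binary ranking loss that pushes the score of the preferred response above the rejected one by a rating-dependent margin. Below is a minimal PyTorch sketch of that loss; the function name and the toy numbers are illustrative, not Meta’s actual code:

```python
import torch
import torch.nn.functional as F

def reward_ranking_loss(chosen_scores: torch.Tensor,
                        rejected_scores: torch.Tensor,
                        margin: torch.Tensor) -> torch.Tensor:
    """Pairwise ranking loss in the spirit of the Llama 2 paper:
    -log(sigmoid(r(chosen) - r(rejected) - margin)), averaged over pairs."""
    return -F.logsigmoid(chosen_scores - rejected_scores - margin).mean()

# Toy scores a reward model might assign to three preference pairs.
chosen = torch.tensor([1.8, 0.4, 2.1])    # scores of annotator-preferred responses
rejected = torch.tensor([0.9, 0.7, 1.5])  # scores of the rejected responses
margin = torch.tensor([1.0, 0.3, 1.0])    # larger when annotators were more confident

print(reward_ranking_loss(chosen, rejected, margin))
```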
The team behind Llama 2 conducted both automatic and human evaluations.
However, it’s important to keep in mind that human evaluation can be subjective when a prompt admits many valid responses.
Llama’s first release was in February 2023. Some of the main non-GPT competitors in the LLM landscape at the time included:
Google’s LaMDA/PaLM: LaMDA attracted significant attention for its conversational capabilities despite having fewer parameters (137 billion) than OpenAI’s GPT-3.5, which reportedly had 175 billion.
Later, Google upgraded its chatbot, Bard, to run on PaLM 2, a more advanced model reportedly with over 340 billion parameters.
AI21’s Jurassic-1/Jurassic-2: An Israel-based startup, AI21 Labs, developed the Jurassic series of LLMs, with the second version focusing on optimized performance rather than size, and boasting a grammatical correction API and text segmentation capabilities.
Jurassic-2 was also notable for its customization capabilities, allowing users to train their own versions of the LLM with a minimal number of training examples.
Check the Jurassic-2 documentation to see how you can train models on their platform.
Open Source Accessibility: Unlike many of its counterparts, LLaMA 2 is open source, making it freely available for research and commercial purposes.
Within the AI community, LLaMA 2 is considered more open than models such as ChatGPT because Meta released the model weights: researchers and developers can download them and run the model on their own systems.
Access to ChatGPT is provided through an API, which means users can interact with the model via a web interface or within applications, but they have no access to the underlying code or weights, so they cannot run the model independently or modify it.
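In practice, “having the weights” looks like this: a minimal sketch of loading the gated Llama 2 checkpoint with the Hugging Face transformers library, assuming Meta has granted you access to the meta-llama repository and you are logged in to Hugging Face:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Gated repository: requires accepting Meta's license on Hugging Face first.
model_id = "meta-llama/Llama-2-7b-chat-hf"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,  # half precision to fit on a single GPU
    device_map="auto",          # place layers on available devices automatically
)
```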
Llama 2-Chat: Optimized for dialogue applications using Reinforcement Learning from Human Feedback (RLHF).
Across a wide range of helpfulness and safety benchmarks, the Llama 2-Chat models perform better than most open models and achieve comparable performance to ChatGPT according to human evaluations.
You can try Llama 2-Chat at llama2.ai or on HuggingChat; you can also request access to the model on Meta’s official page.
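Continuing the loading sketch above, Llama 2-Chat expects Meta’s instruction format, with [INST] tags and an optional <<SYS>> system prompt. The prompt text here is just an example:

```python
# Llama 2-Chat's documented prompt template: [INST] ... [/INST],
# optionally wrapping a system message in <<SYS>> ... <</SYS>>.
prompt = (
    "[INST] <<SYS>>\n"
    "You are a helpful, honest assistant.\n"
    "<</SYS>>\n\n"
    "Explain RLHF in one sentence. [/INST]"
)

inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```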
Considering all the above, the largest “member” of the Llama 2 family, at 70 billion parameters, is roughly 60% smaller than GPT-3.5 (reported at 175 billion) and ~96% smaller than GPT-4, going by the widely circulated but unconfirmed estimates of GPT-4’s size.
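The arithmetic behind those percentages, keeping in mind that the GPT-4 figure is a circulated rumor rather than a confirmed number:

```python
llama2_70b = 70e9   # largest released Llama 2 model
gpt_35 = 175e9      # commonly reported GPT-3.5 size
gpt_4 = 1.8e12      # assumption: rumored, never confirmed by OpenAI

print(f"vs GPT-3.5: {1 - llama2_70b / gpt_35:.0%} smaller")  # -> 60% smaller
print(f"vs GPT-4:   {1 - llama2_70b / gpt_4:.0%} smaller")   # -> 96% smaller
```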
Does this mean Llama is the “worst” of all three models?
Not necessarily! Although one might think Llama 2’s size makes it less accurate than GPT, several benchmarks show impressive performance.
Let’s take a look at Meta’s benchmarks:
Although Llama 2 has far fewer parameters, it performs similarly to GPT-3.5 on many benchmarks, with spectacular results on tasks such as MMLU (5-shot) and GSM8K (8-shot).
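If you want to reproduce this kind of few-shot number yourself, one common route is EleutherAI’s open-source lm-evaluation-harness. A rough sketch using its 0.4.x Python API (argument names may differ in your installed version, so treat this as a pointer rather than a recipe):

```python
# pip install lm-eval  -- EleutherAI's lm-evaluation-harness
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                                        # Hugging Face backend
    model_args="pretrained=meta-llama/Llama-2-7b-hf",  # gated repo, access required
    tasks=["gsm8k"],                                   # grade-school math benchmark
    num_fewshot=8,                                     # the 8-shot setting in Meta's table
)
print(results["results"]["gsm8k"])
```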
For a more detailed statistical comparison you can check the following article or the model paper.
For those of you who want to dive deeper into the LLaMA architecture, you can take a look at the following concepts: