Tag: llama

All the articles with the tag "llama".

Understanding how LLM inference works with llama.cpp
Posted on: November 11, 2023 at 04:00 PM (34 min read)
In this post, we explore the source code of llama.cpp, a C++ implementation of LLaMA, to understand how large language models (LLMs) answer user prompts, covering tokenization, embedding, self-attention, and sampling.
