Welcome to my blog!
My name is Omri Mallis.
I have been hacking, building and designing software for 15 years. Through this blog I hope to share original insights that are not covered elsewhere.
Areas of interest: Linux, system engineering, reverse engineering, cloud architecture, database design, data infrastructure and a bit of AI.
Currently, I'm working on launching something new.
Featured
Techniques for KV Cache Optimization in Large Language Models
Posted on: February 25, 2024 at 08:00 AM (12 min read)
This post explores techniques for optimizing the Key-Value (KV) cache in large language models, from Grouped-query attention to PagedAttention and distributed cache management.
Understanding how LLM inference works with llama.cpp
Posted on: November 11, 2023 at 04:00 PM (34 min read)
In this post we will understand how large language models (LLMs) answer user prompts by exploring the source code of llama.cpp, a C++ implementation of LLaMA, covering subjects such as tokenization, embedding, self-attention and sampling.
IOPS, the silent killer of cloud databases
Posted on: August 20, 2023 at 08:00 AM (9 min read)
Despite advancements in cloud infrastructure and storage technology, IOPS is still a significant bottleneck for cloud databases. This post explains the source of this bottleneck and techniques to solve it.