llama.cpp

From Wikipedia, the free encyclopedia
llama.cpp
Original author: Georgi Gerganov
Developers: Georgi Gerganov and community
Initial release: March 10, 2023[1]
Repository: github.com/ggml-org/llama.cpp
Written in: C++, C
Type: Library for large language models
License: MIT License[2]

    llama.cpp is an open source software library that performs inference on various large language models such as Llama.[3] It is co-developed alongside the GGML project, a general-purpose tensor library.[4]

    Command-line tools are included with the library,[5] alongside a server with a simple web interface.[6][7]

    Background


    Towards the end of September 2022, Georgi Gerganov started work on the GGML library, a C library implementing tensor algebra. Gerganov developed the library with the intention of strict memory management and multi-threading. The creation of GGML was inspired by Fabrice Bellard's work on LibNC.[8]

    Before llama.cpp, Gerganov worked on a similar library called whisper.cpp which implemented Whisper, a speech to text model by OpenAI.[9]

    Development


    llama.cpp began development in March 2023 by Georgi Gerganov as an implementation of the Llama inference code in pure C/C++ with no dependencies. This improved performance on computers without GPU or other dedicated hardware, which was a goal of the project.[3][10][11] llama.cpp gained traction with users who lacked specialized hardware, as it could run on just a CPU.

    While the project was initially designed for CPUs, GPU and NPU backends were later added.[12] As of August 2025, it has more than 85,000 stars on GitHub.[13]

    On April 30, 2024, support for FlashAttention was introduced.

    On April 10, 2025, libmtmd was introduced, reinvigorating support for multimodal models, which had previously stagnated.

    On December 17, 2025, full acceleration on Android and ChromeOS devices was introduced via a new GUI binding,[14] unlocking native app development beyond the previous approach of cross-compiling CLI tools[10][15][16] and running them in an adb shell.

    Architecture


    llama.cpp supports multiple hardware targets, including x86, ARM, Metal, BLAS, BLIS, SYCL, MUSA, CUDA, HIP, CANN, OpenCL, RPC and Vulkan (version 1.2 or greater).[17][18][19][20] These back-ends make up the GGML tensor library, which is used by the front-end model-specific llama.cpp code.[21] llama.cpp also makes use of several CPU extensions for optimization, such as AVX, AVX2, AVX-512 and ARM NEON.

    llama.cpp supports a variety of features aimed at inference on edge devices.

    In addition, llama.cpp supports a variety of features and APIs for frontend communication, such as:

    • OpenAI-compatible endpoints such as /v1/chat/completions.
    • Grammar-based constrained output formatting, such as JSON.[11]
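
    A request to the OpenAI-compatible chat endpoint can be sketched as follows. This is a hypothetical example: the host, port, and model name are placeholders and assume a local llama.cpp server is already running.

    ```python
    import json

    # Build an OpenAI-style chat request body for llama.cpp's server.
    # The model name is a placeholder; llama.cpp's server typically
    # serves whatever model it was launched with.
    payload = {
        "model": "llama",
        "messages": [{"role": "user", "content": "What is GGUF?"}],
        "temperature": 0.7,
    }
    body = json.dumps(payload).encode("utf-8")

    # To actually send it (assuming a server on localhost:8080):
    # import urllib.request
    # req = urllib.request.Request(
    #     "http://localhost:8080/v1/chat/completions",
    #     data=body,
    #     headers={"Content-Type": "application/json"},
    # )
    # print(urllib.request.urlopen(req).read().decode())
    ```

    Because the endpoint mirrors the OpenAI API, existing OpenAI client libraries can usually be pointed at a llama.cpp server by changing only the base URL.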

    GGUF file format

    GGUF
    Filename extension: .gguf
    Magic number: 0x47 0x47 0x55 0x46
    Developed by: Georgi Gerganov and community
    Initial release: August 22, 2023[24]
    Latest release: v3[25]
    Type of format: Machine-learning tensors

    The GGUF (GGML Universal File)[26] file format is a binary format that stores both tensors and metadata in a single file, and is designed for fast saving and loading of model data.[27] It was introduced in August 2023 by the llama.cpp project to better maintain backwards compatibility as support was added for other model architectures.[12][28] It superseded previous formats used by the project, such as GGML.

    GGUF files are typically created by converting models developed with a different machine learning library such as PyTorch.[27]

    Design


    GGUF focuses on quantization, the act of reducing precision in the model weights. This can lead to reduced memory usage and increased speed, albeit at the cost of reduced model accuracy.[29][28]

    GGUF supports quantized integer types from 2-bit to 8-bit,[30] common floating-point formats such as float32, float16, and bfloat16, and 1.58-bit quantization.[5]
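
    As an illustration of how block-wise quantization works, the following sketch mimics the idea behind an 8-bit type such as Q8_0, where a block of 32 weights shares a single scale factor. The block size and rounding scheme here are illustrative, not a byte-exact reimplementation of llama.cpp's kernels.

    ```python
    def quantize_block(block):
        # One Q8_0-style block: 32 floats -> one shared scale + 32 signed 8-bit ints.
        assert len(block) == 32
        amax = max(abs(x) for x in block)
        scale = amax / 127.0 if amax > 0 else 1.0
        quants = [max(-127, min(127, round(x / scale))) for x in block]
        return scale, quants

    def dequantize_block(scale, quants):
        # Recover approximate weights; precision lost to rounding stays lost.
        return [q * scale for q in quants]

    weights = [i / 10.0 - 1.6 for i in range(32)]  # toy weights in [-1.6, 1.5]
    scale, q = quantize_block(weights)
    restored = dequantize_block(scale, q)
    ```

    The rounding error per weight is at most half the scale, which is the accuracy cost the article describes; in exchange, a real Q8_0 block stores a float16 scale plus 32 int8 values (34 bytes) in place of 128 bytes of float32 weights.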

    GGUF contains information necessary for running a GPT-like language model such as the tokenizer vocabulary, context length, tensor info and other attributes.[31]

    Byte-level structure (little-endian)

    Bytes      Description[32]
    4          GGUF magic number, currently set to 0x47 0x47 0x55 0x46
    4          GGUF version, currently set to 3
    8          UINT64 tensor_count: number of tensors
    8          UINT64 metadata_kv_count: number of metadata key-value pairs
    Variable   Metadata block, containing metadata_kv_count key-value pairs
    Variable   Tensors info block, containing tensor_count entries
    Variable   uint8_t tensor_data[], the block of weight bits
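
    The fixed-size header described above can be parsed in a few lines. This is a minimal sketch using a synthetic in-memory header; a real file continues with the metadata and tensors info blocks.

    ```python
    import io
    import struct

    def read_gguf_header(f):
        # First 24 bytes of a GGUF file (integers little-endian):
        # 4-byte magic, uint32 version, uint64 tensor_count, uint64 metadata_kv_count
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError("not a GGUF file")
        version, tensor_count, kv_count = struct.unpack("<IQQ", f.read(20))
        return version, tensor_count, kv_count

    # Synthetic header: version 3, 2 tensors, 5 metadata key-value pairs
    blob = b"GGUF" + struct.pack("<IQQ", 3, 2, 5)
    version, tensor_count, kv_count = read_gguf_header(io.BytesIO(blob))
    ```

    The magic bytes 0x47 0x47 0x55 0x46 are simply the ASCII string "GGUF", which is why the check above compares against b"GGUF".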

    Metadata block

    // example metadata
    general.architecture:  'llama',
    general.name:          'LLaMA v2',
    llama.context_length:  4096,
    ... ,
    general.file_type:     10, // (typically indicates quantization level, here "MOSTLY_Q2_K")
    tokenizer.ggml.model: 'llama',
    tokenizer.ggml.tokens: [
       '<unk>', '<s>', '</s>', '<0x00>', '<0x01>', '<0x02>',
       '<0x03>', '<0x04>', '<0x05>', '<0x06>', '<0x07>', '<0x08>',
       ...
    ],
    ...
    

    Tensors info block

    // n-th tensor
    name:         GGUF string, // ex: "blk.0.ffn_gate.weight"
    n_dimensions: UINT32,      // ex: 2
    dimensions:   UINT64[],    // ex: [ 4096, 32000 ]
    type:         UINT32,      // ex: 10 (typically indicates quantization level, here "GGML_TYPE_Q2_K")
    offset:       UINT64       // starting position within the tensor_data block, relative to the start of the block
    // (n+1)-th tensor
    ...
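
    Reading one such tensor-info record follows directly from the layout above. This sketch parses a synthetic record matching the example values; GGUF strings are assumed to be a uint64 little-endian length followed by UTF-8 bytes.

    ```python
    import io
    import struct

    def read_gguf_string(f):
        # GGUF string: uint64 little-endian length, then UTF-8 bytes (no terminator)
        (n,) = struct.unpack("<Q", f.read(8))
        return f.read(n).decode("utf-8")

    def read_tensor_info(f):
        name = read_gguf_string(f)
        (n_dims,) = struct.unpack("<I", f.read(4))
        dims = list(struct.unpack(f"<{n_dims}Q", f.read(8 * n_dims)))
        ggml_type, offset = struct.unpack("<IQ", f.read(12))
        return {"name": name, "dims": dims, "type": ggml_type, "offset": offset}

    # Synthetic record mirroring the example above
    name = b"blk.0.ffn_gate.weight"
    record = (struct.pack("<Q", len(name)) + name
              + struct.pack("<I", 2)            # n_dimensions
              + struct.pack("<QQ", 4096, 32000)  # dimensions
              + struct.pack("<IQ", 10, 0))       # type (Q2_K), offset
    info = read_tensor_info(io.BytesIO(record))
    ```

    The offset field is relative to the start of the tensor_data block, so a reader locates the actual weights by adding it to the position where tensor_data begins.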
    
