Quantizing Gemma 4 on Mac with llama.cpp

Requirements

• A Hugging Face account (https://huggingface.co/)
• git, CMake, and uv installed (all three are used in the commands in this guide)

Set up llama.cpp

git clone https://github.com/ggml-org/llama.cpp.git

cmake -S llama.cpp -B llama.cpp/build -DGGML_METAL=ON -DLLAMA_CURL=OFF

cmake --build llama.cpp/build --config Release -j 8
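
Once the build finishes, it can help to confirm that the two binaries used later in this guide were actually produced. The following is a minimal Python sketch, assuming the build directory layout from the commands above (llama.cpp/build/bin); the helper name missing_binaries is made up for illustration and is not part of llama.cpp.

```python
from pathlib import Path

# Binaries this guide uses later; both come out of the cmake build above.
REQUIRED_BINARIES = ("llama-quantize", "llama-cli")


def missing_binaries(bin_dir, names=REQUIRED_BINARIES):
    """Return the names from `names` that are not regular files in `bin_dir`."""
    bin_path = Path(bin_dir)
    return [name for name in names if not (bin_path / name).is_file()]


# Usage (path assumes you run this from the directory that contains llama.cpp):
# print(missing_binaries("llama.cpp/build/bin"))
```

An empty list means both binaries are in place; any name that is returned points to a build step that did not complete.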

Set up the Python environment

uv init quantization
cd quantization
uv add "torch>=2.9" "transformers>=4.45" "sentencepiece" "protobuf>=4.21,<5.0" "gguf>=0.19" "huggingface_hub"

Your pyproject.toml should then look like this:

[project]
name = "quantization"
version = "0.1.0"
description = "Add your description here"
readme = "README.md"
requires-python = ">=3.11"
dependencies = [
    "gguf>=0.19",
    "huggingface-hub>=1.16.1",
    "protobuf>=4.21,<5.0",
    "sentencepiece>=0.2.1",
    "torch>=2.9",
    "transformers>=4.45",
]

Download the model

# Current directory: quantization
# models/ holds the original gemma-4-E4B checkpoint
# gguf/ holds the converted and quantized files
mkdir -p models gguf

# Log in to Hugging Face
hf auth login

# download the model
hf download google/gemma-4-E4B-it --local-dir models/gemma-4-E4B-it
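
The same download can also be done from Python with the huggingface_hub package that was installed earlier. This is a minimal sketch, assuming the repository ID and target directory from the command above; the wrapper name download_model is made up for illustration, while snapshot_download is the library function that does the work.

```python
def download_model(repo_id="google/gemma-4-E4B-it",
                   local_dir="models/gemma-4-E4B-it"):
    """Download every file of a Hub repository into local_dir; return the path."""
    # Imported inside the function so this file still loads where
    # huggingface_hub is not installed.
    from huggingface_hub import snapshot_download

    # If the repository requires authentication, the token saved by
    # `hf auth login` is picked up automatically.
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)


# Usage (needs network access and, for gated models, a prior login):
# path = download_model()
# print(path)
```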

Convert model.safetensors to GGUF

# Current directory: quantization. The llama.cpp and quantization directories sit side by side (here, both on the Desktop).

uv run python ../llama.cpp/convert_hf_to_gguf.py \
models/gemma-4-E4B-it \
--outfile gguf/gemma-4-E4B-it-BF16.gguf \
--outtype bf16
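
A quick way to sanity-check the converted file is to read its header. A GGUF file starts with the four bytes "GGUF", followed by a format version and two counters. The sketch below is only a rough check under stated assumptions: it assumes a little-endian file using GGUF version 2 or later (where both counters are 64-bit), and the function name read_gguf_header is made up for illustration.

```python
import struct

GGUF_MAGIC = b"GGUF"


def read_gguf_header(path):
    """Return (version, tensor_count, metadata_kv_count) from a GGUF header."""
    with open(path, "rb") as f:
        header = f.read(24)  # 4-byte magic + uint32 version + two uint64 counts
    if len(header) < 24 or header[:4] != GGUF_MAGIC:
        raise ValueError(f"{path} does not look like a GGUF file")
    # Assumption: little-endian layout, GGUF version 2 or later.
    version, tensor_count, kv_count = struct.unpack("<IQQ", header[4:])
    return version, tensor_count, kv_count


# Usage:
# print(read_gguf_header("gguf/gemma-4-E4B-it-BF16.gguf"))
```

A ValueError here usually means the conversion was interrupted or the path points at the wrong file.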

Quantize to Q4_K_M

../llama.cpp/build/bin/llama-quantize \
gguf/gemma-4-E4B-it-BF16.gguf \
gguf/gemma-4-E4B-it-Q4_K_M.gguf \
Q4_K_M

This step will take a few minutes.
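
To see what quantization bought you, compare the two files on disk. This is a small sketch that only measures file sizes; it assumes the output paths used in the commands above, and the helper name size_report is made up for illustration.

```python
import os


def size_report(original_path, quantized_path):
    """Return (original_gib, quantized_gib, ratio) for two files on disk."""
    original = os.path.getsize(original_path)
    quantized = os.path.getsize(quantized_path)
    if original == 0:
        raise ValueError(f"{original_path} is empty")
    gib = 1024 ** 3
    # ratio is the quantized size as a fraction of the original size
    return original / gib, quantized / gib, quantized / original


# Usage:
# bf16, q4, ratio = size_report("gguf/gemma-4-E4B-it-BF16.gguf",
#                               "gguf/gemma-4-E4B-it-Q4_K_M.gguf")
# print(f"BF16: {bf16:.2f} GiB, Q4_K_M: {q4:.2f} GiB, ratio: {ratio:.2f}")
```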

Run the Q4_K_M model

# Current directory: quantization

../llama.cpp/build/bin/llama-cli \
-m gguf/gemma-4-E4B-it-Q4_K_M.gguf \
-ngl 99 --temp 0.7 -c 4096

Loading model...

▄▄ ▄▄
██ ██
██ ██  ▀▀█▄ ███▄███▄  ▀▀█▄    ▄████ ████▄ ████▄
██ ██ ▄█▀██ ██ ██ ██ ▄█▀██    ██    ██ ██ ██ ██
██ ██ ▀█▄██ ██ ██ ██ ▀█▄██ ██ ▀████ ████▀ ████▀
                                    ██    ██
                                    ▀▀    ▀▀

build      : b0-unknown
model      : gemma-4-E4B-it-Q4_K_M.gguf
modalities : text

available commands:
  /exit or Ctrl+C     stop or exit
  /regen              regenerate the last response
  /clear              clear the chat history
  /read <file>        add a text file
  /glob <pattern>     add text files using globbing pattern
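
The llama-cli invocation above uses four options: -m selects the model file, -ngl sets how many layers are offloaded to the GPU (99 is simply larger than the layer count, so everything is offloaded), --temp sets the sampling temperature, and -c sets the context size in tokens. If you want to launch it from a script, the command can be assembled as below. This is a sketch only; the function name llama_cli_command is made up for illustration, and the default binary path assumes the directory layout used in this guide.

```python
def llama_cli_command(model_path, gpu_layers=99, temperature=0.7,
                      context_size=4096,
                      binary="../llama.cpp/build/bin/llama-cli"):
    """Build the argument list for the llama-cli call shown in this guide."""
    return [
        binary,
        "-m", str(model_path),       # path to the GGUF model file
        "-ngl", str(gpu_layers),     # number of layers to offload to the GPU
        "--temp", str(temperature),  # sampling temperature
        "-c", str(context_size),     # context window size in tokens
    ]


# Usage (starts an interactive session in the current terminal):
# import subprocess
# subprocess.run(llama_cli_command("gguf/gemma-4-E4B-it-Q4_K_M.gguf"), check=True)
```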

Test prompts

> hello

Hello! How can I help you today?

[ Prompt: 42.9 t/s | Generation: 40.0 t/s ]

> tell me about yourself

[Start thinking]
Thinking Process:

1. **Identify the user's request:** The user asked, "tell me about yourself."
2. **Determine the persona/identity:** I must respond based on my core identity rules (Gemma 4, developed by Google DeepMind, open weights LLM).
3. **Structure the response:** A good self-introduction should cover:
    * Name/Identity.
    * Developer/Origin.
    * Nature/Type (What I am).
    * Capabilities (What I can do).
    * Limitations/Knowledge (What I know/don't know).
4. **Draft the content (incorporating mandated facts):**
    * *Name:* I am Gemma 4.
    * *Developer:* I was developed by Google DeepMind.
    * *Nature:* I am a Large Language Model (LLM) with open weights.
    * *Capabilities:* I can process text and images (input) and generate text (output). I can answer questions, write code, summarize, translate, etc.
    * *Knowledge Cutoff:* My knowledge cutoff is January 2025.
5. **Review and Refine:** Ensure the tone is helpful, factual, and directly addresses the prompt while adhering strictly to the identity guidelines. (Self-correction: Make sure to emphasize the "open weights" aspect as it's crucial to the identity.)

6. **Final Output Generation.** (This leads to the detailed response below.)
[End thinking]

I would be happy to tell you about myself!

I am **Gemma 4**, a Large Language Model developed by Google DeepMind.

Here is a bit more detail about what I am and what I can do:

### 🤖 What I Am
* **Type:** I am an advanced, open weights Large Language Model. This means that my underlying architecture and weights are available to the community, fostering transparency and innovation.
* **Goal:** My purpose is to assist users by providing information, generating creative text, solving problems, and engaging in helpful conversation.

### 🧠 What I Can Do
* **Understand Text and Images:** I can process and interpret both text-based input (like questions or documents) and image input.
* **Generate Text:** My output is always text. I can:
    * Answer factual questions.
    * Summarize complex articles.
    * Write different types of creative content (stories, poems, scripts).
    * Assist with coding tasks (explaining code, generating snippets).
    * Translate languages.
* **Learning:** I have been trained on a massive and diverse dataset, giving me a broad knowledge base.

### 📅 My Knowledge
* My knowledge cutoff is **January 2025**. I do not have information about events or developments that have occurred since that time, so if you ask about very recent events, I may not know about them.

In short, think of me as a versatile and knowledgeable assistant ready to help you learn, create, or solve problems! How can I help you today?

[ Prompt: 66.9 t/s | Generation: 34.6 t/s ]

> which one is larger 9.11 or 9.9?

[Start thinking]
Thinking Process:

1. **Analyze the Request:** The user is asking to compare two numbers: 9.11 and 9.9, and determine which one is larger.
2. **Understand the Numbers:**
    * Number A: 9.11 (Nine and eleven hundredths)
    * Number B: 9.9 (Nine and nine tenths)
3. **Comparison Strategy (Decimal Place by Place):**
    * *Units Place:* Both are 9. (Equal)
    * *Tenths Place:*
        * 9.11 has a 1 in the tenths place.
        * 9.9 has a 9 in the tenths place.
    * *Comparison:* Since 9 is greater than 1, the number 9.9 is larger than 9.11.
4. **Alternative Comparison Strategy (Padding):**
    * Write 9.9 as 9.90 (Adding a trailing zero does not change the value).
    * Compare 9.11 and 9.90.
    * The hundredths place comparison: 1 vs 0. Wait, this is wrong. Let's stick to the tenths place comparison, as it was the clearest difference.
5. **Conclusion:** 9.9 > 9.11.
6. **Formulate the Answer:** State the answer clearly, and provide a simple explanation of *why* it is larger (comparing the tenths place).
[End thinking]

**9.9** is larger than 9.11.

Here is why:

When comparing decimals, you look at the digits starting from the left (the largest place value).

1. **Units Place:** Both numbers have a 9. (Equal)
2. **Tenths Place:**
    * In 9.**1**1, the digit is **1**.
    * In 9.**9**, the digit is **9**.

Since 9 is greater than 1, the number **9.9** is larger.

***

*(If you want to make the comparison even easier, you can add a zero to the end of 9.9 to make it 9.90. Now you are comparing 9.11 and 9.90, and 9.90 is clearly larger.)*

[ Prompt: 451.4 t/s | Generation: 34.9 t/s ]
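
Each response ends with a timing line in the form "[ Prompt: X t/s | Generation: Y t/s ]". In the three runs above, the prompt figure varies widely while generation stays between roughly 35 and 40 tokens per second. If you save a session log and want a single number to compare quantization levels, the lines can be parsed as in this sketch. It assumes the exact line format shown in this transcript, which may differ in other llama.cpp builds; the function names are made up for illustration.

```python
import re

# Matches the timing line printed after each response in the transcript above.
TIMING_RE = re.compile(
    r"\[ Prompt: ([\d.]+) t/s \| Generation: ([\d.]+) t/s \]"
)


def parse_timings(log_text):
    """Return a list of (prompt_tps, generation_tps) pairs found in log_text."""
    return [(float(p), float(g)) for p, g in TIMING_RE.findall(log_text)]


def mean_generation_speed(log_text):
    """Average the generation speed over all timing lines in log_text."""
    timings = parse_timings(log_text)
    if not timings:
        raise ValueError("no timing lines found")
    return sum(g for _, g in timings) / len(timings)


# The three timing lines from the session above.
log = """
[ Prompt: 42.9 t/s | Generation: 40.0 t/s ]
[ Prompt: 66.9 t/s | Generation: 34.6 t/s ]
[ Prompt: 451.4 t/s | Generation: 34.9 t/s ]
"""
print(round(mean_generation_speed(log), 1))  # 36.5
```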