What is llama.cpp?
llama.cpp is an open-source library for LLM inference across a wide range of environments.

- It supports most LLM models with various quantizations via the GGUF format (see the example after this list)
- It can run with or without a GPU
- It has few dependencies
- It supports several optimized backends, such as BLAS, RPC (Remote Procedure Call), and KleidiAI
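As an illustration of the GGUF quantization mentioned above, a model can be re-quantized with the bundled llama-quantize tool once llama.cpp is built as described below (a minimal sketch; the model file names are placeholders):
# Re-quantize an f16 GGUF model down to 4-bit Q4_K_M (file names are examples only)
/app/build/bin/llama-quantize /models/model-f16.gguf /models/model-Q4_K_M.gguf Q4_K_M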
llama.cpp installation
I used Docker. The following Dockerfile is used to build the Docker image:
FROM arm64v8/python:3.12-slim AS builder
COPY --from=docker.io/astral/uv:latest /uv /uvx /bin/
RUN apt-get update && apt-get install -y --no-install-recommends \
    git \
    build-essential \
    cmake \
    libopenblas-dev \
    pkg-config \
    curl \
    libcurl4-openssl-dev \
 && rm -rf /var/lib/apt/lists/*
WORKDIR /app
RUN git clone https://github.com/ggml-org/llama.cpp.git .
RUN cmake -B build -DGGML_RPC=ON
RUN cmake --build build --config Release
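The Dockerfile above installs libopenblas-dev, but the BLAS backend also has to be turned on at configure time; if you want OpenBLAS-accelerated prompt processing, the configure step would look roughly like this (a sketch based on the llama.cpp build options, not the exact command I ran):
RUN cmake -B build -DGGML_RPC=ON -DGGML_BLAS=ON -DGGML_BLAS_VENDOR=OpenBLAS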
Building and running llama.cpp with Docker is very simple.
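The run commands below expect a GGUF model file under /llama.cpp-models on the host. One way to get one is to download a pre-quantized model straight from Hugging Face (a sketch; I am assuming the file is published in the Qwen/Qwen3-0.6B-GGUF repository):
# Download a quantized GGUF model into the host model directory (repository name assumed)
curl -L -o /llama.cpp-models/Qwen3-0.6B-Q8_0.gguf \
  https://huggingface.co/Qwen/Qwen3-0.6B-GGUF/resolve/main/Qwen3-0.6B-Q8_0.gguf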
# Build
docker build -t llama-cpp .
# Run (interactive shell)
docker run --rm -it -v /llama.cpp-models:/models --entrypoint /bin/bash llama-cpp
# Run (llama-cli for local inference)
docker run -v /llama.cpp-models:/models llama-cpp /app/build/bin/llama-cli -m /models/Qwen3-0.6B-Q8_0.gguf -p "hello"
# Run (llama-cli with RPC servers)
docker run -v /llama.cpp-models:/models llama-cpp /app/build/bin/llama-cli --rpc 192.168.1.51:50052,192.168.1.52:50052,192.168.1.53:50052,192.168.1.54:50052 -m /models/Qwen3-8B-Q8_0.gguf -p "hello"
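The --rpc option expects an rpc-server instance listening on each worker node. Because the image is built with -DGGML_RPC=ON, the rpc-server binary ends up in build/bin as well; starting it on each Raspberry Pi (using the IPs from the command above) could look roughly like this sketch:
# On each worker node: publish port 50052 and bind the RPC server to all interfaces
docker run --rm -p 50052:50052 llama-cpp /app/build/bin/rpc-server -H 0.0.0.0 -p 50052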
Have FUN !!!
