Packaging an Ollama Model Service with Podman or Docker

This post packages Ollama as a standalone service that handles NLP model inference and encoding tasks.

The service depends on a Debian base image (see [[基础镜像]]). Download the base image whose architecture matches your system.

# macOS
podman load -i debian.v12.arm.tar
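
The architecture-dependent choice of base image can be sketched as a small helper. This is only a minimal sketch: `debian.v12.arm.tar` is the file used in this article, while the function name `base_image_tar` and the file name `debian.v12.amd.tar` are assumptions made for illustration.

```shell
#!/bin/sh
# Map a machine architecture (as printed by `uname -m`) to a base-image tarball.
# debian.v12.arm.tar is the file used in this article; debian.v12.amd.tar is a
# hypothetical name for an x86_64 counterpart.
base_image_tar() {
    case "$1" in
        arm64|aarch64) echo "debian.v12.arm.tar" ;;
        x86_64|amd64)  echo "debian.v12.amd.tar" ;;
        *) echo "unsupported architecture: $1" >&2; return 1 ;;
    esac
}
```

A possible call is `podman load -i "$(base_image_tar "$(uname -m)")"`.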

Verify that the image loaded successfully:

[ 10:24 AM ]  [ yu@yu:~/Downloads ]
$ podman images | grep debian
localhost/debian          latest      5852ca45f7bf  3 years ago  123 MB

Start a container:

$ podman run -it localhost/debian:latest bash
root@3b3a5f01ea62:/#

Connect to the container remotely with Cursor for development, then point apt in the Debian system at a mirror in mainland China.

mv /etc/apt/sources.list /etc/apt/sources.list.bak

tee /etc/apt/sources.list <<-"EOF"
# Source packages (deb-src) are commented out by default to speed up apt update; uncomment them if needed
deb http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib
# deb-src http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib

deb http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib
# deb-src http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib

deb http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib
# deb-src http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib

# Security updates below use the official source; edit this entry to switch to the mirror if needed
deb https://security.debian.org/debian-security bookworm-security main contrib
# deb-src https://security.debian.org/debian-security bookworm-security main contrib

EOF

Install Ollama with the following commands.

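# Update the package index
apt update
apt install curl -y

# Slim down the system by removing packages that are no longer needed
apt autoremove -y

# Install Ollama; the network can be slow, so several retries are sometimes needed
curl -fsSL https://ollama.com/install.sh | sh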

Alternatively, split the download and the installation into two steps:

apt install wget -y
wget https://ollama.com/install.sh
bash install.sh
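
Because the download sometimes needs several attempts, the download step can be wrapped in a small retry helper. The following is a minimal sketch: the function name `retry` and the `RETRY_DELAY` variable are hypothetical names introduced here, not part of Ollama or its installer.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS COMMAND [ARGS...]
# Run COMMAND until it succeeds or MAX_ATTEMPTS attempts have been made.
# RETRY_DELAY (seconds between attempts, default 5) is a hypothetical knob.
retry() {
    max_attempts=$1
    shift
    attempt=1
    until "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "giving up after $attempt attempts: $*" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "${RETRY_DELAY:-5}"
    done
}
```

A possible call is `retry 5 wget -O install.sh https://ollama.com/install.sh && bash install.sh`. Passing `-O` makes every attempt overwrite the same file instead of leaving numbered copies behind.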

Installer output in the terminal:

root@3b3a5f01ea62:~# bash install.sh 
>>> Cleaning up old version at /usr/local/lib/ollama
>>> Installing ollama to /usr/local
>>> Downloading Linux arm64 bundle

Check that the installation succeeded:

root@3b3a5f01ea62:~# ollama --version
Warning: could not connect to a running Ollama instance
Warning: client version is 0.9.6

Start the service:

root@3b3a5f01ea62:~# ollama serve
Couldn't find '/root/.ollama/id_ed25519'. Generating new private key.
Your new public key is:

With the server still running, download a qwen3:xx model from a second terminal:

ollama run qwen3:1.7b
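
Since `ollama serve` stays in the foreground, a scripted setup has to start it in the background and wait until it answers before pulling a model. The helper below is a minimal sketch of that idea: the name `wait_for_ollama` and its defaults are assumptions, and it relies only on the root URL on port 11434 responding once the server is up.

```shell
#!/bin/sh
# wait_for_ollama [URL] [TRIES]
# Poll the Ollama root endpoint once per second until it answers,
# giving up after TRIES attempts (default 30).
wait_for_ollama() {
    url=${1:-http://127.0.0.1:11434}
    tries=${2:-30}
    i=0
    while [ "$i" -lt "$tries" ]; do
        if curl -fsS "$url" >/dev/null 2>&1; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "Ollama did not respond at $url" >&2
    return 1
}
```

One way to use it is `ollama serve > ollama.log 2>&1 &` followed by `wait_for_ollama && ollama pull qwen3:1.7b`. Here `ollama pull` downloads the model without opening an interactive chat.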

Once the download finishes, the model files are stored under ~/.ollama.

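# Package the model files into a gzip-compressed archive; -C keeps the paths relative to the home directory
# (the Ollama CLI at version 0.9.6 has no export subcommand, so the directory is archived directly)
tar -zcvf qwen3.version.1.7b.tar.gz -C ~ .ollama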

Dockerfile for the service

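# The apt sources below target Debian 12 (bookworm); make sure this tag resolves to a bookworm image
FROM debian:latest

# Back up the stock sources list if the base image has one
RUN if [ -f /etc/apt/sources.list ]; then mv /etc/apt/sources.list /etc/apt/sources.list.bak; fi

# Heredocs in RUN require BuildKit (Docker) or a recent Podman/Buildah
RUN tee /etc/apt/sources.list <<-"EOF"
# Source packages (deb-src) are commented out by default to speed up apt update; uncomment them if needed
deb http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib
# deb-src http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm main contrib

deb http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib
# deb-src http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-updates main contrib

deb http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib
# deb-src http://mirrors.tuna.tsinghua.edu.cn/debian/ bookworm-backports main contrib

# Security updates below use the official source; edit this entry to switch to the mirror if needed
deb https://security.debian.org/debian-security bookworm-security main contrib
# deb-src https://security.debian.org/debian-security bookworm-security main contrib
EOF

# Update the package index and install curl in one layer so the index is never stale
RUN apt update && apt install -y curl

# Install Ollama
RUN curl -fsSL https://ollama.com/install.sh | sh

# Bake the downloaded model into the image
COPY ./.ollama /root/.ollama

EXPOSE 11434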
CMD ["ollama", "serve" ]
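
The Dockerfile copies `./.ollama` from the build context, so the downloaded model has to be placed next to the Dockerfile before building. A minimal sketch of that staging step follows. The helper name `stage_models` is hypothetical, and it assumes the default layout in which Ollama keeps model blobs and manifests under `~/.ollama/models`. Only that subdirectory is copied, which leaves the key files in `~/.ollama` out of the image.

```shell
#!/bin/sh
# stage_models [SOURCE_DIR] [CONTEXT_DIR]
# Copy SOURCE_DIR/models (default: ~/.ollama/models) into CONTEXT_DIR/.ollama
# so the Dockerfile can pick it up from the build context.
stage_models() {
    src=${1:-$HOME/.ollama}
    context=${2:-.}
    if [ ! -d "$src/models" ]; then
        echo "no models directory under $src" >&2
        return 1
    fi
    mkdir -p "$context/.ollama"
    cp -R "$src/models" "$context/.ollama/"
}
```

Running `stage_models` with no arguments from the directory that holds the Dockerfile stages the default model directory.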

docker_build_arm.sh:

tee docker_build_arm.sh <<-"EOF"
#!/bin/sh
# Build the image; use podman build instead if you run the container with podman
docker build -t debian/ollama-qwen3-1.7b-arm:v1 .
EOF

docker_run_arm.sh:

tee docker_run_arm.sh <<-"EOF"
#!/bin/bash
# dev
podman run -p 8000:11434 \
    --name qwen3 \
    --restart=always \
    -e OLLAMA_NUM_PARALLEL=4 \
    -e OLLAMA_MAX_LOADED_MODELS=2 \
    -e OLLAMA_KEEP_ALIVE="10m" \
    -e OLLAMA_MODELS=/root/.ollama/models \
    -e OLLAMA_MAX_QUEUE=10 \
    -e OLLAMA_HOST="0.0.0.0:11434" \
    debian/ollama-qwen3-1.7b-arm:v1

# pro
# podman run -d -p 8000:11434 \
#     --name qwen3 \
#     --restart=always \
#     -e CUDA_VISIBLE_DEVICES=0 \
#     -e OLLAMA_NUM_PARALLEL=4 \
#     -e OLLAMA_MAX_LOADED_MODELS=2 \
#     -e OLLAMA_KEEP_ALIVE="10m" \
#     -e OLLAMA_MODELS=/root/.ollama/models \
#     -e OLLAMA_MAX_QUEUE=10 \
#     -e OLLAMA_HOST="0.0.0.0:11434" \
#     debian/ollama-qwen3-1.7b-arm:v1

EOF

Run the service

# Start the service
bash docker_run_arm.sh

Use podman logs <container_id> to inspect the service logs:

$ podman logs c370031549ae            
time=2025-07-25T00:50:39.044Z level=INFO source=routes.go:1235 msg="server config" env="map[CUDA_VISIBLE_DEVICES: 
...
time=2025-07-25T00:50:39.050Z level=INFO source=types.go:130 msg="inference compute" id=0 library=cpu variant="" compute="" driver=0.0 name="" total="1.9 GiB" available="1.6 GiB"

Open a browser:

# Enter the following address
http://127.0.0.1:8000

If the page shows "Ollama is running", the service started successfully.
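
Beyond the browser check, the packaged model can be exercised through Ollama's HTTP API on the mapped port 8000. The sketch below only builds the JSON request body. The helper name `generate_payload` is hypothetical, and it assumes the prompt contains no double quotes or backslashes, because it does no JSON escaping.

```shell
#!/bin/sh
# generate_payload MODEL PROMPT
# Print a JSON body for Ollama's /api/generate endpoint with streaming disabled.
# Assumption: PROMPT contains no double quotes or backslashes (no escaping is done).
generate_payload() {
    printf '{"model": "%s", "prompt": "%s", "stream": false}\n' "$1" "$2"
}
```

A possible request is `curl -s http://127.0.0.1:8000/api/generate -d "$(generate_payload qwen3:1.7b Hello)"`. Running `curl -s http://127.0.0.1:8000/api/tags` first lists the models the server can see, which helps confirm that the model was baked into the image.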

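This article is an original work by admin. Please credit the source when reposting. If it infringes on any rights, please get in touch to have it removed.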
