This post records the process of running the Yi-34B-Chat-4bits model on dual RTX 4080 GPUs, along with the troubleshooting involved.
The setup is based on Ubuntu Server 20.04.6 LTS and assumes you have a basic way to reach blocked sites (a working proxy).
Since each CUDA release is bundled with a matching GPU driver, you can simply follow the prompts on the official download page. For example, I chose version 12.1:
https://developer.nvidia.com/cuda-12-1-0-download-archive
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-ubuntu2004.pin
sudo mv cuda-ubuntu2004.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.1.0/local_installers/cuda-repo-ubuntu2004-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo dpkg -i cuda-repo-ubuntu2004-12-1-local_12.1.0-530.30.02-1_amd64.deb
sudo cp /var/cuda-repo-ubuntu2004-12-1-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda
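After the install finishes (a reboot may be needed to load the new driver), a quick sanity check, assuming the default /usr/local/cuda install prefix:
nvidia-smi
/usr/local/cuda/bin/nvcc --version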
Install Docker
for pkg in docker.io docker-doc docker-compose docker-compose-v2 podman-docker containerd runc; do sudo apt-get remove $pkg; done
# Add Docker's official GPG key:
sudo apt-get update
sudo apt-get install ca-certificates curl gnupg
sudo install -m 0755 -d /etc/apt/keyrings
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /etc/apt/keyrings/docker.gpg
sudo chmod a+r /etc/apt/keyrings/docker.gpg
# Add the repository to Apt sources:
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/etc/apt/keyrings/docker.gpg] https://download.docker.com/linux/ubuntu \
$(. /etc/os-release && echo "$VERSION_CODENAME") stable" | \
sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
sudo apt-get update
sudo apt-get install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
If any of the steps above produced errors, the most likely cause is flaky network access to Docker's servers.
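If everything went through, Docker's standard smoke test should run cleanly:
sudo docker run hello-world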
Install the NVIDIA Container Toolkit
curl -fsSL https://nvidia.github.io/libnvidia-container/gpgkey | sudo gpg --dearmor -o /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg \
&& curl -s -L https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | \
sed 's#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g' | \
sudo tee /etc/apt/sources.list.d/nvidia-container-toolkit.list \
&& \
sudo apt-get update
sudo apt-get install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker
sudo systemctl restart docker
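To confirm Docker can actually see the GPUs, run nvidia-smi inside a CUDA base image (the tag here is just an example; any CUDA image works):
sudo docker run --rm --gpus all nvidia/cuda:12.1.1-base-ubuntu20.04 nvidia-smi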
Dockerfile and compose.yaml
This Dockerfile also adds chatglm-cpp support (it seems chatglm-cpp cannot use multiple GPUs out of the box and would require code changes?)
FROM nvidia/cuda:12.1.1-cudnn8-devel-ubuntu20.04
ARG TIMEZONE=Asia/Shanghai
ARG UID=1000
ARG GID=1000
ENV DEBIAN_FRONTEND=noninteractive
RUN \
sed -e "s/archive.ubuntu.com/mirrors.tuna.tsinghua.edu.cn/g" \
-e "s/security.ubuntu.com/mirrors.tuna.tsinghua.edu.cn/g" -i /etc/apt/sources.list
# Set timezone
RUN ln -snf /usr/share/zoneinfo/${TIMEZONE} /etc/localtime && echo ${TIMEZONE} > /etc/timezone \
    && date
RUN apt-get update && apt-get install -y \
curl \
apt-utils \
wget \
build-essential \
libssl-dev \
zlib1g-dev \
libncurses5-dev \
libncursesw5-dev \
libreadline-dev \
libsqlite3-dev \
libgdbm-dev \
libdb5.3-dev \
libbz2-dev \
libexpat1-dev \
liblzma-dev \
git \
vim \
libffi-dev \
libgdbm-compat-dev \
ffmpeg libsm6 libxext6
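# Ubuntu 20.04 ships Python 3.8, so build 3.10 from source instead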
RUN wget https://www.python.org/ftp/python/3.10.12/Python-3.10.12.tar.xz -O /tmp/Python-3.10.12.tar.xz \
&& tar xvf /tmp/Python-3.10.12.tar.xz -C /tmp \
&& cd /tmp/Python-3.10.12 \
&& ./configure --enable-optimizations --with-ensurepip=install \
&& make -j $(nproc) \
&& make altinstall \
&& update-alternatives --install /usr/bin/python python /usr/local/bin/python3.10 1 \
&& update-alternatives --install /usr/bin/pip pip /usr/local/bin/pip3.10 1 \
&& rm -rf /tmp/Python-3.10.12.tar.xz /tmp/Python-3.10.12
# ENV persists across layers (a plain "RUN export" would be lost after the layer ends)
ENV LD_LIBRARY_PATH=/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
RUN pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121
# pull xformers from the same cu121 index as torch so the CUDA builds match
RUN pip install xformers --index-url https://download.pytorch.org/whl/cu121
RUN pip install wheel streamlit transformers_stream_generator cpm_kernels astunparse accelerate tiktoken einops scipy peft auto-gptq optimum
RUN CMAKE_ARGS="-DGGML_CUBLAS=ON" pip install -U 'chatglm-cpp[api]'
RUN pip install autoawq xformers
RUN groupadd -g $GID -o chatglm
RUN useradd -m -u $UID -g $GID -o -s /bin/bash chatglm
USER chatglm
WORKDIR /home/chatglm/work
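# Keep the container running so scripts can be launched via docker exec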
ENTRYPOINT ["tail", "-f", "/dev/null"]
compose.yaml
services:
  chatglm:
    build:
      context: ./docker
      network: host
    container_name: chatglm
    restart: always
    # entrypoint: ["uvicorn", "chatglm_cpp.openai_api:app", "--host", "0.0.0.0"]
    dns:
      - 119.29.29.29
    volumes:
      - ./:/home/chatglm/work/
      - ./data/cache:/home/chatglm/.cache
      - ./data/config:/home/chatglm/.config
      - ./data/triton:/home/chatglm/.triton
      - /usr/local/nvidia:/usr/local/nvidia
    ports:
      - 8000:8000
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              capabilities: [gpu]
Run docker compose up -d --build. The devel base image is large, so the build can take quite a while.
Once the container is up, enter it and check that the GPUs are properly mounted with nvidia-smi.
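For example, using the container name defined in compose.yaml:
docker exec -it chatglm nvidia-smi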
Download the model with git clone https://huggingface.co/01-ai/Yi-34B-Chat-4bits; the Git LFS extension must be installed first.
Inside the container, write and run the inference script following the official instructions. Since AutoModelForCausalLM already handles multi-GPU placement, no code changes are usually needed.
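On Ubuntu the whole download step looks like this, for example:
sudo apt-get install -y git-lfs
git lfs install
git clone https://huggingface.co/01-ai/Yi-34B-Chat-4bits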
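A minimal sketch of such a script, modeled on the usage from the model card; the local path, prompt, and max_new_tokens are illustrative:
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = './Yi-34B-Chat-4bits'  # path to the local clone from the step above
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
# device_map='auto' lets accelerate shard the weights across both GPUs
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    device_map='auto',
    torch_dtype='auto',
).eval()

messages = [{'role': 'user', 'content': 'hi'}]
input_ids = tokenizer.apply_chat_template(
    conversation=messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors='pt',
)
output_ids = model.generate(input_ids.to(model.device), max_new_tokens=256)
print(tokenizer.decode(output_ids[0][input_ids.shape[1]:], skip_special_tokens=True))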
Troubleshooting
In a multi-GPU environment, first make sure the GPUs can exchange data correctly:
import torch
x = torch.tensor([1.0, 2.0], device=0)  # tensor on GPU 0
x.to(1)                                  # direct copy to GPU 1
x.to('cpu').to(1)                        # copy staged through the CPU
Check the value produced by x.to(1): if it comes back as [0, 0], data transfer between the GPUs is failing (a scripted version of this check is sketched after the list below). There are two likely causes:
- IOMMU: it can be disabled in the BIOS; see https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/troubleshooting.html#pci-access-control-services-acs
- A GPU driver problem: look for a suitable driver version. For example, the driver bundled with CUDA 12.1 could not pass data between the GPUs, while the 12.2 driver worked (the container still runs the 12.1-based image, which remained usable in testing).
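A minimal sketch that automates the manual check above: it compares the direct GPU-to-GPU copy against a copy staged through host memory, and also queries torch's standard P2P capability report:
import torch

x = torch.tensor([1.0, 2.0], device='cuda:0')
direct = x.to('cuda:1')        # direct GPU-to-GPU copy (uses P2P when available)
staged = x.cpu().to('cuda:1')  # reference copy staged through host memory
print('direct copy ok:', torch.equal(direct.cpu(), x.cpu()))  # False => broken transfer
print('staged copy ok:', torch.equal(staged.cpu(), x.cpu()))
print('P2P capable:', torch.cuda.can_device_access_peer(0, 1))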