A Detailed Tutorial on Using GPU + RocksDB in Docker
Environment
dell@dell-Precision-3630-Tower ~ lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 20.04.6 LTS
Release: 20.04
Codename: focal
dell@dell-Precision-3630-Tower ~ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2022 NVIDIA Corporation
Built on Wed_Sep_21_10:33:58_PDT_2022
Cuda compilation tools, release 11.8, V11.8.89
Build cuda_11.8.r11.8/compiler.31833905_0
dell@dell-Precision-3630-Tower ~ docker version
Client: Docker Engine - Community
 Version: 24.0.6
 API version: 1.43
 Go version: go1.20.7
 OS/Arch: linux/amd64
 Context: default

Server: Docker Engine - Community
 Engine:
  Version: 24.0.6
  API version: 1.43 (minimum version 1.12)
  Go version: go1.20.7
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.6.24
 runc:
  Version: 1.1.9
 docker-init:
  Version: 0.19.0
cuDNN: libcudnn8-dev=8.9.2.26-1+cuda11.8 (installed with: sudo apt-get install libcudnn8-dev=8.9.2.26-1+cuda11.8)
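The versions above were collected by running each command by hand. A small sketch that gathers the same facts in one pass, assuming bash and a grep that supports `-m`; the helper `report_version` is hypothetical, and a missing tool is reported instead of aborting the script:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run TOOL with ARGS and print the first output line
# that matches PATTERN, or report that TOOL is not installed.
report_version() {
  local pattern="$1" tool="$2"
  shift 2
  if ! command -v "$tool" >/dev/null 2>&1; then
    echo "${tool}: not installed"
    return 0
  fi
  # stderr is discarded so notes such as "No LSB modules are available."
  # do not end up in the summary.
  echo "${tool}: $("$tool" "$@" 2>/dev/null | grep -m 1 -- "$pattern")"
}

report_version 'Description' lsb_release -a
report_version 'release' nvcc --version
report_version 'version' docker --version
```

On the host described above, the nvcc line would read `nvcc: Cuda compilation tools, release 11.8, V11.8.89`. For the cuDNN package, `dpkg -s libcudnn8-dev | grep Version` shows the installed version.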
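The difference between nvidia-docker and nvidia-container-toolkit (provided since Docker 19.03):
nvidia-docker
- Overview: nvidia-docker was the original tool for providing GPU support in Docker containers.
- Command: nvidia-docker ships its own command-line tool and was originally designed as a drop-in replacement for the docker command. You start a GPU container with nvidia-docker run.
- Plugin/runtime: version 1 relied on a Docker plugin (nvidia-docker-plugin), while version 2 registers a custom nvidia runtime with the Docker daemon, so the standard docker command can use it through the --runtime=nvidia flag.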
nvidia-container-toolkit
- Overview: Docker 19.03 introduced a device-request feature for GPUs, exposed through the --gpus flag. nvidia-container-toolkit lets you use this feature without nvidia-docker's custom runtime.
- Command: instead of nvidia-docker, you use the regular docker command and add --gpus to enable GPU support, for example: docker run --gpus all nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
- Integration: it is integrated more tightly into the Docker CLI, which gives better compatibility and a smoother experience.
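Comparison and recommendation
- nvidia-docker version 1 is deprecated; version 2 is still used in some setups but is gradually being replaced by nvidia-container-toolkit.
- For Docker 19.03 and later, NVIDIA recommends nvidia-container-toolkit, because it offers a simpler, standard way to use GPUs in containers.
- nvidia-container-toolkit lets developers and operations teams add GPU support to their existing Docker containers without changing their workflow.
- Although you may still see nvidia-docker in older code and projects, new projects should generally use nvidia-container-toolkit unless there is a clear reason not to.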
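The recommendation boils down to a version check: on Docker 19.03 or later use `--gpus`, otherwise fall back to the `nvidia` runtime from nvidia-docker2. A minimal sketch, assuming bash; `gpu_flag_for_docker` is a hypothetical helper, not part of either NVIDIA tool:

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick the GPU option for `docker run` from the
# Docker server version (e.g. "24.0.6").
# Docker >= 19.03 supports --gpus natively (nvidia-container-toolkit);
# older setups rely on the nvidia runtime registered by nvidia-docker2.
gpu_flag_for_docker() {
  local version="$1" major minor
  IFS=. read -r major minor _ <<< "$version"
  # 10# forces base 10, so a minor version like "09" is not read as octal.
  if (( 10#$major > 19 || (10#$major == 19 && 10#$minor >= 3) )); then
    echo "--gpus all"
  else
    echo "--runtime=nvidia"
  fi
}

# Usage (requires a running Docker daemon):
#   docker run --rm $(gpu_flag_for_docker "$(docker version --format '{{.Server.Version}}')") \
#     nvidia/cuda:11.8.0-base-ubuntu20.04 nvidia-smi
```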
Installing the GPU toolkit nvidia-container-toolkit for Docker
Reference:
https://zhuanlan.zhihu.com/p/544713249
sudo apt install curl
distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
curl -s -L https://nvidia.github.io/nvidia-docker/gpgkey | sudo apt-key add -
curl -s -L https://nvidia.github.io/nvidia-docker/$distribution/nvidia-docker.list | sudo tee /etc/apt/sources.list.d/nvidia-docker.list
sudo apt-get update && sudo apt-get install -y nvidia-container-toolkit
sudo systemctl restart docker
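The `distribution` variable in these commands selects the repository list matching the host release. The same logic as a reusable function, assuming bash; `distribution_from` is a hypothetical name, and taking the file path as a parameter only serves to make it easy to test:

```shell
#!/usr/bin/env bash
# Hypothetical helper mirroring
#   distribution=$(. /etc/os-release;echo $ID$VERSION_ID)
# but taking the os-release path as a parameter.
distribution_from() {
  local os_release="${1:-/etc/os-release}"
  # Source the file in a subshell so ID/VERSION_ID do not leak into this shell.
  (. "$os_release" && echo "${ID}${VERSION_ID}")
}

# On the Ubuntu 20.04 host above this prints "ubuntu20.04", which selects
# https://nvidia.github.io/nvidia-docker/ubuntu20.04/nvidia-docker.list
distribution_from
```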
Pulling a CUDA image and building our own image
NVIDIA publishes CUDA images on Docker Hub: https://hub.docker.com/r/nvidia/cuda
The images come in three flavors:
- base: Includes the CUDA runtime (cudart).
- runtime: Builds on the base and includes the CUDA math libraries, and NCCL. A runtime image that also includes cuDNN is available.
- devel: Builds on the runtime and includes headers, development tools for building CUDA images. These images are particularly useful for multi-stage builds.
NVIDIA Container Toolkit
The NVIDIA Container Toolkit for Docker is required to run CUDA images.
For CUDA 10.0, nvidia-docker2 (v2.1.0) or greater is recommended. It is also recommended to use Docker 19.03.
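The image tags follow a regular pattern, as in the `nvidia/cuda:11.8.0-devel-ubuntu20.04` image used in the Dockerfile. A sketch that assembles a tag from its parts, assuming bash; `cuda_image` is hypothetical, and the `-cudnn<N>` naming is an assumption based on the tags listed on Docker Hub, so check that a generated tag actually exists before relying on it:

```shell
#!/usr/bin/env bash
# Hypothetical helper that assembles an nvidia/cuda tag of the form
# <cuda-version>[-cudnn<N>]-<flavor>-<os>, e.g. 11.8.0-devel-ubuntu20.04.
cuda_image() {
  local cuda="$1" flavor="$2" os="$3" cudnn="${4:-}"
  case "$flavor" in
    base|runtime|devel) ;;
    *) echo "unknown flavor: $flavor" >&2; return 1 ;;
  esac
  if [[ -n "$cudnn" ]]; then
    # Assumption: cuDNN variants are published for runtime and devel only.
    [[ "$flavor" == base ]] && { echo "no cuDNN variant for base" >&2; return 1; }
    echo "nvidia/cuda:${cuda}-cudnn${cudnn}-${flavor}-${os}"
  else
    echo "nvidia/cuda:${cuda}-${flavor}-${os}"
  fi
}

# The devel flavor is the one that ships nvcc and the CUDA headers:
#   docker pull "$(cuda_image 11.8.0 devel ubuntu20.04)"
```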
In the end it is simpler to write our own Dockerfile, giving an image with both a CUDA and a RocksDB build environment:
# from official ubuntu 20.04
# FROM ubuntu:20.04
# docker pull nvidia/cuda:11.8.0-devel-ubuntu20.04
FROM nvidia/cuda:11.8.0-devel-ubuntu20.04

# optional: switch apt to the USTC mirror
# RUN mv /etc/apt/sources.list /etc/apt/sources_backup.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal main restricted " >> /etc/apt/sources.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal-updates main restricted " >> /etc/apt/sources.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal universe " >> /etc/apt/sources.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal-updates universe " >> /etc/apt/sources.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal multiverse " >> /etc/apt/sources.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal-updates multiverse " >> /etc/apt/sources.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal-backports main restricted universe multiverse " >> /etc/apt/sources.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal-security main restricted " >> /etc/apt/sources.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal-security universe " >> /etc/apt/sources.list && \
#     echo "deb http://mirrors.ustc.edu.cn/ubuntu/ focal-security multiverse " >> /etc/apt/sources.list && \
#     echo "deb http://archive.canonical.com/ubuntu focal partner " >> /etc/apt/sources.list

# update system
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime && echo 'Asia/Shanghai' >/etc/timezone \
    && apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/3bf863cc.pub \
    && apt clean && apt update && apt install -yq --no-install-recommends sudo \
    && sudo apt install -yq --no-install-recommends python3 python3-pip libgl1-mesa-glx libglib2.0-0 libsm6 libxext6 libxrender-dev openssh-server \
    && sudo pip3 install --upgrade pip \
    && sudo pip3 config set global.index-url https://mirrors.aliyun.com/pypi/simple \
    && sudo pip3 install setuptools
RUN apt-get update && apt-get upgrade -y
# install basic tools
RUN apt-get install -y vim wget curl
# install tzdata noninteractive
RUN DEBIAN_FRONTEND=noninteractive TZ=Etc/UTC apt-get -y install tzdata
# install git and default compilers
RUN apt-get install -y git gcc g++ clang clang-tools
# install basic package
RUN apt-get install -y lsb-release software-properties-common gnupg
# install gflags, tbb
RUN apt-get install -y libgflags-dev libtbb-dev
# install compression libs
RUN apt-get install -y libsnappy-dev zlib1g-dev libbz2-dev liblz4-dev libzstd-dev
# install cmake
RUN apt-get install -y cmake
RUN apt-get install -y libssl-dev
# install clang-13
WORKDIR /root
RUN wget https://apt.llvm.org/llvm.sh
RUN chmod +x llvm.sh
RUN ./llvm.sh 13 all
# install gcc-7, 8, 10, 11, default is 9
RUN apt-get install -y gcc-7 g++-7
RUN apt-get install -y gcc-8 g++-8
RUN apt-get install -y gcc-10 g++-10
RUN echo "deb https://ppa.launchpadcontent.net/ubuntu-toolchain-r/test/ubuntu focal main" | tee -a /etc/apt/sources.list
RUN echo "deb-src https://ppa.launchpadcontent.net/ubuntu-toolchain-r/test/ubuntu focal main" | tee -a /etc/apt/sources.list
# "apt-key add -" reads the key from stdin
RUN curl -sL "http://keyserver.ubuntu.com/pks/lookup?op=get&search=0x60C317803A41BA51845E371A1E9377A2BA9EF27F" | apt-key add -
#RUN apt-key adv --keyserver keyserver.ubuntu.com --recv-keys 60C317803A41BA51845E371A1E9377A2BA9EF27F
RUN add-apt-repository -y ppa:ubuntu-toolchain-r/test
RUN apt-get update && apt-get upgrade -y
#RUN apt-get install -y gcc-11 g++-11
# install valgrind
RUN apt-get install -y valgrind
# install folly dependencies
RUN apt-get install -y libgoogle-glog-dev
# install openjdk 8
RUN apt-get install -y openjdk-8-jdk
ENV JAVA_HOME /usr/lib/jvm/java-1.8.0-openjdk-amd64
# install mingw
RUN apt-get install -y mingw-w64
# install gtest-parallel package
RUN git clone --single-branch --branch master --depth 1 https://github.com/google/gtest-parallel.git ~/gtest-parallel
ENV PATH $PATH:/root/gtest-parallel
# install libprotobuf for fuzzers test
RUN apt-get install -y ninja-build binutils liblzma-dev libz-dev pkg-config autoconf libtool
# fix the GnuTLS recv error (-y keeps apt non-interactive during the build)
RUN apt-get update
RUN apt-get upgrade -y
RUN apt-get install -y --reinstall ca-certificates
RUN git clone --branch v1.0 https://github.com/google/libprotobuf-mutator.git ~/libprotobuf-mutator \
    && cd ~/libprotobuf-mutator \
    && git checkout ffd86a32874e5c08a143019aad1aaf0907294c9f \
    && mkdir build && cd build \
    && cmake .. -GNinja -DCMAKE_C_COMPILER=clang-13 -DCMAKE_CXX_COMPILER=clang++-13 -DCMAKE_BUILD_TYPE=Release -DLIB_PROTO_MUTATOR_DOWNLOAD_PROTOBUF=ON \
    && ninja && ninja install
ENV PKG_CONFIG_PATH /usr/local/OFF/:/root/libprotobuf-mutator/build/external.protobuf/lib/pkgconfig/
ENV PROTOC_BIN /root/libprotobuf-mutator/build/external.protobuf/bin/protoc
# install google benchmark v1.7.0
RUN git clone --depth 1 --branch v1.7.0 https://github.com/google/benchmark.git ~/benchmark
RUN cd ~/benchmark && mkdir build && cd build && cmake .. -GNinja -DCMAKE_BUILD_TYPE=Release -DBENCHMARK_ENABLE_GTEST_TESTS=0 && ninja && ninja install

# # clean up
# RUN rm -rf /var/lib/apt/lists/*
# RUN rm -rf /root/benchmark
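Once the image is built and a container is running, it is worth confirming that the compression and gflags development packages really landed in the image, since RocksDB's build either skips or fails on a library whose headers are missing. A minimal sketch, assuming bash and the default Ubuntu header locations; `check_rocksdb_deps` is a hypothetical helper:

```shell
#!/usr/bin/env bash
# Hypothetical sanity check: report any header from the Dockerfile's
# compression/gflags packages that is missing under INCLUDE_DIR.
check_rocksdb_deps() {
  local include_dir="${1:-/usr/include}" missing=0 header
  for header in snappy.h zlib.h bzlib.h lz4.h zstd.h gflags/gflags.h; do
    if [[ ! -f "${include_dir}/${header}" ]]; then
      echo "missing: ${header}"
      missing=1
    fi
  done
  return "$missing"
}

# Example (inside the container):
#   check_rocksdb_deps && nvcc --version
```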
The following is build-image.sh:
#!/usr/bin/env bash
SHELL_HOME=$(
  cd "$(dirname "$0")" || exit
  pwd
)
source "${SHELL_HOME}/../dev.conf"

# build behind a proxy:
# docker build \
#   --build-arg http_proxy=xxx \
#   --build-arg https_proxy=xxx \
#   --build-arg all_proxy=socks5 \
#   --tag "${IMAGE_NAME}:${IMAGE_VERSION}" "${SHELL_HOME}"
docker build --tag "${IMAGE_NAME}:${IMAGE_VERSION}" "${SHELL_HOME}"
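build-image.sh, start.sh and init-docker-network.sh all source a `dev.conf` that the article does not show. A hypothetical version, assuming it only defines the variables the scripts reference; the image name, tag and subnet are made-up placeholders, and only `rocksdb-br` is taken from the `--network=rocksdb-br` option in start.sh:

```shell
# dev.conf: HYPOTHETICAL example, the article does not show the real file.
# Sourced by build-image.sh, start.sh and init-docker-network.sh.
IMAGE_NAME="rocksdb-gpu-dev"   # placeholder image name
IMAGE_VERSION="0.1"            # placeholder tag
BRIDGE_NAME="rocksdb-br"       # matches --network=rocksdb-br in start.sh
SUBNET="172.30.0.0/16"         # placeholder private subnet for the bridge
```

With these values, build-image.sh would tag the image as `rocksdb-gpu-dev:0.1`.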
Running the container
Reference: https://blog.csdn.net/Maid_Li/article/details/124952650
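When starting the Docker container, remember to pass a few CUDA-related options:
- --gpus all and -e NVIDIA_VISIBLE_DEVICES=all choose which GPUs the container can see; here we simply expose all of them.
- -e NVIDIA_DRIVER_CAPABILITIES=compute,utility selects which driver features are mounted into the container: compute for CUDA and utility for tools such as nvidia-smi.
The following is start.sh: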
#!/usr/bin/env bash
# directory containing this script
SHELL_HOME=$(
  cd "$(dirname "$0")" || exit
  pwd
)
source "${SHELL_HOME}"/../dev.conf
source "${SHELL_HOME}"/utilities/rocks.conf

CONTAINER_NAME="rocksdb-gpu"
# work dir inside the dev container
SOURCE_DIR_INSIDE="/home/baum/GPU_ROCKS"
# local source directory
SOURCE_DIR="/nvme/baum/git-project/GPU_ROCKS"
WORK_DIR=/rocks
RECREATE_CONTAINER=""

# what I ran: ./start.sh -s /nvme/baum/git-project/GPU_ROCKS
function show_usage() {
  echo "
Start a dev container for RocksDB.

Usage:
  ./start.sh
  ./start.sh -s /path/to/your/rocksdb/source

Options:
  -s  Project path of RocksDB, default is '${SOURCE_DIR}'.
  -r  Recreate the dev container.
  -h  Show this message.
"
  exit
}

while getopts "s:hr" opt; do
  case $opt in
  s) SOURCE_DIR=${OPTARG} ;;
  r) RECREATE_CONTAINER="true" ;;
  h) show_usage ;;
  *) show_usage ;;
  esac
done

CONTAINER_RUNNING=$(docker container ls | grep "${CONTAINER_NAME}")
CONTAINER_EXISTED=$(docker container ls -a | grep "${CONTAINER_NAME}")

if [[ ${RECREATE_CONTAINER} == "true" && -n ${CONTAINER_EXISTED} ]]; then
  echo "remove the existing rocksdb-gpu container ..."
  docker rm -f "${CONTAINER_NAME}"
  CONTAINER_EXISTED=""
fi

echo "current SOURCE_DIR is '${SOURCE_DIR}'"

if [[ -z ${CONTAINER_EXISTED} ]]; then
  echo "starting the rocksdb-gpu environment 1 ..."
  # -v mounts a host directory (left) at a path inside the container (right)
  docker run -it -v "${SOURCE_DIR}":/rocks \
    -v "${SOURCE_DIR}":${SOURCE_DIR_INSIDE} \
    --name ${CONTAINER_NAME} \
    --publish "${ROCKS_PORT}"-"${GDB_PORT}":"${ROCKS_PORT}"-"${GDB_PORT}" \
    --network=rocksdb-br \
    --gpus all \
    -e NVIDIA_DRIVER_CAPABILITIES=compute,utility \
    -e NVIDIA_VISIBLE_DEVICES=all \
    --workdir ${WORK_DIR} \
    "${IMAGE_NAME}:${IMAGE_VERSION}" \
    bash
  exit
fi

if [[ -z ${CONTAINER_RUNNING} ]]; then
  echo "starting rocksdb-gpu environment 2 ..."
  docker start "${CONTAINER_NAME}"
fi

echo "logging into rocksdb-gpu environment '${CONTAINER_NAME}' ..."
docker exec -it "${CONTAINER_NAME}" bash
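start.sh exposes every GPU. To restrict the container to some GPUs, the same options could be built for a chosen device list. A sketch, assuming bash; `gpu_run_args` and the `GPU_ARGS` array are hypothetical. Docker's documentation shows a multi-device `--gpus` value wrapped in extra double quotes (`--gpus '"device=1,2"'`), because the value is otherwise split at the comma:

```shell
#!/usr/bin/env bash
# Hypothetical helper: fill the GPU_ARGS array with the GPU-related
# `docker run` options from start.sh, for "all" or a list such as "0,1".
gpu_run_args() {
  local devices="${1:-all}"
  GPU_ARGS=()
  if [[ "$devices" == all ]]; then
    GPU_ARGS+=(--gpus all)
  else
    # Keep literal double quotes around device=... so the comma-separated
    # list reaches Docker as a single field.
    GPU_ARGS+=(--gpus "\"device=${devices}\"")
  fi
  # Mirror start.sh, which also sets these two variables explicitly.
  GPU_ARGS+=(-e "NVIDIA_VISIBLE_DEVICES=${devices}")
  GPU_ARGS+=(-e NVIDIA_DRIVER_CAPABILITIES=compute,utility)
}

# Usage:
#   gpu_run_args 0,1
#   docker run -it "${GPU_ARGS[@]}" "${IMAGE_NAME}:${IMAGE_VERSION}" bash
```

Using a bash array rather than a plain string keeps the quoted `device=...` value intact when it is expanded into the `docker run` command line.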
Network configuration
Host ports 16017-16019 are mapped to ports 16017-16019 in the container.
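The following is init-docker-network.sh. Run it once before start.sh, because docker run fails if the network passed to --network does not exist yet:
#!/usr/bin/env bash
SHELL_HOME=$(
  cd "$(dirname "$0")" || exit
  pwd
)
source "${SHELL_HOME}"/dev.conf

echo "create network bridge for rocks ..."
docker network create --subnet="${SUBNET}" "${BRIDGE_NAME}"
docker network list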
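start.sh also sources `utilities/rocks.conf`, which the article does not show. The values below are hypothetical, chosen only to match the 16017-16019 mapping described above, and `publish_spec` is a hypothetical helper that builds the `--publish` value and rejects a malformed or reversed range before Docker sees it:

```shell
#!/usr/bin/env bash
# utilities/rocks.conf: HYPOTHETICAL values, chosen only to match the
# 16017-16019 host/container mapping described in the article.
ROCKS_PORT=16017
GDB_PORT=16019

# Hypothetical helper: build the --publish value used in start.sh
# ("first-last:first-last") and reject non-numeric or reversed ranges.
publish_spec() {
  local first="$1" last="$2"
  if ! [[ "$first" =~ ^[0-9]+$ && "$last" =~ ^[0-9]+$ ]] || (( first > last )); then
    echo "invalid port range: ${first}-${last}" >&2
    return 1
  fi
  echo "${first}-${last}:${first}-${last}"
}

# Usage (equivalent to the --publish option in start.sh):
#   docker run --publish "$(publish_spec "$ROCKS_PORT" "$GDB_PORT")" ...
```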
References:
https://zhuanlan.zhihu.com/p/544713249