Configuring the GPU driver on GPU worker nodes in a k8s cluster
The system is CentOS 7.

1. Download and install the GPU driver

Check the GPU model:

    [root@VM-3-9-centos user]# lspci | grep -i nvidia
    00:08.0 3D controller: NVIDIA Corporation TU104GL [Tesla T4] (rev a1)

1.1 Download the driver from the official site

https://www.nvidia.cn/Download/index.aspx

Note: CUDA version 12 is preferred.

1.2 Install the GPU driver

    bash NVIDIA-Linux-x86_64-525.105.17.run

Check that the installation succeeded:

    [root@VM-3-9-centos user]# nvidia-smi
    Wed May 17 13:04:48 2023
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.0     |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |                               |                      |               MIG M. |
    |===============================+======================+======================|
    |   0  Tesla T4            Off  | 00000000:00:08.0 Off |                    0 |
    | N/A   45C    P0    26W /  70W |   3414MiB / 15360MiB |      0%      Default |
    |                               |                      |                  N/A |
    +-------------------------------+----------------------+----------------------+

    +-----------------------------------------------------------------------------+
    | Processes:                                                                  |
    |  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
    |        ID   ID                                                   Usage      |
    |=============================================================================|
    |    0   N/A  N/A     19692      C   python3                          3410MiB |
    +-----------------------------------------------------------------------------+

To uninstall the GPU driver (the server must be rebooted afterwards):

    /usr/bin/nvidia-uninstall

1.3 Install nvidia-docker2

    yum install -y nvidia-docker2
    yum install -y nvidia-container-runtime

2. Configure the environment to support the GPU

2.1 Modify daemon.json

Edit Docker's daemon.json (by default /etc/docker/daemon.json) so that nvidia is the default runtime. Docker must be restarted for the change to take effect.

    {
      "registry-mirrors": [
        "https://tf72mndn.mirror.aliyuncs.com"
      ],
      "exec-opts": ["native.cgroupdriver=systemd"],
      "storage-driver": "overlay2",
      "log-opts": {
        "max-file": "3",
        "max-size": "500m"
      },
      "storage-opts": ["overlay2.override_kernel_check=true"],
      "default-runtime": "nvidia",
      "runtimes": {
        "nvidia": {
          "path": "/usr/bin/nvidia-container-runtime",
          "runtimeArgs": []
        }
      }
    }

2.2 Deploy the k8s NVIDIA device plugin

    kubectl apply -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/1.0.0-beta4/nvidia-device-plugin.yml

Note: modify the deployment type as needed. If there are several GPU machines, you can choose to deploy the plugin only to the servers that have a GPU.

2.3 Check the GPU from inside the K8S cluster

    [root@VM-2-8-centos user]# kubectl describe node vm-3-9-centos |grep nv
    nvidia.com/gpu.present=true
    nvidia.com/gpu: 1
    nvidia.com/gpu: 1
    kube-system nvidia-device-plugin-daemonset-4p97n 0 (0%) 0 (0%) 0 (0%) 0 (0%) 85m
    nvidia.com/gpu 1 1

The matches are, in order: the node label, the Capacity and Allocatable entries, the device plugin pod running on the node, and the Allocated resources row (requests and limits).

2.4 Set the number of GPUs a container uses through Rancher
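In plain Kubernetes terms, the number of GPUs a container uses is set through a limit on the nvidia.com/gpu resource, the same resource name that appears in the kubectl describe node output in 2.3. The following is a minimal sketch of how that limit could be written without Rancher, not the author's own procedure. The helper name gpu_test_pod_manifest, the Pod name gpu-smoke-test and the image tag nvidia/cuda:12.0.0-base-ubuntu22.04 are assumptions introduced for illustration.

```shell
#!/usr/bin/env bash
# Sketch: print a Pod manifest that asks the device plugin for N GPUs.
# The function name, Pod name and image tag are illustrative assumptions,
# not taken from the original article.
gpu_test_pod_manifest() {
  local gpus="${1:-1}"   # number of GPUs to request, defaults to 1
  cat <<EOF
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
  - name: cuda
    image: nvidia/cuda:12.0.0-base-ubuntu22.04
    command: ["nvidia-smi"]
    resources:
      limits:
        nvidia.com/gpu: ${gpus}
EOF
}

# Usage (on a machine with cluster access):
#   gpu_test_pod_manifest 1 | kubectl apply -f -
#   kubectl logs gpu-smoke-test
```

If the driver, the nvidia runtime and the device plugin are all working, the Pod log should contain the same kind of nvidia-smi table as in 1.2. A Pod that stays Pending usually means no node currently has enough free nvidia.com/gpu capacity.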