NVIDIA Graphics Cards

nvidia-smi

Use nvidia-smi to check how the GPUs are being used:

[root@localhost GA-EDA-FACE-RECOG0514]# nvidia-smi
Mon Sep 23 16:14:51 2019       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.56       Driver Version: 418.56       CUDA Version: 10.1     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:02:00.0 Off |                    0 |
| N/A   32C    P0    34W / 250W |  10356MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:03:00.0 Off |                    0 |
| N/A   37C    P0    32W / 250W |   7768MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   2  Tesla P100-PCIE...  Off  | 00000000:82:00.0 Off |                    0 |
| N/A   37C    P0    33W / 250W |   7768MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   3  Tesla P100-PCIE...  Off  | 00000000:83:00.0 Off |                    0 |
| N/A   34C    P0    32W / 250W |   7768MiB / 16280MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID   Type   Process name                             Usage      |
|=============================================================================|
|    0     14649      C   /usr/bin/python3                            1293MiB |
|    0     15192      C   /usr/bin/python3                            1293MiB |
|    0     15193      C   /usr/bin/python3                            1293MiB |
|    0     15194      C   /usr/bin/python3                            1293MiB |
|    0     15195      C   /usr/bin/python3                            1293MiB |
|    0     15196      C   /usr/bin/python3                            1293MiB |
|    0     15197      C   /usr/bin/python3                            1293MiB |
|    0     15198      C   /usr/bin/python3                            1293MiB |
|    1     15199      C   /usr/bin/python3                            1293MiB |
|    1     15200      C   /usr/bin/python3                            1293MiB |
|    1     15207      C   /usr/bin/python3                            1293MiB |
|    1     15214      C   /usr/bin/python3                            1293MiB |
|    1     15215      C   /usr/bin/python3                            1293MiB |
|    1     15219      C   /usr/bin/python3                            1293MiB |
|    2     15224      C   /usr/bin/python3                            1293MiB |
|    2     15227      C   /usr/bin/python3                            1293MiB |
|    2     15234      C   /usr/bin/python3                            1293MiB |
|    2     15238      C   /usr/bin/python3                            1293MiB |
|    2     15242      C   /usr/bin/python3                            1293MiB |
|    2     15243      C   /usr/bin/python3                            1293MiB |
|    3     15250      C   /usr/bin/python3                            1293MiB |
|    3     15254      C   /usr/bin/python3                            1293MiB |
|    3     15258      C   /usr/bin/python3                            1293MiB |
|    3     15261      C   /usr/bin/python3                            1293MiB |
|    3     15266      C   /usr/bin/python3                            1293MiB |
|    3     15270      C   /usr/bin/python3                            1293MiB |
+-----------------------------------------------------------------------------+
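  • Fan: fan speed, as a percentage of the maximum (shown as N/A for cards without a fan of their own)
  • Temp: GPU temperature, in degrees Celsius
  • Perf: performance state, from P0 (maximum performance) to P12 (minimum performance)
  • Pwr:Usage/Cap: current power draw and power limit, in watts
  • Memory-Usage: GPU memory in use out of the total (an amount in MiB, not a percentage)
  • GPU-Util: GPU utilization, the percentage of time over the last sample period during which a kernel was running
  • Memory usage and GPU utilization are different measures: a graphics card consists of memory plus the GPU itself, and the two fill up independently. In the output above, every card has several GiB of memory allocated while GPU-Util is 0%.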
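
The fields above can also be read programmatically rather than from the table. The following is a minimal sketch, assuming nvidia-smi's `--query-gpu` and `--format=csv,noheader,nounits` options are available on the installed driver; the helper names `parse_gpu_csv` and `query_gpus` are made up for this example.

```python
import csv
import io
import subprocess

# Fields requested from nvidia-smi (see `nvidia-smi --help-query-gpu`).
FIELDS = ["index", "name", "temperature.gpu", "utilization.gpu",
          "memory.used", "memory.total"]


def to_int(value):
    """Convert a field to int; return None for values such as '[N/A]'."""
    try:
        return int(value)
    except ValueError:
        return None


def parse_gpu_csv(text):
    """Parse `--format=csv,noheader,nounits` output into a list of dicts."""
    rows = []
    for record in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not record:
            continue  # skip blank lines
        row = dict(zip(FIELDS, (field.strip() for field in record)))
        # Every requested field except the name is numeric.
        for key in FIELDS:
            if key != "name":
                row[key] = to_int(row[key])
        rows.append(row)
    return rows


def query_gpus():
    """Run nvidia-smi and return one dict per GPU (needs the NVIDIA driver)."""
    cmd = ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
           "--format=csv,noheader,nounits"]
    return parse_gpu_csv(subprocess.check_output(cmd, text=True))
```

On a machine with the driver installed, `query_gpus()` returns one dictionary per card, so memory usage (`memory.used` / `memory.total`) and GPU utilization (`utilization.gpu`) can be compared side by side.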

CUDA_VISIBLE_DEVICES

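Scenario

A server has several GPUs available, but we only want to use the second and fourth ones, and we want the code to see exactly two GPUs, numbered 0 and 1. The CUDA_VISIBLE_DEVICES environment variable solves this: it lists the physical GPU indices a program may see, and the program renumbers them from 0 in the order given. Counting the cards from one, the second and fourth are indices 1 and 3, so the setting here is CUDA_VISIBLE_DEVICES=1,3.

CUDA_VISIBLE_DEVICES=1  Only GPU 1 is visible to the program; in code, gpu[0] refers to that GPU.
CUDA_VISIBLE_DEVICES=0,2,3  Only GPUs 0, 2 and 3 are visible to the program; in code, gpu[0] is GPU 0, gpu[1] is GPU 2, and gpu[2] is GPU 3.
CUDA_VISIBLE_DEVICES=2,0,3  Again only GPUs 0, 2 and 3 are visible, but in code gpu[0] is GPU 2, gpu[1] is GPU 0, and gpu[2] is GPU 3.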
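
The renumbering described above can be sketched in a few lines of Python. The helper `logical_to_physical` is made up for this example and only handles a well-formed, comma-separated list of integer indices; it does not model how CUDA treats invalid entries.

```python
import os


def logical_to_physical(visible_devices):
    """Map the index seen by the program to the physical GPU index.

    `visible_devices` is a well-formed value such as "2,0,3".
    """
    physical = [int(item) for item in visible_devices.split(",") if item.strip()]
    return dict(enumerate(physical))


# Set the variable before importing any library that uses CUDA:
# it is read when CUDA initializes, so later changes have no effect.
os.environ["CUDA_VISIBLE_DEVICES"] = "2,0,3"

mapping = logical_to_physical(os.environ["CUDA_VISIBLE_DEVICES"])
for logical, phys in mapping.items():
    print(f"gpu[{logical}] -> physical GPU {phys}")
```

The same variable can be set for a single run from the shell, for example `CUDA_VISIBLE_DEVICES=2,0,3 python3 train.py` (the script name is a placeholder). Note that CUDA's default device order is not necessarily the order nvidia-smi prints; setting `CUDA_DEVICE_ORDER=PCI_BUS_ID` as well makes the two agree.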

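GPU, CUDA and cuDNN

  • CUDA is NVIDIA's parallel computing platform for its own GPUs. It runs only on NVIDIA GPUs, and it pays off only when the problem being solved can be split into a large amount of parallel work.

  • cuDNN (CUDA Deep Neural Network library) is NVIDIA's GPU-accelerated library for deep neural networks. It is not strictly required for training a model on a GPU, but it is normally used.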

Installing the NVIDIA Driver on Ubuntu

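// Remove old versions

apt-get remove 'nvidia-*' (run this if another driver version was installed before, or the driver was installed some other way)

// Find the driver suited to the card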

root@k8s:/home/dpl# ubuntu-drivers
usage: ubuntu-drivers [-h] [--package-list PATH]
ubuntu-drivers: error: the following arguments are required:
root@k8s:/home/dpl# ubuntu-drivers devices

== /sys/devices/pci0000:80/0000:80:03.0/0000:86:00.0/0000:87:10.0/0000:89:00.0 ==
modalias : pci:v000010DEd0000102Dsv000010DEsd0000106Cbc03sc02i00
vendor : NVIDIA Corporation
model : GK210GL [Tesla K80]
driver : xserver-xorg-video-nouveau - distro free builtin
driver : nvidia-384 - distro non-free recommended

root@k8s:/home/dpl#
root@k8s:/home/dpl# docker exec -it alg-server bash
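
The recommended package in the `ubuntu-drivers devices` output above can also be picked out by a script. The following is a minimal sketch, assuming the line layout shown in this session (`driver : <package> - ... recommended`); `recommended_driver` is a name made up for this example, and other Ubuntu releases may print the list differently.

```python
import re


def recommended_driver(devices_output):
    """Return the package on the 'driver' line flagged 'recommended', or None."""
    pattern = re.compile(r"\s*driver\s*:\s*(\S+)\s+-\s+.*\brecommended\b")
    for line in devices_output.splitlines():
        match = pattern.match(line)
        if match:
            return match.group(1)
    return None


# Lines copied from the session above.
sample = """\
vendor : NVIDIA Corporation
model : GK210GL [Tesla K80]
driver : xserver-xorg-video-nouveau - distro free builtin
driver : nvidia-384 - distro non-free recommended
"""
print(recommended_driver(sample))  # prints nvidia-384
```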

// Install the driver

root@k8s:/# apt-get install nvidia-384

root@k8s:/home/dpl# cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.39 Sat Feb 9 19:19:37 CST 2019
GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)
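
To check the loaded driver version from a script, the `NVRM version` line shown above can be parsed. This is a minimal sketch that assumes the line keeps the layout shown here, with the version number directly after "Kernel Module"; `driver_version` is a name made up for this example.

```python
import re


def driver_version(version_text):
    """Extract the driver version from the 'NVRM version' line, or None."""
    match = re.search(r"^NVRM version:.*?Kernel Module\s+(\d+(?:\.\d+)+)",
                      version_text, flags=re.MULTILINE)
    return match.group(1) if match else None


# Text copied from the session above.
sample = (
    "NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.39 Sat Feb 9 19:19:37 CST 2019\n"
    "GCC version: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.11)\n"
)
print(driver_version(sample))  # prints 418.39
```

On a machine with the driver loaded, the input would come from the file itself, for example `driver_version(open("/proc/driver/nvidia/version").read())`.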
