Practical work support for Astrosim 2017
5W/2H: the French mnemonic is CQQCOQP (How? What? Who? How much? Where? When? Why?)…
The goal is to get hands-on experience with the GPU components inside the machines and to compare their performance against classical CPUs, through both simplistic examples and production codes.
In order to provide a complete, functional environment, the Blaise Pascal Center supplies well-configured hardware, software, and operating systems. People who want to follow this practical session on their own laptop must have a real Unix operating system.
People who want to use a powerful GPU, GPGPU, or accelerator can connect to the following machines.
Hardware in computer science is defined by the von Neumann architecture:
GPUs are normally considered Input/Output devices. As peripherals installed in PC machines, they use an interconnection bus, PCI or PCI Express.
To get the list of PCI devices, use the lspci -nn
command. In this huge list appear some VGA or 3D devices: these are GPU or GPGPU devices.
Here is an output of lspci -nn | egrep '(VGA|3D)'
:
```
06:00.0 VGA compatible controller: Advanced Micro Devices, Inc. [AMD/ATI] Fiji [Radeon R9 FURY / NANO Series] (rev ca)
82:00.0 VGA compatible controller: NVIDIA Corporation Device 1b06 (rev a1)
```
All of the large workstations hold Nvidia boards.
In POSIX operating systems, everything is a file. Information about the Nvidia board and its discovery by the operating system at boot time can be obtained with a grep
in the dmesg
output.
The messages look cryptic, but they carry very important information:
```
[   19.545688] NVRM: The NVIDIA GPU 0000:82:00.0 (PCI ID: 10de:1b06)
               NVRM: NVIDIA Linux driver release. Please see 'Appendix
               NVRM: A - Supported NVIDIA GPU Products' in this release's
               NVRM: at www.nvidia.com.
[   19.545903] nvidia: probe of 0000:82:00.0 failed with error -1
[   19.546254] NVRM: The NVIDIA probe routine failed for 1 device(s).
[   19.546491] NVRM: None of the NVIDIA graphics adapters were initialized!
[   19.782970] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[   19.783084] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.66  Mon May 1 15:29:16 PDT 2017 (using threaded interrupts)
[   19.814046] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.66  Mon May 1 14:33:30 PDT 2017
[   20.264453] [drm] [nvidia-drm] [GPU ID 0x00008200] Loading driver
[   23.360807] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:80/0000:80:02.0/0000:82:00.1/sound/card2/input19
[   23.360885] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:80/0000:80:02.0/0000:82:00.1/sound/card2/input20
[   23.360996] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:80/0000:80:02.0/0000:82:00.1/sound/card2/input21
[   23.361065] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:80/0000:80:02.0/0000:82:00.1/sound/card2/input22
[   32.896510] [drm] [nvidia-drm] [GPU ID 0x00008200] Unloading driver
[   32.935658] nvidia-modeset: Unloading
[   32.967939] nvidia-nvlink: Unregistered the Nvlink Core, major device number 244
[   33.034671] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[   33.034724] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.66  Mon May 1 15:29:16 PDT 2017 (using threaded interrupts)
[   33.275804] nvidia-nvlink: Unregistered the Nvlink Core, major device number 244
[   33.993460] nvidia-nvlink: Nvlink Core is being initialized, major device number 244
[   33.993486] NVRM: loading NVIDIA UNIX x86_64 Kernel Module  375.66  Mon May 1 15:29:16 PDT 2017 (using threaded interrupts)
[   35.110461] nvidia-modeset: Loading NVIDIA Kernel Mode Setting Driver for UNIX platforms  375.66  Mon May 1 14:33:30 PDT 2017
[   35.111628] nvidia-modeset: Allocated GPU:0 (GPU-ccc95482-6681-052e-eb30-20b138412b92) @ PCI:0000:82:00.0
[349272.210486] nvidia-uvm: Loaded the UVM driver in 8 mode, major device number 243
```
What is the input: HDA NVidia
device? Is it a graphical one?
The lsmod
command provides the list of loaded modules. Modules are small programs, each dedicated to the support of one function in the kernel, the engine of the operating system. The support of a device needs one or several modules.
An example of lsmod | grep nvidia
on a workstation:
```
nvidia_uvm            638976  0
nvidia_modeset        790528  2
nvidia              12312576  42 nvidia_modeset,nvidia_uvm
```
We see that 3 modules are loaded. The last column (empty for the first two lines) lists the dependencies between modules. Here, nvidia_modeset
and nvidia_uvm
depend on the nvidia
module.
The device also appears in /dev
, the root folder for devices.
A ls -l /dev/nvidia*
provides this kind of information:
```
crw-rw-rw- 1 root root 195,   0 Jun 30 18:17 /dev/nvidia0
crw-rw-rw- 1 root root 195, 255 Jun 30 18:17 /dev/nvidiactl
crw-rw-rw- 1 root root 195, 254 Jun 30 18:17 /dev/nvidia-modeset
crw-rw-rw- 1 root root 243,   0 Jul  4 19:17 /dev/nvidia-uvm
crw-rw-rw- 1 root root 243,   1 Jul  4 19:17 /dev/nvidia-uvm-tools
```
We can see that everybody can access the device. There is only one Nvidia device here, nvidia0
. On a machine with multiple Nvidia GPUs, we get nvidia0
, nvidia1
, etc.
How many /dev/nvidia<number>
devices do you get?
Nvidia provides information about its recognized devices via the nvidia-smi
command. This command can also be used to tune some settings inside the GPU.
An example of nvidia-smi
output:
```
Fri Jul  7 07:46:56 2017
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 375.66                 Driver Version: 375.66                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  GeForce GTX 108...  Off  | 0000:82:00.0      On |                  N/A |
| 23%   31C    P8    10W / 250W |     35MiB / 11172MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                       GPU Memory |
|  GPU       PID  Type  Process name                               Usage      |
|=============================================================================|
|    0      4108    G   /usr/lib/xorg/Xorg                              32MiB |
+-----------------------------------------------------------------------------+
```
Lots of information is available in this output: the nvidia-smi
and driver versions, the id of each GPU, its temperature, power draw, memory usage, and the processes using it.
As we saw in the introduction on GPUs, programming them can be achieved in several ways. The first, for Nvidia devices, is to use the CUDA environment. The problem is that it is then impossible to reuse your program on another platform or to compare it directly with a CPU. OpenCL is a more agnostic way.
On the CBP workstations, all the existing OpenCL implementations are installed.
The clinfo
command provides information about OpenCL devices. Here is an example of a short output with clinfo -l
:
```
Platform #0: Clover
Platform #1: Portable Computing Language
 `-- Device #0: pthread-Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
Platform #2: NVIDIA CUDA
 `-- Device #0: GeForce GTX 1080 Ti
Platform #3: Intel(R) OpenCL
 `-- Device #0: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
Platform #4: AMD Accelerated Parallel Processing
 `-- Device #0: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
```
* #0 Clover is a GPU implementation, based on the open-source GNU/Linux drivers and provided by Mesa.
* #1 Portable Computing Language is a CPU implementation; not very efficient, but open source.
* #2 NVIDIA CUDA is a GPU implementation; the detected devices are listed below it.
* #3 Intel(R) OpenCL is a CPU implementation provided by Intel; very efficient, but its floating-point results are sometimes strange.
* #4 AMD Accelerated Parallel Processing is a CPU implementation provided by AMD; rather efficient, and the oldest one.
The clinfo
command without options provides lots of (too much…) information. You can restrict it, for example, to a few attributes such as Platform Name
, Device Name
, Max compute
, Max clock
.
On the example platform, the command clinfo | egrep '(Platform Name|Device Name|Max compute|Max clock)'
provides the output:
```
  Platform Name                       Clover
  Platform Name                       Portable Computing Language
  Platform Name                       NVIDIA CUDA
  Platform Name                       Intel(R) OpenCL
  Platform Name                       AMD Accelerated Parallel Processing
  Platform Name                       Clover
  Platform Name                       Portable Computing Language
  Device Name                         pthread-Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
  Max compute units                   32
  Max clock frequency                 2401MHz
  Platform Name                       NVIDIA CUDA
  Device Name                         GeForce GTX 1080 Ti
  Max compute units                   28
  Max clock frequency                 1582MHz
  Platform Name                       Intel(R) OpenCL
  Device Name                         Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
  Max compute units                   32
  Max clock frequency                 2400MHz
  Platform Name                       AMD Accelerated Parallel Processing
  Device Name                         Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz
  Max compute units                   32
  Max clock frequency                 1200MHz
```
In the lecture about GPUs, we presented the GPU as a great matrix multiplier. One of the most common linear algebra libraries is BLAS, that is, the Basic Linear Algebra Subprograms.
These subprograms can be considered a standard, and lots of implementations exist on all architectures. On GPU, Nvidia provides its version with cuBLAS, and AMD releases its OpenCL implementation, clBLAS, as open source.
On CPU, Intel sells its optimized implementation in the MKL libraries, but an open-source equivalent exists, OpenBLAS. Several other implementations exist and are deployed on the CBP machines: ATLAS and GSL.
The implementation of matrix multiply in the BLAS libraries is xGEMM
, with x
to be replaced by S
, D
, C
or Z
, respectively for single precision (32 bits), double precision (64 bits), complex single precision, and complex double precision.
Inside /scratch/Astrosim2017/xGEMM
are programs implementing xGEMM in single precision, xGEMM_SP_<version>
, or double precision, xGEMM_DP_<version>
:
* fblas: using the ATLAS libraries
* openblas: using the OpenBLAS libraries
* gsl: using the GSL libraries
* cublas: using the cuBLAS libraries with internal memory management
* thunking: using the cuBLAS libraries with external memory management
The source code and the Makefile
used to compile these examples are available as a tarball at:
/scratch/AstroSim2017/xGEMM_EQ_170707.tgz
Calling a program with the -h
option prints brief information on how to launch it. The input parameters are:
The output provides: