Below are the differences between two revisions of the page.
developpement:activites:integration:cuda4jessie [2015/11/17 14:31] equemene |
developpement:activites:integration:cuda4jessie [2016/04/21 19:10] (current version) equemene |
||
---|---|---|---|
Line 1: | Line 1: | ||
====== Backporting the Nvidia driver & CUDA to Debian Jessie ====== | ====== Backporting the Nvidia driver & CUDA to Debian Jessie ====== | ||
+ | <note warning>Warning: broken as of 21 April 2016 for the PyCUDA and PyOpenCL parts</note> | ||
===== Backporting CUDA 6.5 to Jessie ===== | ===== Backporting CUDA 6.5 to Jessie ===== | ||
Line 7: | Line 8: | ||
- | This is the method for installing, from a backport, the Nvidia packages and the whole associated environment. As of 28 May 2015, version 346.59 from the **experimental** archive cannot be used: the NVIDIA module breaks when DKMS installs it. The last working driver from the Debian snapshots, 343.36, must therefore be installed. The development environment, 6.5.14, will be backported directly from the **experimental** archive. | + | This is the method for installing, from a backport, the Nvidia packages and the whole associated environment. As of 17 November 2015, version 352.55 from the **experimental** archive can be used again: the NVIDIA module builds correctly when DKMS installs it. The development environment, 6.5.19, will be backported directly from the **experimental** archive. |
<note warning>When compiling inside SIDUS, do not forget to mount ''/proc''</note> | <note warning>When compiling inside SIDUS, do not forget to mount ''/proc''</note> | ||
Line 81: | Line 82: | ||
The following packages are created: | The following packages are created: | ||
<code> | <code> | ||
- | libcuda1_352.55-3_amd64.deb | + | libcuda1_355.11-2_amd64.deb |
- | libegl1-nvidia_352.55-3_amd64.deb | + | libegl1-nvidia_355.11-2_amd64.deb |
- | libgl1-nvidia-glx_352.55-3_amd64.deb | + | libegl-nvidia0_355.11-2_amd64.deb |
- | libgles1-nvidia_352.55-3_amd64.deb | + | libgl1-nvidia-glx_355.11-2_amd64.deb |
- | libgles2-nvidia_352.55-3_amd64.deb | + | libgles1-nvidia_355.11-2_amd64.deb |
- | libnvcuvid1_352.55-3_amd64.deb | + | libgles2-nvidia_355.11-2_amd64.deb |
- | libnvidia-compiler_352.55-3_amd64.deb | + | libglvnd-nvidia_355.11-2_amd64.deb |
- | libnvidia-eglcore_352.55-3_amd64.deb | + | libnvcuvid1_355.11-2_amd64.deb |
- | libnvidia-encode1_352.55-3_amd64.deb | + | libnvidia-compiler_355.11-2_amd64.deb |
- | libnvidia-fbc1_352.55-3_amd64.deb | + | libnvidia-eglcore_355.11-2_amd64.deb |
- | libnvidia-ifr1_352.55-3_amd64.deb | + | libnvidia-encode1_355.11-2_amd64.deb |
- | libnvidia-ml1_352.55-3_amd64.deb | + | libnvidia-fbc1_355.11-2_amd64.deb |
- | nvidia-alternative_352.55-3_amd64.deb | + | libnvidia-ifr1_355.11-2_amd64.deb |
- | nvidia-cuda-mps_352.55-3_amd64.deb | + | libnvidia-ml1_355.11-2_amd64.deb |
- | nvidia-detect_352.55-3_amd64.deb | + | nvidia-alternative_355.11-2_amd64.deb |
- | nvidia-driver_352.55-3_amd64.deb | + | nvidia-cuda-mps_355.11-2_amd64.deb |
- | nvidia-driver-bin_352.55-3_amd64.deb | + | nvidia-detect_355.11-2_amd64.deb |
- | nvidia-kernel-dkms_352.55-3_amd64.deb | + | nvidia-driver_355.11-2_amd64.deb |
- | nvidia-kernel-source_352.55-3_amd64.deb | + | nvidia-driver-bin_355.11-2_amd64.deb |
- | nvidia-kernel-support_352.55-3_amd64.deb | + | nvidia-kernel-dkms_355.11-2_amd64.deb |
- | nvidia-libopencl1_352.55-3_amd64.deb | + | nvidia-kernel-source_355.11-2_amd64.deb |
- | nvidia-opencl-common_352.55-3_amd64.deb | + | nvidia-kernel-support_355.11-2_amd64.deb |
- | nvidia-opencl-icd_352.55-3_amd64.deb | + | nvidia-legacy-check_355.11-2_amd64.deb |
- | nvidia-smi_352.55-3_amd64.deb | + | nvidia-libopencl1_355.11-2_amd64.deb |
- | nvidia-vdpau-driver_352.55-3_amd64.deb | + | nvidia-opencl-common_355.11-2_amd64.deb |
- | xserver-xorg-video-nvidia_352.55-3_amd64.deb | + | nvidia-opencl-icd_355.11-2_amd64.deb |
+ | nvidia-smi_355.11-2_amd64.deb | ||
+ | nvidia-vdpau-driver_355.11-2_amd64.deb | ||
+ | xserver-xorg-video-nvidia_355.11-2_amd64.deb | ||
</code> | </code> | ||
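The installation commands on this page pick out package files by the version string embedded in their names. As a minimal sketch, assuming the usual ''name_version_arch.deb'' naming, that version field can be extracted with plain parameter expansion. The helper name ''deb_version'' is hypothetical and not part of the original procedure.

```shell
#!/bin/sh
# Hypothetical helper: print the version field of a Debian package
# file name of the form name_version_arch.deb.
deb_version() {
    file=${1##*/}               # drop any leading directory
    rest=${file#*_}             # drop the "name_" prefix
    printf '%s\n' "${rest%_*}"  # drop the "_arch.deb" suffix
}

deb_version libcuda1_355.11-2_amd64.deb
```

This relies on package names and versions never containing an underscore, so the first and last underscores delimit the version.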
Line 119: | Line 123: | ||
The following packages are created: | The following packages are created: | ||
<code> | <code> | ||
- | nvidia-modprobe_358.09-1_amd64.deb | + | nvidia-modprobe_361.28-1_amd64.deb |
</code> | </code> | ||
Line 162: | Line 166: | ||
The following packages are created: | The following packages are created: | ||
<code> | <code> | ||
- | nvidia-installer-cleanup_20151021+1_amd64.deb | + | nvidia-installer-cleanup_20151021+4_amd64.deb |
- | nvidia-kernel-common_20151021+1_amd64.deb | + | nvidia-kernel-common_20151021+4_amd64.deb |
- | nvidia-support_20151021+1_amd64.deb | + | nvidia-support_20151021+4_amd64.deb |
</code> | </code> | ||
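The newer revision of the installation step reads the installed driver version from ''dpkg -l'' with ''grep'' and ''awk''. The sketch below shows one possible variant that matches the package name exactly and only keeps lines in the installed (''ii'') state. The function name ''installed_version'' and the sample input lines are made up for illustration; on a real system the input would come from ''dpkg -l''.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l`-style lines on stdin and print the
# version (third column) of the package whose name matches $1 exactly.
installed_version() {
    awk -v pkg="$1" '$1 == "ii" && $2 == pkg { print $3 }'
}

# Made-up sample input standing in for the output of `dpkg -l`.
printf '%s\n' \
    'ii  nvidia-kernel-dkms     355.11-2  amd64  sample description' \
    'ii  nvidia-kernel-support  355.11-2  amd64  sample description' |
    installed_version nvidia-kernel-dkms
```

On the target machine the call would look like ''dpkg -l | installed_version nvidia-kernel-dkms'', assuming the package name is listed without an architecture suffix.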
Line 183: | Line 187: | ||
dpkg -i nvidia-modprobe_*_amd64.deb | dpkg -i nvidia-modprobe_*_amd64.deb | ||
+ | dpkg -i nvidia-legacy-check_*_amd64.deb | ||
dpkg -i nvidia-alternative_*_amd64.deb | dpkg -i nvidia-alternative_*_amd64.deb | ||
Line 195: | Line 199: | ||
dpkg -i nvidia-kernel-dkms_*_amd64.deb | dpkg -i nvidia-kernel-dkms_*_amd64.deb | ||
- | ls -1 lib*352.55*deb | xargs -I '{}' dpkg -i '{}' | + | ls -1 lib*$(dpkg -l | grep nvidia-kernel-dkms | awk '{ print $3 }')*deb | xargs -I '{}' dpkg -i '{}' |
apt-get -f install | apt-get -f install | ||
Line 203: | Line 207: | ||
dpkg -i nvidia-vdpau-driver_*_amd64.deb nvidia-driver_*_amd64.deb nvidia-driver-bin_*_amd64.deb | dpkg -i nvidia-vdpau-driver_*_amd64.deb nvidia-driver_*_amd64.deb nvidia-driver-bin_*_amd64.deb | ||
- | ls *352.55*deb | grep -v ^lib | grep -v nvidia-kernel | grep -v libopencl | xargs -I '{}' dpkg -i '{}' | + | ls *$(dpkg -l | grep nvidia-kernel-dkms | awk '{ print $3 }')*deb | grep -v ^lib | grep -v nvidia-kernel | grep -v libopencl | xargs -I '{}' dpkg -i '{}' |
dpkg -i nvidia-driver-bin_*deb nvidia-driver_*deb nvidia-xconfig* nvidia-settings* libxnvctrl* nvidia-smi_* | dpkg -i nvidia-driver-bin_*deb nvidia-driver_*deb nvidia-xconfig* nvidia-settings* libxnvctrl* nvidia-smi_* | ||
Line 226: | Line 230: | ||
The following packages are created: | The following packages are created: | ||
<code> | <code> | ||
+ | libcublas7.0_7.0.28-4_amd64.deb | ||
+ | libcudart7.0_7.0.28-4_amd64.deb | ||
+ | libcufft7.0_7.0.28-4_amd64.deb | ||
+ | libcufftw7.0_7.0.28-4_amd64.deb | ||
+ | libcuinj64-7.0_7.0.28-4_amd64.deb | ||
+ | libcupti7.0_7.0.28-4_amd64.deb | ||
+ | libcupti-dev_7.0.28-4_amd64.deb | ||
+ | libcupti-doc_7.0.28-4_all.deb | ||
+ | libcurand7.0_7.0.28-4_amd64.deb | ||
+ | libcusolver7.0_7.0.28-4_amd64.deb | ||
+ | libcusparse7.0_7.0.28-4_amd64.deb | ||
+ | libnppc7.0_7.0.28-4_amd64.deb | ||
+ | libnppi7.0_7.0.28-4_amd64.deb | ||
+ | libnpps7.0_7.0.28-4_amd64.deb | ||
+ | libnvblas7.0_7.0.28-4_amd64.deb | ||
+ | libnvrtc7.0_7.0.28-4_amd64.deb | ||
+ | libnvtoolsext1_7.0.28-4_amd64.deb | ||
+ | libnvvm3_7.0.28-4_amd64.deb | ||
+ | nvidia-cuda-dev_7.0.28-4_amd64.deb | ||
+ | nvidia-cuda-doc_7.0.28-4_all.deb | ||
+ | nvidia-cuda-gdb_7.0.28-4_amd64.deb | ||
+ | nvidia-cuda-toolkit_7.0.28-4_amd64.deb | ||
+ | nvidia-nsight_7.0.28-4_amd64.deb | ||
+ | nvidia-opencl-dev_7.0.28-4_amd64.deb | ||
+ | nvidia-profiler_7.0.28-4_amd64.deb | ||
+ | nvidia-visual-profiler_7.0.28-4_amd64.deb | ||
</code> | </code> | ||
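The install step for the CUDA packages lists the ''.deb'' files of one version and skips the OpenCL one with ''grep -v opencl''. As a sketch of the same filtering without parsing ''ls'' output, the loop below prints every argument whose name does not contain ''opencl''. The function name ''without_opencl'' is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print the file names given as arguments that do
# not contain "opencl", one per line (the effect of `grep -v opencl`).
without_opencl() {
    for f in "$@"; do
        case $f in
            *opencl*) ;;                  # skipped
            *) printf '%s\n' "$f" ;;      # kept
        esac
    done
}

without_opencl nvidia-cuda-dev_7.0.28-4_amd64.deb \
               nvidia-opencl-dev_7.0.28-4_amd64.deb \
               nvidia-cuda-toolkit_7.0.28-4_amd64.deb
```

In the build directory it could be called as ''without_opencl *7.0.28*deb'', letting the shell expand the glob instead of ''ls''.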
Line 234: | Line 263: | ||
apt-get install -y opencl-headers ocl-icd-opencl-dev | apt-get install -y opencl-headers ocl-icd-opencl-dev | ||
cd /root/nvidia/debian | cd /root/nvidia/debian | ||
- | ls *6.5.19*deb | grep -v opencl | xargs -I '{}' dpkg -i '{}' | + | ls *7.0.28*deb | grep -v opencl | xargs -I '{}' dpkg -i '{}' |
# The installer sometimes complains about nvidia-cuda-toolkit. If so, run the following command | # The installer sometimes complains about nvidia-cuda-toolkit. If so, run the following command | ||
apt-get -f install | apt-get -f install | ||
Line 243: | Line 272: | ||
<code> | <code> | ||
cd /root/nvidia | cd /root/nvidia | ||
- | apt-get -y source python-pyopencl | + | apt-get -y build-dep pycuda |
- | apt-get -y build-dep python-pyopencl | + | apt-get -y install python-setuptools python3-setuptools |
- | cd pyopencl-* | + | wget http://snapshot.debian.org/archive/debian/20150617T043723Z/pool/main/p/pyopencl/pyopencl_2015.1-2.debian.tar.xz |
+ | wget http://snapshot.debian.org/archive/debian/20150617T043723Z/pool/main/p/pyopencl/pyopencl_2015.1-2.dsc | ||
+ | wget http://snapshot.debian.org/archive/debian/20150610T042543Z/pool/main/p/pyopencl/pyopencl_2015.1.orig.tar.xz | ||
+ | tar Jxf pyopencl_2015.1.orig.tar.xz | ||
+ | cd pyopencl-*/ | ||
+ | tar Jxf ../pyopencl_2015.1-2.debian.tar.xz | ||
debuild | debuild | ||
cd .. | cd .. | ||
Line 254: | Line 288: | ||
The following packages are created: | The following packages are created: | ||
<code> | <code> | ||
- | python3-pyopencl_2015.1-1_amd64.deb | + | python3-pyopencl_2015.1-2_amd64.deb |
- | python3-pyopencl-dbg_2015.1-1_amd64.deb | + | python3-pyopencl-dbg_2015.1-2_amd64.deb |
- | python-pyopencl_2015.1-1_amd64.deb | + | python-pyopencl_2015.1-2_amd64.deb |
- | python-pyopencl-dbg_2015.1-1_amd64.deb | + | python-pyopencl-dbg_2015.1-2_amd64.deb |
- | python-pyopencl-doc_2015.1-1_all.deb | + | python-pyopencl-doc_2015.1-2_all.deb |
</code> | </code> | ||
Line 269: | Line 303: | ||
</code> | </code> | ||
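The PyOpenCL and PyCUDA sources are fetched from snapshot.debian.org with URLs that all follow the same layout: timestamp, archive area, pool prefix, source package, file name. A minimal sketch of building such a URL is given below. The function name ''snapshot_url'' is hypothetical, and the ''lib'' branch assumes the usual four-letter pool prefix for library source packages.

```shell
#!/bin/sh
# Hypothetical helper: build a snapshot.debian.org URL from a timestamp,
# an archive area (main, contrib, ...), a source package and a file name.
snapshot_url() {
    ts=$1; area=$2; src=$3; file=$4
    case $src in
        lib?*) prefix=$(printf '%s' "$src" | cut -c1-4) ;;  # libfoo -> libf
        *)     prefix=$(printf '%s' "$src" | cut -c1) ;;    # pycuda -> p
    esac
    printf 'http://snapshot.debian.org/archive/debian/%s/pool/%s/%s/%s/%s\n' \
        "$ts" "$area" "$prefix" "$src" "$file"
}

snapshot_url 20150710T034220Z contrib pycuda pycuda_2015.1.2-1.dsc
```

The printed URL can then be passed to ''wget'', as in the commands of this page.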
+ | The latest version of pycuda cannot be backported to Jessie, because PyCUDA depends on other recent libraries that cannot themselves be backported. | ||
<code> | <code> | ||
cd /root/nvidia | cd /root/nvidia | ||
- | apt-get source -y pycuda | ||
apt-get -y build-dep pycuda | apt-get -y build-dep pycuda | ||
+ | apt-get -y install python-setuptools python3-setuptools | ||
+ | wget http://snapshot.debian.org/archive/debian/20150710T034220Z/pool/contrib/p/pycuda/pycuda_2015.1.2-1.debian.tar.xz | ||
+ | wget http://snapshot.debian.org/archive/debian/20150710T034220Z/pool/contrib/p/pycuda/pycuda_2015.1.2-1.dsc | ||
+ | wget http://snapshot.debian.org/archive/debian/20150710T034220Z/pool/contrib/p/pycuda/pycuda_2015.1.2.orig.tar.xz | ||
+ | tar Jxf pycuda_2015.1.2.orig.tar.xz | ||
cd pycuda-*/ | cd pycuda-*/ | ||
+ | tar Jxf ../pycuda_2015.1.2-1.debian.tar.xz | ||
debuild | debuild | ||
cd .. | cd .. | ||
Line 302: | Line 342: | ||
<code> | <code> | ||
cd /root/nvidia/debian | cd /root/nvidia/debian | ||
- | apt-get install -y python-pytest python3-pytest | + | apt-get install -y python-pytest python3-pytest python3-appdirs python-appdirs |
dpkg -i python-pycuda*deb python3-pycuda*deb | dpkg -i python-pycuda*deb python3-pycuda*deb | ||
</code> | </code> | ||
Line 308: | Line 348: | ||
==== Running the examples ==== | ==== Running the examples ==== | ||
- | <code> | + | On a machine with 3 video cards and 3 OpenCL implementations for CPU (AMD, Intel, PortableCL) |
+ | |||
+ | <code> | ||
python /usr/share/doc/python-pyopencl-doc/examples/benchmark.py | python /usr/share/doc/python-pyopencl-doc/examples/benchmark.py | ||
</code> | </code> | ||
- | On Tesla C1060 & Quadro FX 580 cards (in a Dell Precision 3500):<code> | + | <code> |
- | ('Execution time of test without OpenCL: ', 7.415176868438721, 's') | + | |
=============================================================== | =============================================================== | ||
- | ('Platform name:', 'NVIDIA CUDA') | + | Platform name: AMD Accelerated Parallel Processing |
- | ('Platform profile:', 'FULL_PROFILE') | + | Platform profile: FULL_PROFILE |
- | ('Platform vendor:', 'NVIDIA Corporation') | + | Platform vendor: Advanced Micro Devices, Inc. |
- | ('Platform version:', 'OpenCL 1.1 CUDA 6.0.1') | + | Platform version: OpenCL 2.0 AMD-APP (1800.11) |
--------------------------------------------------------------- | --------------------------------------------------------------- | ||
- | ('Device name:', 'Tesla C1060') | + | Device name: Fiji |
- | ('Device type:', 'GPU') | + | Device type: GPU |
- | ('Device memory: ', 4095, 'MB') | + | Device memory: 4045 MB |
- | ('Device max clock speed:', 1296, 'MHz') | + | Device max clock speed: 1000 MHz |
- | ('Device compute units:', 30) | + | Device compute units: 64 |
- | Execution time of test: 0.00188525 s | + | Device max work group size: 256 |
+ | Device max work item sizes: [256, 256, 256] | ||
+ | Data points: 8388608 | ||
+ | Workers: 256 | ||
+ | Preferred work group size multiple: 64 | ||
+ | Execution time of test: 0.00037168 s | ||
Results OK | Results OK | ||
=============================================================== | =============================================================== | ||
- | ('Platform name:', 'NVIDIA CUDA') | + | Platform name: AMD Accelerated Parallel Processing |
- | ('Platform profile:', 'FULL_PROFILE') | + | Platform profile: FULL_PROFILE |
- | ('Platform vendor:', 'NVIDIA Corporation') | + | Platform vendor: Advanced Micro Devices, Inc. |
- | ('Platform version:', 'OpenCL 1.1 CUDA 6.0.1') | + | Platform version: OpenCL 2.0 AMD-APP (1800.11) |
--------------------------------------------------------------- | --------------------------------------------------------------- | ||
- | ('Device name:', 'Quadro FX 580') | + | Device name: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz |
- | ('Device type:', 'GPU') | + | Device type: CPU |
- | ('Device memory: ', 511, 'MB') | + | Device memory: 128966 MB |
- | ('Device max clock speed:', 1125, 'MHz') | + | Device max clock speed: 2309 MHz |
- | ('Device compute units:', 4) | + | Device compute units: 32 |
- | Execution time of test: 0.0126466 s | + | Device max work group size: 1024 |
+ | Device max work item sizes: [1024, 1024, 1024] | ||
+ | Data points: 8388608 | ||
+ | Workers: 256 | ||
+ | Preferred work group size multiple: 1 | ||
+ | Execution time of test: 0.0192504 s | ||
Results OK | Results OK | ||
=============================================================== | =============================================================== | ||
- | ('Platform name:', 'AMD Accelerated Parallel Processing') | + | Platform name: Intel(R) OpenCL |
- | ('Platform profile:', 'FULL_PROFILE') | + | Platform profile: FULL_PROFILE |
- | ('Platform vendor:', 'Advanced Micro Devices, Inc.') | + | Platform vendor: Intel(R) Corporation |
- | ('Platform version:', 'OpenCL 1.2 AMD-APP (938.2)') | + | Platform version: OpenCL 1.2 LINUX |
--------------------------------------------------------------- | --------------------------------------------------------------- | ||
- | ('Device name:', 'Intel(R) Xeon(R) CPU W3565 @ 3.20GHz') | + | Device name: Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz |
- | ('Device type:', 'CPU') | + | Device type: CPU |
- | ('Device memory: ', 12041, 'MB') | + | Device memory: 128966 MB |
- | ('Device max clock speed:', 3199, 'MHz') | + | Device max clock speed: 2400 MHz |
- | ('Device compute units:', 4) | + | Device compute units: 32 |
- | Execution time of test: 0.00191834 s | + | Device max work group size: 8192 |
+ | Device max work item sizes: [8192, 8192, 8192] | ||
+ | /usr/lib/python2.7/dist-packages/pyopencl/__init__.py:63: CompilerWarning: Non-empty compiler output encountered. Set the environment variable PYOPENCL_COMPILER_OUTPUT=1 to see more. | ||
+ | "to see more.", CompilerWarning) | ||
+ | Data points: 8388608 | ||
+ | Workers: 256 | ||
+ | Preferred work group size multiple: 128 | ||
+ | Execution time of test: 0.00310517 s | ||
+ | Results OK | ||
+ | =============================================================== | ||
+ | Platform name: Portable Computing Language | ||
+ | Platform profile: FULL_PROFILE | ||
+ | Platform vendor: The pocl project | ||
+ | Platform version: OpenCL 1.2 pocl 0.10 | ||
+ | --------------------------------------------------------------- | ||
+ | Device name: pthread-Intel(R) Xeon(R) CPU E5-2665 0 @ 2.40GHz | ||
+ | Device type: CPU | ||
+ | Device memory: 128966 MB | ||
+ | Device max clock speed: 3100 MHz | ||
+ | Device compute units: 32 | ||
+ | Device max work group size: 1024 | ||
+ | Device max work item sizes: [1024, 1024, 1024] | ||
+ | Data points: 8388608 | ||
+ | Workers: 256 | ||
+ | Preferred work group size multiple: 8 | ||
+ | Execution time of test: 0.007638 s | ||
Results OK | Results OK | ||
- | </code> | ||
- | |||
- | On a GT650M card (in a MacBook Pro):<code> | ||
- | ('Execution time of test without OpenCL: ', 7.595532178878784, 's') | ||
=============================================================== | =============================================================== | ||
- | ('Platform name:', 'NVIDIA CUDA') | + | Platform name: NVIDIA CUDA |
- | ('Platform profile:', 'FULL_PROFILE') | + | Platform profile: FULL_PROFILE |
- | ('Platform vendor:', 'NVIDIA Corporation') | + | Platform vendor: NVIDIA Corporation |
- | ('Platform version:', 'OpenCL 1.1 CUDA 6.0.1') | + | Platform version: OpenCL 1.2 CUDA 7.5.20 |
--------------------------------------------------------------- | --------------------------------------------------------------- | ||
- | ('Device name:', 'GeForce GT 650M') | + | Device name: GeForce GTX 980 Ti |
- | ('Device type:', 'GPU') | + | Device type: GPU |
- | ('Device memory: ', 511, 'MB') | + | Device memory: 6143 MB |
- | ('Device max clock speed:', 405, 'MHz') | + | Device max clock speed: 1190 MHz |
- | ('Device compute units:', 2) | + | Device compute units: 22 |
- | Execution time of test: 0.0011792 s | + | Device max work group size: 1024 |
+ | Device max work item sizes: [1024, 1024, 64] | ||
+ | Data points: 8388608 | ||
+ | Workers: 256 | ||
+ | Preferred work group size multiple: 32 | ||
+ | Execution time of test: 0.000522592 s | ||
Results OK | Results OK | ||
=============================================================== | =============================================================== | ||
- | ('Platform name:', 'AMD Accelerated Parallel Processing') | + | Platform name: NVIDIA CUDA |
- | ('Platform profile:', 'FULL_PROFILE') | + | Platform profile: FULL_PROFILE |
- | ('Platform vendor:', 'Advanced Micro Devices, Inc.') | + | Platform vendor: NVIDIA Corporation |
- | ('Platform version:', 'OpenCL 1.2 AMD-APP (938.2)') | + | Platform version: OpenCL 1.2 CUDA 7.5.20 |
--------------------------------------------------------------- | --------------------------------------------------------------- | ||
- | ('Device name:', 'Intel(R) Core(TM) i7-3615QM CPU @ 2.30GHz') | + | Device name: Quadro 600 |
- | ('Device type:', 'CPU') | + | Device type: GPU |
- | ('Device memory: ', 7942, 'MB') | + | Device memory: 1023 MB |
- | ('Device max clock speed:', 1388, 'MHz') | + | Device max clock speed: 1280 MHz |
- | ('Device compute units:', 8) | + | Device compute units: 2 |
- | Execution time of test: 0.00280508 s | + | Device max work group size: 1024 |
+ | Device max work item sizes: [1024, 1024, 64] | ||
+ | Data points: 8388608 | ||
+ | Workers: 256 | ||
+ | Preferred work group size multiple: 32 | ||
+ | Execution time of test: 0.00468445 s | ||
Results OK | Results OK | ||
</code> | </code> | ||
- | Note that installing an AMD SDK beforehand provides OpenCL support on the processor. | ||
- | ====== Installing PyFFT ====== | ||
- | [[http://packages.python.org/pyfft/|PyFFT]] uses PyCUDA and PyOpenCL to run FFTs directly from Python scripts. | ||
- | To install it, follow this page: http://www.cbp.ens-lyon.fr/emmanuel.quemener/dokuwiki/doku.php?id=pyfft4squeeze | ||