A first TensorFlow environment was installed in the spring at the Centre Blaise Pascal. It uses the Anaconda3-2019.3 environment installed in the folder /opt/anaconda3
A second TensorFlow environment was installed this autumn at the Centre Blaise Pascal. It uses the Anaconda3-2019.10 environment installed in the folder /opt/anaconda3-2019.10
Two TensorFlow environments are therefore available, in two different Anaconda3 distributions:
To load the TensorFlow 1.12 environment in the standard CBP SIDUS:
source /etc/tensorflow.init
To load the TensorFlow 2.0 environment in the standard CBP SIDUS:
source /etc/tensorflow2.init
Once the environment is activated, the command prompt is prefixed with (base). For example, the user einstein on the machine ascenseur will have the following prompt:
(base) einstein@ascenseur:~$
The following example, taken from the official site, gives a quick check that the environment works. It requires launching the Python interpreter:
# Python
import tensorflow as tf
hello = tf.constant('Hello, TensorFlow!')
sess = tf.Session()
print(sess.run(hello))
When the third line (the tf.Session() call) executes, the environment detects the graphics cards that can be used:
2019-06-11 18:03:17.928752: I tensorflow/core/platform/cpu_feature_guard.cc:141] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX
2019-06-11 18:03:17.938098: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 1995195000 Hz
2019-06-11 18:03:17.938873: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x562f3dba5a30 executing computations on platform Host. Devices:
2019-06-11 18:03:17.938924: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): <undefined>, <undefined>
2019-06-11 18:03:18.167596: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 18:03:18.203546: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:998] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-06-11 18:03:18.205179: I tensorflow/compiler/xla/service/service.cc:150] XLA service 0x562f3dc62f80 executing computations on platform CUDA. Devices:
2019-06-11 18:03:18.205294: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (0): GeForce RTX 2080, Compute Capability 7.5
2019-06-11 18:03:18.205385: I tensorflow/compiler/xla/service/service.cc:158] StreamExecutor device (1): GeForce GT 730, Compute Capability 3.5
2019-06-11 18:03:18.206798: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 0 with properties: name: GeForce RTX 2080 major: 7 minor: 5 memoryClockRate(GHz): 1.71 pciBusID: 0000:04:00.0 totalMemory: 7.77GiB freeMemory: 7.65GiB
2019-06-11 18:03:18.207245: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1433] Found device 1 with properties: name: GeForce GT 730 major: 3 minor: 5 memoryClockRate(GHz): 0.9015 pciBusID: 0000:03:00.0 totalMemory: 1.95GiB freeMemory: 1.90GiB
2019-06-11 18:03:18.207374: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1497] Ignoring visible gpu device (device: 1, name: GeForce GT 730, pci bus id: 0000:03:00.0, compute capability: 3.5) with Cuda multiprocessor count: 2. The minimum required count is 8. You can adjust this requirement with the env var TF_MIN_GPU_MULTIPROCESSOR_COUNT.
2019-06-11 18:03:18.207456: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1512] Adding visible gpu devices: 0
2019-06-11 18:03:18.210563: I tensorflow/core/common_runtime/gpu/gpu_device.cc:984] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-06-11 18:03:18.210632: I tensorflow/core/common_runtime/gpu/gpu_device.cc:990] 0 1
2019-06-11 18:03:18.210687: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 0: N N
2019-06-11 18:03:18.210734: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1003] 1: N N
2019-06-11 18:03:18.211630: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1115] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 7439 MB memory) -> physical GPU (device: 0, name: GeForce RTX 2080, pci bus id: 0000:04:00.0, compute capability: 7.5)
In the example above, the machine has a GeForce GT 730 card with 1.95 GiB of RAM and a GeForce RTX 2080 card with 7.65 GiB of free RAM. Note that the GT 730 is ignored by TensorFlow because its multiprocessor count is below the minimum of 8.
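If a job should only see one of the two cards, a common approach (not specific to this installation) is to restrict the visible devices with the standard CUDA variable CUDA_VISIBLE_DEVICES. A minimal sketch, assuming device 0 is the RTX 2080 as in the log above:

```shell
# Expose only GPU 0 (the RTX 2080 in the log above) to TensorFlow;
# the GT 730 then no longer appears in the detection messages at all.
export CUDA_VISIBLE_DEVICES=0
```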
The last line confirms that the session works as expected by printing:
Hello, TensorFlow!
A wide variety of online tutorials can be used to verify that everything works correctly.
The previous example does not work under TensorFlow 2.0: here is a small example that does, to test a TensorFlow 2.0 installation
# Python
from __future__ import absolute_import, division, print_function, unicode_literals
import tensorflow as tf

# Create some tensors
a = tf.constant([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
b = tf.constant([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
c = tf.matmul(a, b)
print(c)
When the first TensorFlow operation executes, the environment detects the graphics cards that can be used:
2019-11-04 17:58:31.722039: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2019-11-04 17:58:31.749152: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 17:58:31.749646: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38 pciBusID: 0000:07:00.0
2019-11-04 17:58:31.750304: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-04 17:58:31.752157: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-04 17:58:31.753851: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-11-04 17:58:31.755073: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-11-04 17:58:31.757047: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-11-04 17:58:31.758713: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-11-04 17:58:31.762588: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-04 17:58:31.762748: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 17:58:31.763286: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 17:58:31.763714: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-04 17:58:31.763967: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: SSE4.1 SSE4.2 AVX AVX2 AVX512F FMA
2019-11-04 17:58:31.768995: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 3192495000 Hz
2019-11-04 17:58:31.769414: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a533b0ac90 executing computations on platform Host. Devices:
2019-11-04 17:58:31.769443: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Host, Default Version
2019-11-04 17:58:31.769643: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 17:58:31.770095: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1618] Found device 0 with properties: name: Tesla V100-PCIE-16GB major: 7 minor: 0 memoryClockRate(GHz): 1.38 pciBusID: 0000:07:00.0
2019-11-04 17:58:31.770124: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-04 17:58:31.770139: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
2019-11-04 17:58:31.770150: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcufft.so.10.0
2019-11-04 17:58:31.770164: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcurand.so.10.0
2019-11-04 17:58:31.770178: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusolver.so.10.0
2019-11-04 17:58:31.770190: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcusparse.so.10.0
2019-11-04 17:58:31.770202: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudnn.so.7
2019-11-04 17:58:31.770292: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 17:58:31.770799: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 17:58:31.771228: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1746] Adding visible gpu devices: 0
2019-11-04 17:58:31.771260: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcudart.so.10.0
2019-11-04 17:58:31.859125: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1159] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-11-04 17:58:31.859173: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1165] 0
2019-11-04 17:58:31.859183: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1178] 0: N
2019-11-04 17:58:31.859388: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 17:58:31.859930: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 17:58:31.860424: I tensorflow/stream_executor/cuda/cuda_gpu_executor.cc:1006] successful NUMA node read from SysFS had negative value (-1), but there must be at least one NUMA node, so returning NUMA node zero
2019-11-04 17:58:31.860871: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1304] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 14961 MB memory) -> physical GPU (device: 0, name: Tesla V100-PCIE-16GB, pci bus id: 0000:07:00.0, compute capability: 7.0)
2019-11-04 17:58:31.862849: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x55a534edf7d0 executing computations on platform CUDA. Devices:
2019-11-04 17:58:31.862884: I tensorflow/compiler/xla/service/service.cc:175] StreamExecutor device (0): Tesla V100-PCIE-16GB, Compute Capability 7.0
2019-11-04 17:58:31.863918: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcublas.so.10.0
tf.Tensor(
[[22. 28.]
 [49. 64.]], shape=(2, 2), dtype=float32)
We can see that the Tesla V100-PCIE-16GB card was detected.
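The printed values are easy to check by hand: each entry of c is the dot product of a row of a with a column of b. A minimal pure-Python check, requiring no TensorFlow at all:

```python
# Hand computation of the matrix product from the TensorFlow 2.0 example:
# c[i][j] = sum over k of a[i][k] * b[k][j]
a = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
b = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
c = [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(2)]
     for i in range(2)]
print(c)  # [[22.0, 28.0], [49.0, 64.0]] -- matches the tf.Tensor output
```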
During use, crashes may occur whose first reported error is CUDNN_STATUS_INTERNAL_ERROR:
2019-09-28 05:35:09.764756: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
2019-09-28 05:35:09.766851: E tensorflow/stream_executor/cuda/cuda_dnn.cc:334] Could not create cudnn handle: CUDNN_STATUS_INTERNAL_ERROR
---
tensorflow.python.framework.errors_impl.UnknownError: Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
---
This error is all the more disconcerting in that it does not appear on every machine that can run TensorFlow.
One solution can be to set the environment variable $TF_FORCE_GPU_ALLOW_GROWTH to true:
export TF_FORCE_GPU_ALLOW_GROWTH=true
This option (as its name suggests) makes the GPU memory allocation grow on demand, keeping the memory already allocated. The catch is that this allocation is only released when execution has finished.
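Under TensorFlow 2.0, the same behaviour can also be requested from Python via tf.config.experimental.set_memory_growth. A sketch; it must run before any GPU has been initialised, and the guard around the import is only there so the snippet also runs on machines without TensorFlow:

```python
# Sketch (TensorFlow 2.x API): request on-demand GPU memory growth
# from Python instead of via the TF_FORCE_GPU_ALLOW_GROWTH variable.

def enable_memory_growth(tf):
    """Enable memory growth on every visible GPU.

    Must be called before TensorFlow has initialised any GPU.
    Returns the number of GPUs configured.
    """
    gpus = tf.config.experimental.list_physical_devices('GPU')
    for gpu in gpus:
        tf.config.experimental.set_memory_growth(gpu, True)
    return len(gpus)

if __name__ == "__main__":
    try:
        import tensorflow as tf
        print("memory growth enabled on", enable_memory_growth(tf), "GPU(s)")
    except ImportError:
        # Guard only so the sketch also runs without TensorFlow installed.
        print("TensorFlow not installed; nothing to configure")
```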