Below are the differences between two revisions of the page:
  * formation:astrosim2017gpu4dummies [2017/07/07 14:22] equemene
  * formation:astrosim2017gpu4dummies [2017/07/10 18:52] (current version) equemene [NBody, a simplistic simulator]
  * ''Pi_FP32_MWC_xPU_OpenCL_1_1_1_1_01000000_Device0_InMetro_titan.npz''
  * ''Pi_FP32_MWC_xPU_OpenCL_1_1_1_1_01000000_Device0_InMetro_titan''
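The ''.npz'' file is a NumPy archive. Its internal layout is not described on this page, so the minimal sketch below (assuming NumPy is installed; the helper name ''describe_npz'' is hypothetical) only lists the arrays it contains, with their shapes and types, without assuming any particular key.

<code python>
import numpy as np


def describe_npz(path):
    """Return {array name: (shape, dtype)} for every array stored in a .npz archive.

    Hypothetical helper: it makes no assumption on which keys PiXPU.py writes.
    """
    # np.load returns a lazy NpzFile for .npz archives; the with-block closes the file
    with np.load(path) as archive:
        return {name: (archive[name].shape, str(archive[name].dtype))
                for name in archive.files}
</code>

For example, call ''describe_npz('Pi_FP32_MWC_xPU_OpenCL_1_1_1_1_01000000_Device0_InMetro_titan.npz')''. If the archive stores Python objects, ''np.load'' refuses to load them unless ''allow_pickle=True'' is passed.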

=== Exercise #7: explore ''PiXPU.py'' with several simple configurations for ''PR=1'' ===

  * Without any parameters (the default ones):
    * Which device is selected? How many itops (iterative operations per second) do you reach?
  * With only the device parameter ''-d'' (e.g. ''-d 1'' to select device ''#1''), for each of the available devices:
    * What are the performance ratios between the devices? Which one is the most powerful?
  * With the device selector, while increasing the number of iterations (''-i'') and the number of redos (''-r''):
    * What happens to the itops values? What is the typical variability of the results?

<code>/scratch/$USER/PiXPU.py</code>

<code>
/scratch/$USER/PiXPU.py -d 1
/scratch/$USER/PiXPU.py -d 2
/scratch/$USER/PiXPU.py -d 3
</code>

<code>
/scratch/$USER/PiXPU.py -d 0 -i 100000000 -r 10
/scratch/$USER/PiXPU.py -d 1 -i 100000000 -r 10
/scratch/$USER/PiXPU.py -d 2 -i 100000000 -r 10
/scratch/$USER/PiXPU.py -d 3 -i 100000000 -r 10
</code>
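The itops figures come from a Monte Carlo estimation of π (the ''Pi_FP32_MWC'' file prefix suggests single precision and a multiply-with-carry generator). As an illustration only, the pure-Python sketch below mimics the idea on the CPU in double precision. It assumes Marsaglia's classic 32-bit MWC generator, defines itops as iterations divided by elapsed time, and measures variability over several redos as the relative standard deviation. It is not the OpenCL kernel of ''PiXPU.py'', and all names in it are hypothetical.

<code python>
import statistics
import time

MASK32 = 0xFFFFFFFF


def mwc_stream(z, w):
    """Marsaglia's classic 32-bit multiply-with-carry generator (assumed, not PiXPU's exact one)."""
    while True:
        z = 36969 * (z & 65535) + (z >> 16)
        w = 18000 * (w & 65535) + (w >> 16)
        yield ((z << 16) + w) & MASK32


def monte_carlo_pi(iterations, seed_z=362436069, seed_w=521288629):
    """Estimate pi from the fraction of random points of the unit square inside the quarter disc."""
    rng = mwc_stream(seed_z, seed_w)
    inside = 0
    for _ in range(iterations):
        x = next(rng) / MASK32          # uniform in [0, 1], double precision here
        y = next(rng) / MASK32
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / iterations


def measure_itops(iterations, redo):
    """Run the estimate `redo` times; return the last estimate and the itops of each run."""
    rates = []
    estimate = 0.0
    for run in range(redo):
        start = time.perf_counter()
        estimate = monte_carlo_pi(iterations, seed_z=362436069 + run)
        elapsed = time.perf_counter() - start
        rates.append(iterations / elapsed)   # itops = iterations per second
    return estimate, rates


def summarize(rates):
    """Mean itops and relative standard deviation (variability) over the redos."""
    mean = statistics.mean(rates)
    spread = statistics.stdev(rates) / mean if len(rates) > 1 else 0.0
    return mean, spread
</code>

For example, ''measure_itops(1000000, 10)'' followed by ''summarize'' plays the role of ''-i 1000000 -r 10''. Dividing the mean itops measured on two devices gives the kind of ratio asked for in the questions, although a pure-Python loop is much slower than any OpenCL device.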

=== Exercise #8: explore ''PiXPU.py'' by increasing the Parallel Rate ''PR'' ===

  * With ''PR'' ranging from ''1'' to ''64'' (set by ''-b'' and ''-e''), 1 billion iterations, 10 redos, on the default device:
    * How does the elapsed time decrease as ''PR'' increases?
  * With the device selector, while increasing the number of iterations and the number of redos:
    * What happens to the itops values? What is the typical variability of the results?

<code>./PiXPU.py -d 0 -b 1 -e 64 -i 1000000000 -r 10</code>

In this case, we define a gnuplot config file as follows. Adapt it to your files and configuration.
<code>
set xlabel 'Parallel Rate'
set ylabel 'Itops'
plot 'Pi_FP32_MWC_xPU_OpenCL_1_64_1_1_1000000000_Device0_InMetro_titan' using 1:9 title 'CPU with OpenCL'
</code>

{{ :formation:pimc_1_64_cpu.png?600 |}}
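The gnuplot configuration reads column 1 (the parallel rate) and column 9 (the itops) of the text output. Assuming the total number of iterations is fixed for the whole sweep and that the file holds one line per parallel rate, a ratio of itops is the inverse ratio of elapsed times. The minimal sketch below derives the speedup and the parallel efficiency from those two columns; the helper names are hypothetical, and lines starting with ''#'' are skipped, as gnuplot does by default.

<code python>
def read_columns(path, x_col=1, y_col=9):
    """Read two whitespace-separated columns, numbered from 1 as in gnuplot's `using 1:9`."""
    xs, ys = [], []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if not fields or fields[0].startswith("#"):   # skip blank and comment lines
                continue
            xs.append(float(fields[x_col - 1]))
            ys.append(float(fields[y_col - 1]))
    return xs, ys


def scaling(prs, itops):
    """(PR, speedup, efficiency) relative to the smallest PR, assuming a fixed total iteration count."""
    base_pr, base_rate = min(zip(prs, itops))
    table = []
    for pr, rate in sorted(zip(prs, itops)):
        speedup = rate / base_rate               # = elapsed(base) / elapsed(PR)
        efficiency = speedup / (pr / base_pr)    # 1.0 means perfect scaling
        table.append((pr, speedup, efficiency))
    return table
</code>

For example: ''scaling(*read_columns('Pi_FP32_MWC_xPU_OpenCL_1_64_1_1_1000000000_Device0_InMetro_titan'))''. An efficiency that stays close to 1 up to some ''PR'' and then drops hints at the number of cores actually exploited by the OpenCL implementation.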

=== Exercise #9: explore ''PiXPU.py'' with large ''PR'' on GPU (mostly powers of 2) ===

  * Explore ''PR'' from ''2048'' to ''32768'' in steps of ''128''.
    * For which ''PR'' is the itops highest on your device?

To explore the GPU device (device #1) on this platform with parallel rates from 2048 to 32768 in steps of 128 and 1000000000 iterations: <code>
  * ''Pi_FP32_MWC_xPU_OpenCL_2048_32768_1_1_1000000000_Device1_InMetro_titan''

In this case, you can define a gnuplot config file:
<code>
set xlabel 'Parallel Rate'
{{ :formation:pimc_2048_32768_gtx1080ti.png?600 |}}
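To answer the question of this exercise from the data rather than from the plot, the sketch below picks the parallel rate with the highest itops and reports the largest power of two dividing it, which helps to check whether the best values fall on powers of 2. It assumes the two columns have already been loaded, for instance with ''numpy.loadtxt(path, usecols=(0, 8), unpack=True)''; ''sweep'' only enumerates the rates visited, assuming both bounds are included. Both helper names are hypothetical.

<code python>
def sweep(start=2048, stop=32768, step=128):
    """Parallel rates visited from start to stop (both included) with a constant step."""
    return list(range(start, stop + 1, step))


def best_parallel_rate(prs, itops):
    """Return (PR, itops, largest power of two dividing PR) for the fastest run."""
    best_rate, best_pr = max(zip(itops, prs))
    best_pr = int(best_pr)
    # In two's complement, x & -x isolates the lowest set bit,
    # i.e. the largest power of two that divides x
    largest_pow2 = best_pr & -best_pr
    return best_pr, best_rate, largest_pow2
</code>

With these bounds, ''len(sweep())'' is 241 parallel rates, of which only 5 (2048, 4096, 8192, 16384 and 32768) are powers of 2.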

=== Exercise #10: explore ''PiXPU.py'' around a large ''PR'' ===

Explore ''PR'' from 2040 to 2056 (2048 ± 8) on device #1 with 10 billion iterations and 10 redos:
<code>./PiXPU.py -d 1 -b $((2048-8)) -e $((2048+8)) -i 10000000000 -r 10</code>

  * ''Pi_FP32_MWC_xPU_OpenCL_2040_2056_1_1_10000000000_Device1_InMetro_titan''
  * ''Pi_FP32_MWC_xPU_OpenCL_2040_2056_1_1_10000000000_Device1_InMetro_titan.npz''

In this case, you can define a gnuplot config file:
<code>
set xlabel 'Parallel Rate'
set ylabel 'Itops'
plot 'Pi_FP32_MWC_xPU_OpenCL_2040_2056_1_1_10000000000_Device1_InMetro_titan' using 1:9 title 'GTX 1080 Ti'
</code>

{{ :formation:pimc_2040_2056_gtx1080ti.png?600 |}}
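The shell arithmetic ''$((2048-8))'' and ''$((2048+8))'' expands to 2040 and 2056, so 17 parallel rates are explored around 2048 (assuming both bounds are included). A minimal sketch, with a hypothetical helper name, that expresses each itops as a fraction of the value at ''PR=2048'' so that small departures from the power of 2 are easy to read:

<code python>
def relative_to_reference(prs, itops, reference=2048):
    """Map each parallel rate to its itops divided by the itops at the reference rate."""
    table = {int(pr): rate for pr, rate in zip(prs, itops)}
    if reference not in table:
        raise ValueError(f"no measurement at PR={reference}")
    ref_rate = table[reference]
    return {pr: rate / ref_rate for pr, rate in sorted(table.items())}


# The same bounds as the shell command of this exercise
explored = list(range(2048 - 8, 2048 + 8 + 1))
</code>

A ratio well below 1 for the neighbours of 2048 would indicate that the GPU favours this particular parallel rate.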
==== NBody, a simplistic simulator ====
The ''NBody.py'' code is an implementation of an N-body Keplerian system on OpenCL devices.

It is available:
  * as a file: ''/scratch/AstroSim2017/NBody.py'' on the workstations
  * on the website: [[http://www.cbp.ens-lyon.fr/emmanuel.quemener/documents/Astrosim2017/NBody.py|NBody.py]]

Launch the code with ''N=2'' bodies for ''1000'' iterations with graphical output (''-g''):
<code>
python NBody.py -n 2 -g -i 1000
</code>

{{ :formation:nbody_n2_gpu.png?600 |}}
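To make the two-body case concrete, here is a minimal NumPy sketch of a direct-summation gravitational N-body integrator. It is not the algorithm of ''NBody.py'': it assumes G = 1, no softening, a kick-drift-kick leapfrog scheme and a circular orbit of two unit masses, and all names are hypothetical. The ''dtype'' parameter lets you compare single and double precision, and the relative energy drift is a simple accuracy indicator when changing the time step and the number of iterations.

<code python>
import numpy as np


def accelerations(pos, mass):
    """Direct O(N^2) sum of Newtonian accelerations, with G = 1 and no softening."""
    diff = pos[np.newaxis, :, :] - pos[:, np.newaxis, :]   # diff[i, j] = r_j - r_i
    dist2 = (diff ** 2).sum(axis=-1)
    np.fill_diagonal(dist2, 1.0)                            # avoid 0 ** -1.5 on i == j
    inv_d3 = dist2 ** -1.5
    np.fill_diagonal(inv_d3, 0.0)                           # no self-interaction
    weights = mass[np.newaxis, :] * inv_d3                  # m_j / |r_j - r_i|^3
    return (diff * weights[:, :, np.newaxis]).sum(axis=1)


def total_energy(pos, vel, mass):
    """Kinetic plus potential energy, always evaluated in double precision."""
    pos, vel, mass = (np.asarray(a, dtype=np.float64) for a in (pos, vel, mass))
    kinetic = 0.5 * np.sum(mass * np.sum(vel ** 2, axis=1))
    potential = 0.0
    for i in range(len(mass)):
        for j in range(i + 1, len(mass)):
            potential -= mass[i] * mass[j] / np.linalg.norm(pos[i] - pos[j])
    return kinetic + potential


def simulate_two_bodies(iterations, dt, dtype=np.float64):
    """Two unit masses 1 apart on a circular orbit; returns positions, velocities, energy drift."""
    mass = np.array([1.0, 1.0], dtype=dtype)
    pos = np.array([[-0.5, 0.0, 0.0], [0.5, 0.0, 0.0]], dtype=dtype)
    # Circular orbit: relative speed sqrt(G * M / d) = sqrt(2), shared equally by both bodies
    v = np.sqrt(2.0) / 2.0
    vel = np.array([[0.0, -v, 0.0], [0.0, v, 0.0]], dtype=dtype)
    e0 = total_energy(pos, vel, mass)
    acc = accelerations(pos, mass)
    for _ in range(iterations):
        vel += 0.5 * dt * acc      # half kick
        pos += dt * vel            # drift
        acc = accelerations(pos, mass)
        vel += 0.5 * dt * acc      # half kick
    drift = abs((total_energy(pos, vel, mass) - e0) / e0)
    return pos, vel, drift
</code>

Comparing the returned ''drift'' for a few values of ''dt'' and ''iterations'', or between ''np.float32'' and ''np.float64'', gives a feel for the accuracy trade-offs behind the exercises on steps, iterations and double precision.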

=== Exercise #11: explore ''NBody.py'' with different devices ===

=== Exercise #12: explore ''NBody.py'' with steps and iterations ===

=== Exercise #13: explore ''NBody.py'' with double precision ===
===== Exploration with production codes =====
==== PKDGRAV3 ====
