Below are the differences between two revisions of the page.
formation:astrosim2017para4dummies [2017/06/30 12:15] equemene [Select the execution cores] |
formation:astrosim2017para4dummies [2017/07/07 09:25] (current version) equemene [5 W/2H : Why ? What ? Where ? When ? Who ? How much ? How ?] |
||
---|---|---|---|
Line 13: | Line 13: | ||
* **Where ?** On workstations, cluster nodes, laptop (well configured), inside terminals | * **Where ?** On workstations, cluster nodes, laptop (well configured), inside terminals | ||
* **Who ?** For people who want to open the hood | * **Who ?** For people who want to open the hood | ||
- | * **How ?** Applying some simples commands (essentially shell ones) | + | * **How ?** Applying some simple commands (essentially shell ones) |
===== Session Goal ===== | ===== Session Goal ===== | ||
Line 134: | Line 134: | ||
</code> | </code> | ||
- | === Question #1: get this information on your host with ''cat /proc/cpuinfo'' and compare to the one above === | + | === Exercise #1: get this information on your host with ''cat /proc/cpuinfo'' and compare to the one above === |
* How many lines of information ? | * How many lines of information ? | ||
- | === Question #2 : get the information on your host with ''lscpu'' command === | + | === Exercise #2 : get the information on your host with ''lscpu'' command === |
* What new information appears in the output ? | * What new information appears in the output ? | ||
Line 150: | Line 150: | ||
{{ :formation:lstopo_035.png?400 |hwloc-ls}} | {{ :formation:lstopo_035.png?400 |hwloc-ls}} | ||
- | === Question #3 : get a graphical representation of hardware with ''hwloc-ls'' command === | + | === Exercise #3 : get a graphical representation of hardware with ''hwloc-ls'' command === |
* Locate and identify the elements provided with ''lscpu'' command | * Locate and identify the elements provided with ''lscpu'' command | ||
Line 178: | Line 178: | ||
</code> | </code> | ||
- | === Question #4 : list the PCI peripherals with ''lspci'' command === | + | === Exercise #4 : list the PCI peripherals with ''lspci'' command === |
* How many devices do you get ? | * How many devices do you get ? | ||
Line 190: | Line 190: | ||
As when you drive a car, it's useful to get information about the running system while a process executes. The commands ''top'' and ''htop'' | As when you drive a car, it's useful to get information about the running system while a process executes. The commands ''top'' and ''htop'' | ||
- | === Question #5: open ''htop'' and ''top'' in two terminals === | + | === Exercise #5: open ''htop'' and ''top'' in two terminals === |
* What do you see first ? | * What do you see first ? | ||
Line 283: | Line 283: | ||
</code> | </code> | ||
- | === Question #6 : exploration of ''/usr/bin/time'' on several Unix commands ''ls, cp, === | + | === Exercise #6 : exploration of ''/usr/bin/time'' on several Unix commands or your small programs === |
Line 343: | Line 343: | ||
</code> | </code> | ||
- | === Question #7 : practice ''Rmmmms-$USER.r'' and investigate variability === | + | === Exercise #7 : practice ''Rmmmms-$USER.r'' and investigate variability === |
* Launch the previous command with 10000, 1000 and 100 launches, with sizes of 10, 100 and 1000 respectively | * Launch the previous command with 10000, 1000 and 100 launches, with sizes of 10, 100 and 1000 respectively | ||
Line 405: | Line 405: | ||
A program named ''PiMC-$USER.sh'', located in ''/tmp'', where ''$USER'' is your login, is created and ready to use. | A program named ''PiMC-$USER.sh'', located in ''/tmp'', where ''$USER'' is your login, is created and ready to use. | ||
- | === Question #8: launch ''PiMC'' program with several numbers of iterations: from 100 to 1000000 === | + | === Exercise #8: launch ''PiMC'' program with several numbers of iterations: from 100 to 1000000 === |
* What is the typical precision of the result ? | * What is the typical precision of the result ? | ||
- | === Question #9: launch ''PiMC'' program prefixed by ''/usr/bin/time'' with several numbers of iterations: 100 to 1000000 === | + | === Exercise #9: launch ''PiMC'' program prefixed by ''/usr/bin/time'' with several numbers of iterations: 100 to 1000000 === |
* Grep the ''Elapsed'' and ''Iterations'' and estimate manually the **ITOPS** (ITerative Operations Per Second) for this program implementation | * Grep the ''Elapsed'' and ''Iterations'' and estimate manually the **ITOPS** (ITerative Operations Per Second) for this program implementation | ||
Line 415: | Line 415: | ||
One Solution:<code> | One Solution:<code> | ||
- | echo $(/usr/bin/time /tmp/PiMC-jmylq.sh 100000 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l | + | echo $(/usr/bin/time /tmp/PiMC-$USER.sh 100000 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l |
</code> | </code> | ||
Line 431: | Line 431: | ||
32362.45954692556634304207 | 32362.45954692556634304207 | ||
</code> | </code> | ||
+ | |||
+ | Example of code for previous results:<code> | ||
+ | for i in $(seq 10 ) ; do echo $(/usr/bin/time /tmp/PiMC-$USER.sh 100000 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l ; done</code> | ||
From 1000 to 1000000, 1 time: | From 1000 to 1000000, 1 time: | ||
Line 440: | Line 443: | ||
</code> | </code> | ||
+ | Example of code for previous results:<code> | ||
+ | for POWER in $(seq 3 1 6); do ITERATIONS=$((10**$POWER)) ; echo -ne $ITERATIONS'\t' ; echo $(/usr/bin/time /tmp/PiMC-$USER.sh $ITERATIONS 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l ; done</code> | ||
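The one-liners above chain ''egrep'', ''awk'', ''tr'' and ''bc'' to divide the number of iterations by the elapsed time. A more readable sketch of the same estimation is given here. It assumes that the combined output contains one line mentioning ''Iterations'' and one mentioning ''Elapsed'', each with its value as the last field (which is what the ''awk '{ print $NF }''' step above relies on). The function name ''itops_from_log'' is hypothetical.

```shell
# Read the combined output of "/usr/bin/time PiMC ..." on stdin and print
# the ITOPS (ITerative Operations Per Second) with two decimals.
# Assumption: the value is the last field of the "Iterations" and "Elapsed" lines.
itops_from_log() {
    awk '
        /Iterations/ { iterations = $NF }   # number of iterations performed
        /Elapsed/    { elapsed = $NF }      # wall-clock time in seconds
        END {
            # Avoid a division by zero on very short runs
            if (elapsed > 0) printf "%.2f\n", iterations / elapsed
        }'
}
```

It could be used as ''/usr/bin/time /tmp/PiMC-$USER.sh 100000 2>&1 | itops_from_log''. Very short runs may report an elapsed time of 0, in which case nothing is printed.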
==== Split the execution in equal parts ==== | ==== Split the execution in equal parts ==== | ||
Line 465: | Line 470: | ||
On the previous launch, User time represents 99.6% of Elapsed time. Internal system operations account for only 0.4%. | On the previous launch, User time represents 99.6% of Elapsed time. Internal system operations account for only 0.4%. | ||
- | === Question #10 : identification of the cost of splitting process === | + | === Exercise #10 : identification of the cost of splitting process === |
* Explore the values of ''User'', ''System'' and ''Elapsed'' times for different values of iterations | * Explore the values of ''User'', ''System'' and ''Elapsed'' times for different values of iterations | ||
Line 585: | Line 590: | ||
In this example, we see that the User time represents 98.52% of the Elapsed time. The total Elapsed time is up to 10% greater than the unsplit one. So, splitting has a cost. The system time represents 0.4% of Elapsed time. | In this example, we see that the User time represents 98.52% of the Elapsed time. The total Elapsed time is up to 10% greater than the unsplit one. So, splitting has a cost. The system time represents 0.4% of Elapsed time. | ||
- | === Question #11 : identification of the cost of splitting process === | + | === Exercise #11 : identification of the cost of splitting process === |
* Explore the values of ''User'', ''System'' and ''Elapsed'' times for different values of iterations | * Explore the values of ''User'', ''System'' and ''Elapsed'' times for different values of iterations | ||
Line 592: | Line 597: | ||
* What could you conclude ? | * What could you conclude ? | ||
- | === Question #12 : merging results & improve metrology === | + | === Exercise #12 : merging results & improve metrology === |
* Extend the program to extract the total number of //Inside// iterations | * Extend the program to extract the total number of //Inside// iterations | ||
Line 726: | Line 731: | ||
In conclusion, splitting a huge job into small jobs has an operating system cost. But distributing the jobs with the system can be very efficient at reducing the Elapsed time. | In conclusion, splitting a huge job into small jobs has an operating system cost. But distributing the jobs with the system can be very efficient at reducing the Elapsed time. | ||
- | === Question #13 : launch with ''-P'' set to the number of CPUs detected === | + | === Exercise #13 : launch with ''-P'' set to the number of CPUs detected === |
* Examine the ''Elapsed time'': decrease or not ? | * Examine the ''Elapsed time'': decrease or not ? | ||
Line 732: | Line 737: | ||
* Examine the ''System time'': increase or not ? | * Examine the ''System time'': increase or not ? | ||
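As a reminder of the mechanism explored in this exercise, here is a minimal sketch of splitting a job into equal parts and distributing them. It assumes that ''-P'' refers to the ''xargs'' option bounding the number of simultaneous processes, and that the program takes its number of iterations as last argument. The function name ''run_split'' and its variables are hypothetical.

```shell
# Run CMD on JOBS equal slices of TOTAL iterations, PARALLEL slices at a time.
# Usage: run_split TOTAL JOBS PARALLEL CMD [ARGS...]
run_split() {
    rs_total=$1; rs_jobs=$2; rs_parallel=$3
    shift 3
    rs_part=$((rs_total / rs_jobs))   # integer division: any remainder is dropped
    # One input line per slice: -I makes xargs start one command per line,
    # -P bounds how many of them run at the same time.
    seq "$rs_jobs" | xargs -I '{}' -P "$rs_parallel" "$@" "$rs_part"
}
```

For example, ''/usr/bin/time'' could wrap a script containing ''run_split 1000000 "$(nproc)" "$(nproc)" /tmp/PiMC-$USER.sh'' to compare Elapsed, User and System times with the sequential launch. ''nproc'' reports the number of processing units available to the current process.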
- | === Question #14 : extend the program to improve statistics === | + | === Exercise #14 : extend the program to improve statistics === |
* Add an iterator to run the program 10 times | * Add an iterator to run the program 10 times | ||
Line 789: | Line 794: | ||
Examples of statistics on estimators: | Examples of statistics on estimators: | ||
With //magic// ''Rmmmms-$USER.r'' command, we can extract statistics on different times | With //magic// ''Rmmmms-$USER.r'' command, we can extract statistics on different times | ||
- | * for //Elapsed time// : ''cat /tmp/PiMC-jmylq_201706291231.log | grep Elapsed | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>1.96 2.02 1.985 1.987 0.01888562 0.009514167</code> | + | * for //Elapsed time// : ''cat /tmp/PiMC-${USER}_201706291231.log | grep Elapsed | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>1.96 2.02 1.985 1.987 0.01888562 0.009514167</code> |
- | * for //System time// : ''cat /tmp/PiMC-jmylq_201706291231.log | grep System | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>0.09 0.22 0.14 0.139 0.03665151 0.2617965</code> | + | * for //System time// : ''cat /tmp/PiMC-${USER}_201706291231.log | grep System | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>0.09 0.22 0.14 0.139 0.03665151 0.2617965</code> |
- | * for //User time// : ''cat /tmp/PiMC-jmylq_201706291231.log | grep User | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>59.12 59.81 59.375 59.436 0.2179297 0.003670394</code> | + | * for //User time// : ''cat /tmp/PiMC-${USER}_201706291231.log | grep User | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>59.12 59.81 59.375 59.436 0.2179297 0.003670394</code> |
The previous results show that the variability, in this case, in | The previous results show that the variability, in this case, in | ||
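To make the six columns above less //magic//, here is a sketch of an equivalent filter written with ''sort'' and ''awk''. The column meaning is an assumption inferred from the sample outputs above: minimum, maximum, median, mean, standard deviation, and standard deviation divided by the median. The sample standard deviation (division by ''n-1'') is also an assumption. The function name ''mmmms'' is hypothetical and is not the tutorial's R script.

```shell
# Read one number per line on stdin and print:
#   min max median mean stddev stddev/median
mmmms() {
    sort -n | awk '
        { v[NR] = $1; sum += $1 }          # values arrive sorted
        END {
            if (NR == 0) exit 1            # no data: nothing to report
            mean = sum / NR
            # Median: middle value, or mean of the two middle values
            median = (NR % 2) ? v[(NR + 1) / 2] : (v[NR / 2] + v[NR / 2 + 1]) / 2
            for (i = 1; i <= NR; i++) ss += (v[i] - mean) ^ 2
            sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0      # sample standard deviation
            rel = (median != 0) ? sd / median : 0        # relative variability
            printf "%g %g %g %g %.4f %.4f\n", v[1], v[NR], median, mean, sd, rel
        }'
}
```

The last column is the one to watch when comparing the variability of Elapsed, System and User times: it is dimensionless, so times of very different magnitudes can be compared directly.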
Line 823: | Line 828: | ||
<note important>You can check the selection by watching the activity of cores with ''htop'' in another terminal</note> | <note important>You can check the selection by watching the activity of cores with ''htop'' in another terminal</note> | ||
- | === Question #15 : launch the previous program on a slice of the machine === | + | === Exercise #15 : launch the previous program on a slice of the machine === |
* Identify and launch the program on only the first core | * Identify and launch the program on only the first core | ||
Line 831: | Line 836: | ||
* Identify and launch on the first core of the first half and the first core of the second half of cores | * Identify and launch on the first core of the first half and the first core of the second half of cores | ||
* Why is there such a great difference between the elapsed times ? | * Why is there such a great difference between the elapsed times ? | ||
+ | |||
+ | Watch inside terminal with ''htop'' to check the right distribution of tasks. | ||
Solutions for a 32-core workstation: | Solutions for a 32-core workstation: | ||
Line 849: | Line 856: | ||
</code> | </code> | ||
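For hosts with a different number of cores, the ranges for "first half" and "second half" can be computed instead of typed by hand. The sketch below only builds the range string. The function name ''core_range'' is hypothetical, and using ''taskset'' (from util-linux) to apply it is an assumption, not necessarily the binding tool used in the solutions above.

```shell
# Print the range of logical CPU numbers for one half of the machine.
# Usage: core_range first|second NCPU   (NCPU should be at least 2)
core_range() {
    cr_half=$1; cr_ncpu=$2
    cr_mid=$((cr_ncpu / 2))
    case "$cr_half" in
        first)  echo "0-$((cr_mid - 1))" ;;
        second) echo "$cr_mid-$((cr_ncpu - 1))" ;;
        *)      echo "usage: core_range first|second NCPU" >&2; return 1 ;;
    esac
}
```

A possible use is ''taskset -c "$(core_range first "$(nproc)")" /tmp/PiMC-$USER.sh 1000000''. Note that ''taskset'' uses the kernel numbering of logical CPUs, which may differ from the order drawn by ''hwloc-ls'', so checking with ''htop'' remains useful.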
- | Why to much user time | + | === Exercise #17 : from exploration to laws estimation === |
- | + | ||
- | HT Effect : why so much people desactivate... | + | |
- | + | ||
- | === Question #16 : from exploration to laws estimation === | + | |
* explore with previous program from ''PR=1'' to ''PR=<2x CPU>'', 10x for each | * explore with previous program from ''PR=1'' to ''PR=<2x CPU>'', 10x for each | ||
Line 946: | Line 949: | ||
</code> | </code> | ||
- | === Question #16 : plot & fit with Amdahl and Mylq laws === | + | === Question #18 : plot & fit with Amdahl and Mylq laws === |
* plot the different values with your favorite plotter, focusing on the median one ! | * plot the different values with your favorite plotter, focusing on the median one ! | ||
Line 966: | Line 969: | ||
Tm(1)/Tm(x) title "Mylq Law" with lines,\ | Tm(1)/Tm(x) title "Mylq Law" with lines,\ | ||
Ta(1)/Ta(x) title "Amdahl Law" with lines | Ta(1)/Ta(x) title "Amdahl Law" with lines | ||
- | pause -1 | ||
</code> | </code> | ||
- | xGEMM | + | {{ :formation:pimc_1_64.png?600 |}} |
- | NBody.py | + | ==== Other sample codes (used for courses) ==== |
- | PiXPU.py | + | |
- | Choose your prefered parallel code | + | In folder ''/scratch/AstroSim2017'', you will find the following executables: |
+ | * ''PiXPU.py'' : Pi Monte Carlo Dart Dash in PyOpenCL | ||
+ | * ''NBody.py'' : N-Body in PyOpenCL | ||
+ | * ''xGEMM_DP_openblas'' : Matrix-Matrix multiplication with multithreaded OpenBLAS library in double precision | ||
+ | * ''xGEMM_SP_openblas'' : Matrix-Matrix multiplication with multithreaded OpenBLAS library in single precision | ||
+ | * ''xGEMM_DP_clblas'' : Matrix-Matrix multiplication for OpenCL library in double precision | ||
+ | * ''xGEMM_SP_clblas'' : Matrix-Matrix multiplication for OpenCL library in single precision | ||
+ | * ''xGEMM_DP_cublas'' : Matrix-Matrix multiplication for CUDA library in double precision | ||
+ | * ''xGEMM_SP_cublas'' : Matrix-Matrix multiplication for CUDA library in single precision | ||
- | Improvment of statistics | + | === Exercise #19 : select parallelized program and explore scalability === |
- | Scalability law | + | * launch one of the codes above with ''PR'' from ''1'' to 2 times the number of CPUs |
+ | * draw the scalability curve | ||
+ | * estimate the parameters with Amdahl Law and Mylq Law | ||
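Before fitting, it can help to see what Amdahl Law predicts for a given parallel fraction. The sketch below uses the textbook form of the law, speedup = 1 / ((1 - p) + p / N), where ''p'' is the parallelizable fraction of the run time and ''N'' the number of parallel processes. The function name ''amdahl_speedup'' is hypothetical, and the parametrization may differ from the ''Ta(x)'' function used in the gnuplot script of this page. Mylq Law is not covered here.

```shell
# Print the speedup predicted by Amdahl Law with three decimals.
# Usage: amdahl_speedup P N
#   P: parallelizable fraction of the run time, between 0 and 1
#   N: number of parallel processes (the PR value)
amdahl_speedup() {
    awk -v p="$1" -v n="$2" 'BEGIN { printf "%.3f\n", 1 / ((1 - p) + p / n) }'
}
```

For example, ''for PR in $(seq 1 8); do amdahl_speedup 0.95 $PR; done'' prints the expected curve for a hypothetical parallel fraction of 95%, to be compared with the measured ratio of median Elapsed times.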
- | Amdahl Law | + | ==== Your preferred software ==== |
- | Mylq Law | + | === Exercise #20 : select parallelized program and explore scalability === |
+ | * launch your MPI code with ''PR'' from ''1'' to 2 times the number of CPUs | ||
+ | * draw the scalability curve | ||
+ | * estimate the parameters with Amdahl Law and Mylq Law | ||
+ | --- //[[emmanuel.quemener@ens-lyon.fr|Emmanuel Quemener]] 2017/06/30 14:26// |