  * **Where ?** On workstations, cluster nodes, laptops (well configured), inside terminals
  * **Who ?** For people who want to open the hood
  * **How ?** By applying some simple commands (essentially shell ones)
  
===== Session Goal =====
</code>
  
=== Exercise #1 : get this information on your host with ''cat /proc/cpuinfo'' and compare to the one above ===
  
  * How many lines of information ?
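A quick way to count those lines and the logical CPUs they describe — a minimal sketch assuming a Linux host exposing ''/proc/cpuinfo'':

```shell
# Total number of lines exposed by /proc/cpuinfo
wc -l < /proc/cpuinfo
# Number of logical CPUs: /proc/cpuinfo holds one "processor" stanza per logical core
grep -c '^processor' /proc/cpuinfo
```

Compare the first number with the line count of the sample output above; it grows with the number of logical cores.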
  
=== Exercise #2 : get the information on your host with the ''lscpu'' command ===
  
  * What new information appears in the output ?
{{ :formation:lstopo_035.png?400 |hwloc-ls}}
  
=== Exercise #3 : get a graphical representation of the hardware with the ''hwloc-ls'' command ===
  
  * Locate and identify the elements provided by the ''lscpu'' command
</code>
  
=== Exercise #4 : list the PCI peripherals with the ''lspci'' command ===
  
  * How many devices do you get ?
As when you drive a car, it is useful to get information about the running system while a process runs. The commands ''top'' and ''htop'' are made for that.
  
=== Exercise #5 : open ''htop'' and ''top'' in two terminals ===
  
  * What do you see first ?
</code>
  
=== Exercise #6 : exploration of ''/usr/bin/time'' on several Unix commands or your small programs ===
  
  
</code>
  
=== Exercise #7 : practice ''Rmmmms-$USER.r'' and investigate variability ===
  
  * Launch the previous command 10000, 1000 and 100 times, with sizes of 10, 100 and 1000 respectively
A program named ''PiMC-$USER.sh'', located in ''/tmp'' where ''$USER'' is your login, is created and ready to use.
  
=== Exercise #8 : launch the ''PiMC'' program with several numbers of iterations: from 100 to 1000000 ===
  
  * What is the typical precision of the result ?
  
=== Exercise #9 : launch the ''PiMC'' program prefixed by ''/usr/bin/time'' with several numbers of iterations: from 100 to 1000000 ===
  
  * Grep the ''Elapsed'' and ''Iterations'' lines and manually estimate the **ITOPS** (ITerative Operations Per Second) for this program implementation
  
One Solution:<code>
echo $(/usr/bin/time /tmp/PiMC-$USER.sh 100000 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l
</code>
  
32362.45954692556634304207
</code>

Example of code for previous results:<code>
for i in $(seq 10) ; do echo $(/usr/bin/time /tmp/PiMC-$USER.sh 100000 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l ; done</code>
  
From 1000 to 1000000, 1 time:
</code>
  
Example of code for previous results:<code>
for POWER in $(seq 3 1 6); do ITERATIONS=$((10**$POWER)) ; echo -ne $ITERATIONS'\t' ; echo $(/usr/bin/time /tmp/PiMC-$USER.sh $ITERATIONS 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l ; done</code>
==== Split the execution in equal parts ====
  
On the previous launch, the User time represents 99.6% of the Elapsed time. Internal system operations represent only 0.4%.
  
=== Exercise #10 : identification of the cost of splitting a process ===
  
  * Explore the values of the ''User'', ''System'' and ''Elapsed'' times for different numbers of iterations
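The splitting solutions on this page compute the per-job share ''EACHJOB'' with a test on the modulo; the same ceiling division can be sketched more compactly (the variable names follow the ones used elsewhere on this page):

```shell
# Split ITERATIONS into PR near-equal chunks: integer ceiling of ITERATIONS/PR
ITERATIONS=1000003 ; PR=8
EACHJOB=$(( (ITERATIONS + PR - 1) / PR ))
echo $EACHJOB   # 125001 : 8 jobs of 125001 iterations cover the 1000003 requested
```

Both forms give the smallest chunk size such that ''PR'' jobs process at least ''ITERATIONS'' iterations in total.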
In this example, we see that the User time represents 98.52% of the Elapsed time. The total Elapsed time is up to 10% greater than the unsplit one: splitting has a cost. The System time represents 0.4% of the Elapsed time.
  
=== Exercise #11 : identification of the cost of splitting a process ===
  
  * Explore the values of the ''User'', ''System'' and ''Elapsed'' times for different numbers of iterations
  * What can you conclude ?
  
=== Exercise #12 : merging results & improving metrology ===
  
  * Extend the program to extract the total number of //Inside// iterations
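One way to merge the per-job counts — a sketch with ''awk'', assuming each job prints a line such as ''Inside: 7853'' (the exact label in your log may differ):

```shell
# Sum the last field of every "Inside" line; fed here with two sample lines
# (in practice, pipe the real log of the split jobs instead of the here-document)
awk '/Inside/ { total += $NF } END { print total }' <<'EOF'
Inside: 7853
Inside: 7861
EOF
```

On the two sample lines this prints ''15714''; dividing the merged //Inside// total by the merged //Iterations// total gives a single Pi estimate from all jobs.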
In conclusion, splitting a huge job into small jobs has an Operating System cost. But distributing the jobs using the system can be very efficient to reduce the Elapsed time.
  
=== Exercise #13 : launch with ''-P'' set to the number of CPUs detected ===
  
  * Examine the ''Elapsed time'' : does it decrease or not ?
  * Examine the ''System time'' : does it increase or not ?
  
=== Exercise #14 : extend the program to improve statistics ===
  
  * Add an iterator to redo the program 10 times
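A possible shape for that iterator, as a sketch (the inner ''echo'' stands for your timed ''PiMC'' launch, and the log name is illustrative):

```shell
# Repeat the measurement 10 times and append every result to a single log file
LOG=/tmp/PiMC-stats.log
for RUN in $(seq 10) ; do
    echo "run $RUN"      # replace by: /usr/bin/time /tmp/PiMC-$USER.sh ... 2>&1
done > $LOG
wc -l < $LOG             # one line per run here; a real run logs several lines each
```

Accumulating all 10 runs in one file lets ''Rmmmms-$USER.r'' compute the statistics below in a single pass.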
Examples of statistics on estimators:
With the //magic// ''Rmmmms-$USER.r'' command, we can extract statistics on the different times:
  * for //Elapsed time// : ''cat /tmp/PiMC-${USER}_201706291231.log | grep Elapsed | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>1.96 2.02 1.985 1.987 0.01888562 0.009514167</code>
  * for //System time// : ''cat /tmp/PiMC-${USER}_201706291231.log | grep System | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>0.09 0.22 0.14 0.139 0.03665151 0.2617965</code>
  * for //User time// : ''cat /tmp/PiMC-${USER}_201706291231.log | grep User | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>59.12 59.81 59.375 59.436 0.2179297 0.003670394</code>
  
The previous results show that the variability, in this case, in
<note important>You can control the selection by watching, in another terminal, the ''htop'' activity of the cores</note>
  
=== Exercise #15 : launch the previous program on a slice of the machine ===
  
  * Identify and launch the program on only the first core
  * Identify and launch the program on the first half of the cores
  * Identify and launch the program on the second half of the cores
  * Identify and launch it on the first two cores
  * Identify and launch it on the first core of the first half and the first core of the second half
  * Why is there such a great difference between the elapsed times ?

Watch in a terminal with ''htop'' to check the right distribution of the tasks.
  
Solutions for a 32-core workstation:
  * On the first core: 0<code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')/2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:0-0 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
  * On the first half of cores: 0 to 15<code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')/2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:0-15 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
  * On the second half of cores: 16 to 31<code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:16-31 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
  * On the first two cores: 0 and 1<code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:0-1 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
  * On the first core of the first half and the first core of the second half: 0 and 8<code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:0-0 pu:8-8 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
  
=== Exercise #17 : from exploration to laws estimation ===
  
  * explore with the previous program from ''PR=1'' to ''PR=<2x CPU>'', 10 times for each
  * store the results in a file
  
Solution:
<code>
ITERATIONS=1000000 ;
</code>
  
=== Exercise #18 : plot & fit with the Amdahl and Mylq laws ===

  * plot the different values with your favorite plotter, focus on the median one !
  * fit with an Amdahl law ''T=s+p/N'' where ''N'' is ''PR''
  * fit with a Mylq law ''T=s+c*N+p/N''
  * which law matches best ?

Example of a bunch of gnuplot commands to do the job. Adapt them to your file and ''PR''...
<code>
Ta(x)=T1*(1-Pa+Pa/x)
fit [x=1:16] Ta(x) 'PiMC_1_64.dat' using 1:4 via T1,Pa
Tm(x)=Sm+Cm*x+Pm/x
fit [x=1:16] Tm(x) 'PiMC_1_64.dat' using 1:4 via Sm,Cm,Pm
set xlabel 'Parallel Rate'
set xrange [1:64]
set ylabel "Speedup Factor"
set title "PiMC : parallel execution with Bash for distributed iterations"
plot    'PiMC_1_64.dat' using ($1):(Tm(1)/$4) title 'Measurements' with points,\
        Tm(1)/Tm(x) title "Mylq Law" with lines,\
        Ta(1)/Ta(x) title "Amdahl Law" with lines
</code>
  
{{ :formation:pimc_1_64.png?600 |}}
==== Other sample codes (used for courses) ====
  
In the folder ''/scratch/AstroSim2017'', you will find the following executables:
  * ''PiXPU.py'' : Pi Monte Carlo Dart Dash in PyOpenCL
  * ''NBody.py'' : N-Body in PyOpenCL
  * ''xGEMM_DP_openblas'' : Matrix-Matrix multiplication with the multithreaded OpenBLAS library in double precision
  * ''xGEMM_SP_openblas'' : Matrix-Matrix multiplication with the multithreaded OpenBLAS library in single precision
  * ''xGEMM_DP_clblas'' : Matrix-Matrix multiplication with the OpenCL library in double precision
  * ''xGEMM_SP_clblas'' : Matrix-Matrix multiplication with the OpenCL library in single precision
  * ''xGEMM_DP_cublas'' : Matrix-Matrix multiplication with the CUDA library in double precision
  * ''xGEMM_SP_cublas'' : Matrix-Matrix multiplication with the CUDA library in single precision
  
=== Exercise #19 : select a parallelized program and explore scalability ===
  
  * launch one of the above codes with ''PR'' from ''1'' to 2 times the number of CPUs
  * draw the scalability curve
  * estimate the parameters with the Amdahl Law and the Mylq Law
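A possible shape for the sweep, as a sketch (the inner ''echo'' stands for the timed launch of the program you picked; ''getconf'' is used here as a portable way to count CPUs, instead of parsing ''lscpu''):

```shell
# Sweep the parallel rate PR from 1 to 2x the number of online CPUs, 10 runs each
CPUS=$(getconf _NPROCESSORS_ONLN)
for PR in $(seq 1 $((2*CPUS))) ; do
    for RUN in $(seq 10) ; do
        echo "PR=$PR run=$RUN"   # replace by your timed launch using -P $PR
    done
done
```

Redirect the whole loop into a data file, then feed the per-''PR'' medians to the gnuplot fits of Exercise #18.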
  
==== Your preferred software ====
  
=== Exercise #20 : explore the scalability of your own parallel program ===
  
  * launch your MPI code with ''PR'' from ''1'' to 2 times the number of CPUs
  * draw the scalability curve
  * estimate the parameters with the Amdahl Law and the Mylq Law
  
 --- //[[emmanuel.quemener@ens-lyon.fr|Emmanuel Quemener]] 2017/06/30 14:26//