Differences

Below are the differences between two revisions of the page.

formation:astrosim2017para4dummies [2017/06/30 06:52]
equemene
formation:astrosim2017para4dummies [2017/07/07 09:25] (current version)
equemene [5 W/2H : Why ? What ? Where ? When ? Who ? How much ? How ?]
Line 13: Line 13:
   * **Where ?** On workstations, cluster nodes, laptops (well configured), inside terminals
   * **Who ?** For people who want to open the hood
-  * **How ?** Applying some simples commands (essentially shell ones)
+  * **How ?** Applying some simple commands (essentially shell ones)
 
 ===== Session Goal =====
Line 134: Line 134:
 </​code>​ </​code>​
  
-=== Question #1: get this informations on your host with ''cat /proc/cpuinfo'' and compare to one above ===
+=== Exercise #1: get this information on your host with ''cat /proc/cpuinfo'' and compare it to the one above ===
 
   * How many lines of information ?
 
-=== Question #2 : get the informations on your host with ''lscpu'' command ===
+=== Exercise #2 : get the information on your host with the ''lscpu'' command ===
 
   * What new information appears in the output ?
Line 150: Line 150:
 {{ :​formation:​lstopo_035.png?​400 |hwloc-ls}} {{ :​formation:​lstopo_035.png?​400 |hwloc-ls}}
  
-=== Question #3 : get a graphical representation of hardware with ''hwloc-ls'' command ===
+=== Exercise #3 : get a graphical representation of the hardware with the ''hwloc-ls'' command ===
 
   * Locate and identify the elements provided by the ''lscpu'' command
Line 178: Line 178:
 </​code> ​ </​code> ​
  
-=== Question #4 : list the PCI peripherals with ''lspci'' command ===
+=== Exercise #4 : list the PCI peripherals with the ''lspci'' command ===
 
   * How many devices do you get ?
Line 190: Line 190:
 As when you drive a car, it's useful to get information about the running system while a process executes. The commands ''top'' and ''htop''
 
-=== Question #5: open ''htop'' and ''top'' in two terminals ===
+=== Exercise #5: open ''htop'' and ''top'' in two terminals ===
 
   * What do you see first ?
Line 283: Line 283:
 </​code>​ </​code>​
  
-=== Question #6 : exploration of ''/usr/bin/time'' on several command Unix commands ''ls, cp,  ===
+=== Exercise #6 : exploration of ''/usr/bin/time'' on several Unix commands or your own small programs ===
  
  
Line 343: Line 343:
 </​code>​ </​code>​
  
-=== Question #7 : practice ''Rmmmms-$USER.r'' and investigate variability ===
+=== Exercise #7 : practice ''Rmmmms-$USER.r'' and investigate variability ===
 
   * Run the previous command with 10000, 1000 and 100 launches, with sizes of 10, 100 and 1000 respectively
Line 350: Line 350:
 This will be very useful to extract and provide timing statistics.
  
-===== First steps in parallelism =====
+===== An illustrative example: Pi Dart Dash =====
 
-==== An illustrative example: Pi Dart Dash ====
-
-=== Principle, inputs & outputs ===
+==== Principle, inputs & outputs ====
  
 The most common example of a Monte Carlo program: estimate Pi from the ratio between the number of random points falling inside a quarter of a circle and the total number of points drawn uniformly in the enclosing square. It needs:
Line 365: Line 363:
   * Output: an integer, the number of points inside the quarter of the circle
   * Output (bis): an estimate of Pi (a very inefficient method, but the result is well known, so it is easily checked).
+  * Output (ter): the total number of iterations (as a reminder)
  
 The following implementation is a ''bash'' shell script. The ''RANDOM'' variable provides a random integer between 0 and 32767, so the frontier is located at ''32767*32767''.
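A minimal sketch of this principle, assuming nothing about the actual ''PiMC-$USER.sh'' script beyond the ''RANDOM'' range. The function names ''pimc_inside'' and ''pimc_estimate'' and the optional seed argument are hypothetical, introduced only for illustration:

```bash
#!/bin/bash
# Hypothetical sketch, NOT the course's PiMC-$USER.sh script.

# pimc_inside ITERATIONS [SEED]
# Draws ITERATIONS random points and prints how many fall inside the quarter circle.
pimc_inside() {
    local iterations=$1 inside=0 x y i
    # Assumption: an optional seed, only used to make a run reproducible.
    if [ -n "${2:-}" ]; then
        RANDOM=$2
    fi
    for ((i = 0; i < iterations; i++)); do
        x=$RANDOM
        y=$RANDOM
        # A point is inside when x^2 + y^2 does not exceed the frontier 32767*32767.
        if ((x * x + y * y <= 32767 * 32767)); then
            inside=$((inside + 1))
        fi
    done
    echo "$inside"
}

# pimc_estimate INSIDE ITERATIONS
# Prints the estimate of Pi: 4 * inside / iterations.
pimc_estimate() {
    awk -v i="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 4 * i / n }'
}
```

For instance, ''pimc_estimate $(pimc_inside 10000) 10000'' prints an estimate that should get closer to 3.14 as the number of iterations grows.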
Line 406: Line 405:
 A program named ''PiMC-$USER.sh'', located in ''/tmp'' (where ''$USER'' is your login), is created and ready to use.
  
-== Question #?: launch ''PiMC'' program with several number of iterations: from 100 to 10000000 ==
+=== Exercise #8: launch the ''PiMC'' program with several numbers of iterations: from 100 to 1000000 ===
 
-  * 
+  * What is the typical precision of the result ?
 
-== Question #?: launch ''PiMC'' program prefixed by ''/usr/bin/time'' with several number of iterations: 100 to 1000000 ==
+=== Exercise #9: launch the ''PiMC'' program prefixed by ''/usr/bin/time'' with several numbers of iterations: 100 to 1000000 ===
+
+  * Grep the ''Elapsed'' and ''Iterations'' lines and manually estimate the **ITOPS** (ITerative Operations Per Second) for this program implementation
+  * Improve the test to estimate the ITOPS //on the fly//: apply it to different numbers of iterations, several times
+
 +One Solution:<​code>​ 
 +echo $(/​usr/​bin/​time /​tmp/​PiMC-$USER.sh 100000 2>&1 | egrep '​(Elapsed|Iterations)'​ | awk '{ print $NF }' | tr '​\n'​ '/'​)1 | bc -l 
 +</​code>​ 
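ITOPS is simply the number of iterations divided by the elapsed time in seconds. A minimal sketch of that arithmetic, using ''awk'' instead of ''bc''; the helper name ''itops'' is hypothetical and the elapsed time in the usage comment is an illustrative value, not a measurement:

```bash
#!/bin/bash
# itops ITERATIONS ELAPSED_SECONDS
# Prints the number of iterations per second (hypothetical helper).
itops() {
    awk -v n="$1" -v t="$2" 'BEGIN {
        if (t > 0) printf "%.2f\n", n / t
        else print "inf"   # elapsed time too short to be measured
    }'
}

# Usage (illustrative values): itops 100000 3.20   prints 31250.00
```

Guarding against a zero elapsed time matters for very small numbers of iterations, where ''/usr/bin/time'' may report ''0.00''.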
 + 
 +For 100000 iterations, 10 times: 
 +<​code>​ 
 +31250.00000000000000000000 
 +31645.56962025316455696202 
 +28248.58757062146892655367 
 +30864.19753086419753086419 
 +31847.13375796178343949044 
 +32362.45954692556634304207 
 +32467.53246753246753246753 
 +31545.74132492113564668769 
 +32573.28990228013029315960 
 +32362.45954692556634304207 
 +</​code>​ 
 + 
 +Example of code for previous results:<​code>​ 
 +for i in $(seq 10 ) ; do echo $(/​usr/​bin/​time /​tmp/​PiMC-$USER.sh 100000 2>&1 | egrep '​(Elapsed|Iterations)'​ | awk '{ print $NF }' | tr '​\n'​ '/'​)1 | bc -l ; done</​code>​ 
 + 
 +From 1000 to 1000000, 1 time: 
 +<​code>​ 
 +1000 20000.00000000000000000000 
 +10000 26315.78947368421052631578 
 +100000 32154.34083601286173633440 
 +1000000 31685.67807351077313054499 
 +</​code>​
  
+Example of code for previous results:<code>
+for POWER in $(seq 3 1 6); do ITERATIONS=$((10**$POWER)) ; echo -ne $ITERATIONS'\t' ; echo $(/usr/bin/time /tmp/PiMC-$USER.sh $ITERATIONS 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l ; done</code>
-=== Split the execution in equal parts ===
+==== Split the execution in equal parts ====
  
 The following command line divides the job to do (10000000 iterations) into ''​PR''​ equal jobs. The following command line divides the job to do (10000000 iterations) into ''​PR''​ equal jobs.
Line 436: Line 469:
  
 On the previous launch, User time represents 99.6% of Elapsed time. Internal system operations only 0.4%. On the previous launch, User time represents 99.6% of Elapsed time. Internal system operations only 0.4%.
 +
+=== Exercise #10 : identification of the cost of splitting process ===
+
+  * Explore the values of ''User'', ''System'' and ''Elapsed'' times for different numbers of iterations
+  * Estimate the ratio between ''User time'' and ''Elapsed time'' for the results
+  * Estimate the ratio between ''System time'' and ''Elapsed time'' for the results
+  * What can you conclude ?
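The ratios asked for here are plain percentages of the Elapsed time. A minimal sketch, assuming the three times have already been read from the ''/usr/bin/time'' output; the helper name ''percent_of'' and the values in the usage comments are hypothetical:

```bash
#!/bin/bash
# percent_of PART TOTAL
# Prints PART as a percentage of TOTAL, with one decimal (hypothetical helper).
percent_of() {
    awk -v part="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * part / total }'
}

# Usage with illustrative values (User 9.96 s, System 0.04 s, Elapsed 10.00 s):
#   percent_of 9.96 10.00   prints 99.6
#   percent_of 0.04 10.00   prints 0.4
```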
  
 Replace the ''PR'' value of ''1'' with the number of CPUs detected with the ''lscpu'' command.
Line 550: Line 590:
 In this example, we see that the User time represents 98.52% of the Elapsed time. The total Elapsed time is up to 10% greater than the unsplit one. So, splitting has a cost. The System time represents 0.4% of the Elapsed time.
  
-Replace the end of the program to extract the total //Inside// number of iterations
+=== Exercise #11 : identification of the cost of splitting process ===
+
+  * Explore the values of ''User'', ''System'' and ''Elapsed'' times for different numbers of iterations
+  * Estimate the ratio between ''User time'' and ''Elapsed time'' for the results
+  * Estimate the ratio between ''System time'' and ''Elapsed time'' for the results
+  * What can you conclude ?
+
+=== Exercise #12 : merging results & improving metrology ===
+
+  * Extend the program to extract the total number of //Inside// iterations
+  * Set timers inside the command lines to estimate the total Elapsed time
+
+Solution: the timers used are based on the ''date'' command
 <​code>​ <​code>​
 ITERATIONS=1000000 ITERATIONS=1000000
 +START=$(date '​+%s.%N'​)
 PR=$(lscpu | grep '​^CPU(s):'​ | awk '{ print $NF }') PR=$(lscpu | grep '​^CPU(s):'​ | awk '{ print $NF }')
 EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/​$PR)) || echo $(($ITERATIONS/​$PR+1))) EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/​$PR)) || echo $(($ITERATIONS/​$PR+1)))
 seq $PR | /​usr/​bin/​time xargs -I '​{}'​ /​tmp/​PiMC-$USER.sh $EACHJOB '​{}'​ 2>&1 | grep ^Inside | awk '{ sum+=$2 } END { printf "​Insides %i", sum }' ; echo seq $PR | /​usr/​bin/​time xargs -I '​{}'​ /​tmp/​PiMC-$USER.sh $EACHJOB '​{}'​ 2>&1 | grep ^Inside | awk '{ sum+=$2 } END { printf "​Insides %i", sum }' ; echo
 +STOP=$(date '​+%s.%N'​)
 +echo Total Elapsed time: $(echo $STOP-$START | bc -l) 
 </​code>​ </​code>​
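The ''EACHJOB'' expression above rounds the share of each job up, so that ''PR'' jobs always cover at least ''ITERATIONS'' iterations. A minimal sketch of the same rounding written as a single integer expression; the helper name ''ceil_div'' is hypothetical:

```bash
#!/bin/bash
# ceil_div TOTAL PARTS
# Prints TOTAL / PARTS rounded up to the next integer (hypothetical helper).
ceil_div() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

# Usage:
#   ceil_div 1000000 32   prints 31250   (exact division)
#   ceil_div 10 3         prints 4       (rounded up)
```

When the division is not exact, the total number of iterations actually run is slightly larger than requested, which is why the sum of //Inside// should be compared to the sum of the iterations reported by the jobs.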
 +
 +==== After splitting, finally the parallelization ====
  
 In this illustrative case, each job is independent of the others. They can be distributed to all the computing resources available. The ''xargs'' command line builder does it for you with ''-P <ConcurrentProcess>''.
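A minimal sketch of this mechanism, with a trivial ''echo'' standing in for the real job; the function name ''run_parallel'' is hypothetical:

```bash
#!/bin/bash
# run_parallel JOBS CONCURRENCY
# Launches JOBS independent jobs, at most CONCURRENCY at the same time.
# Each job only prints a line here; a real job would be the PiMC script.
run_parallel() {
    local jobs=$1 concurrency=$2
    seq "$jobs" | xargs -I '{}' -P "$concurrency" sh -c 'echo "job {} done"'
}

# Usage: run_parallel 4 2   runs jobs 1 to 4, two at a time
```

The lines may come out in any order, since the jobs finish independently of each other.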
Line 674: Line 731:
 In conclusion, splitting a huge job into small jobs has an Operating System cost. But distributing the jobs with the system can be very efficient at reducing the Elapsed time.
  
-We can improve statistics by launching 10x the previous program. We storage the different 'time' estimators inside a logfile named as ''Ouput_PiMC-$USER_YYYYmmddHHMM.log''
+=== Exercise #13 : launch with ''-P'' set to the number of CPUs detected ===
 
+  * Examine the ''Elapsed time'': does it decrease or not ?
+  * Examine the ''User time'': does it increase or not ?
+  * Examine the ''System time'': does it increase or not ?
+
+=== Exercise #14 : extend the program to improve statistics ===
+
+  * Add a loop to run the program 10 times
+  * Store the ''time'' estimators inside an output file named ''/tmp/PiMC-$USER_YYYYmmddHHMM.log''
+  * Parse the output file and extract statistics on the 3 time estimators.
+  * Estimate the speedup between ''PR=1'' and ''PR=<NumberOfCPU>''
+  * Multiply the number of iterations by 10 and estimate the speedup
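The speedup is the Elapsed time with ''PR=1'' divided by the Elapsed time with ''PR=<NumberOfCPU>''. A minimal sketch, assuming both times are already known; the helper names ''speedup'' and ''efficiency'' and the values in the usage comments are hypothetical:

```bash
#!/bin/bash
# speedup T1 TN
# Prints T1 / TN, where T1 is the Elapsed time for PR=1 and TN for PR=N.
speedup() {
    awk -v t1="$1" -v tn="$2" 'BEGIN { printf "%.2f\n", t1 / tn }'
}

# efficiency T1 TN N
# Prints the parallel efficiency in percent: 100 * speedup / N.
efficiency() {
    awk -v t1="$1" -v tn="$2" -v n="$3" 'BEGIN { printf "%.2f\n", 100 * t1 / (tn * n) }'
}

# Usage with illustrative values (60 s on 1 process, 2 s on 32 processes):
#   speedup 60 2         prints 30.00
#   efficiency 60 2 32   prints 93.75
```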
 +
 +Solution:
 <​code>​ <​code>​
 ITERATIONS=1000000 ITERATIONS=1000000
Line 688: Line 758:
 </​code>​ </​code>​
  
 +Example of output file:
 <​code>​ <​code>​
 TIME User time (seconds): 59.81 TIME User time (seconds): 59.81
Line 721: Line 792:
 </​code>​ </​code>​
  
 +Examples of statistics on estimators:
 With the //magic// ''Rmmmms-$USER.r'' command, we can extract statistics on the different times
-  * for //Elapsed time// : ''cat /tmp/PiMC-jmylq_201706291231.log | grep Elapsed | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>1.96 2.02 1.985 1.987 0.01888562 0.009514167</code>
-  * for //System time// : ''cat /tmp/PiMC-jmylq_201706291231.log | grep System | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>0.09 0.22 0.14 0.139 0.03665151 0.2617965</code>
-  * for //User time// : ''cat /tmp/PiMC-jmylq_201706291231.log | grep User | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>59.12 59.81 59.375 59.436 0.2179297 0.003670394</code>
+  * for //Elapsed time// : ''cat /tmp/PiMC-$USER_201706291231.log | grep Elapsed | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>1.96 2.02 1.985 1.987 0.01888562 0.009514167</code>
+  * for //System time// : ''cat /tmp/PiMC-$USER_201706291231.log | grep System | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>0.09 0.22 0.14 0.139 0.03665151 0.2617965</code>
+  * for //User time// : ''cat /tmp/PiMC-$USER_201706291231.log | grep User | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>59.12 59.81 59.375 59.436 0.2179297 0.003670394</code>
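When R is not available, a rough stand-in for part of these statistics can be sketched with ''sort'' and ''awk''. This is not the ''Rmmmms-$USER.r'' script: the function name ''stats'' is hypothetical and it only prints the minimum, maximum, median and mean of the numbers read on standard input, one per line:

```bash
#!/bin/bash
# stats
# Reads one number per line on stdin and prints: min max median mean.
stats() {
    LC_ALL=C sort -n | awk '
        { v[NR] = $1; sum += $1 }
        END {
            if (NR == 0) exit 1   # nothing to summarize
            # Median: middle value, or mean of the two middle values.
            if (NR % 2) median = v[(NR + 1) / 2]
            else median = (v[NR / 2] + v[NR / 2 + 1]) / 2
            print v[1], v[NR], median, sum / NR
        }'
}

# Usage: grep Elapsed /tmp/mylog.log | awk '{ print $NF }' | stats
# (/tmp/mylog.log is a placeholder name)
```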
  
 The previous results show that the variability, in this case, in
Line 740: Line 812:
 </​code>​ </​code>​
  
-On the first half of cores: 0 to 15<code>
-ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')/2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:0-15 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
+==== Select the execution cores ====
+
+With the ''hwloc-bind'' command, you can select the cores on which to execute your program. You just have to specify the physical units in the format //from//''-''//to//. For example, to execute the parallelized application MyParallelApplication only on the first two cores of a machine with 8 cores (numbered from ''0'' to ''7''):<code>
+hwloc-bind -p pu:0-1 ./MyParallelApplication
 </code>
 
-On the second half of cores: 16 to 31<code>
-ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:16-31 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
+If you want to select only one core, the last one for example:<code>
+hwloc-bind -p pu:7-7 ./MyParallelApplication
 </code>
 
-Why to much user time
+If you want to select several non-adjacent cores, the first and the last ones for example:<code>
+hwloc-bind -p pu:0-0 pu:7-7 ./MyParallelApplication
+</code>
 + 
+<note important>You can check the selection by watching the activity of the cores with ''htop'' in another terminal</note>
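The //from//''-''//to// ranges for the two halves of a machine can be computed from the number of CPUs rather than typed by hand. A minimal sketch, assuming the processing units are numbered from ''0'' to ''N-1''; the helper names ''first_half'' and ''second_half'' are hypothetical:

```bash
#!/bin/bash
# first_half NCPU
# Prints the hwloc-bind location covering units 0 to NCPU/2 - 1.
first_half() {
    echo "pu:0-$(( $1 / 2 - 1 ))"
}

# second_half NCPU
# Prints the hwloc-bind location covering units NCPU/2 to NCPU - 1.
second_half() {
    echo "pu:$(( $1 / 2 ))-$(( $1 - 1 ))"
}

# Usage (NCPU detected with lscpu as elsewhere on this page):
#   NCPU=$(lscpu | grep '^CPU(s):' | awk '{ print $NF }')
#   hwloc-bind -p "$(first_half "$NCPU")" ./MyParallelApplication
```

Whether a given half corresponds to one socket, or to the hyperthreaded siblings of the other half, depends on the machine topology; the ''hwloc-ls'' output is the reference.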
 + 
+=== Exercise #15 : launch the previous program on a slice of the machine ===
+
+  * Identify and launch the program on only the first core
+  * Identify and launch the program on the first half of cores
+  * Identify and launch the program on the second half of cores
+  * Identify and launch the program on the first two cores
+  * Identify and launch the program on the first core of the first half and the first core of the second half
+  * Why is there such a great difference between the elapsed times ?
+
+Watch ''htop'' in a terminal to check that the tasks are distributed correctly.
+
+Solutions for a 32-core workstation:
 +  * On the first core: 0 <​code>​ 
 +ITERATIONS=10000000 ; PR=$(($(lscpu | grep '​^CPU(s):'​ | awk '{ print $NF }'​)/​2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/​$PR)) || echo $(($ITERATIONS/​$PR+1))) ; seq $PR | /​usr/​bin/​time hwloc-bind -p pu:0-0 xargs -I '​{}'​ -P $PR /​tmp/​PiMC-$USER.sh $EACHJOB '​{}'​ 2>&1 | grep -v timed | egrep '​(Pi|Inside|Iterations|time)'​ 
 +</​code>​ 
 +  * On the first half of cores: 0 to 15<​code>​ 
 +ITERATIONS=10000000 ; PR=$(($(lscpu | grep '​^CPU(s):'​ | awk '{ print $NF }'​)/​2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/​$PR)) || echo $(($ITERATIONS/​$PR+1))) ; seq $PR | /​usr/​bin/​time hwloc-bind -p pu:0-15 xargs -I '​{}'​ -P $PR /​tmp/​PiMC-$USER.sh $EACHJOB '​{}'​ 2>&1 | grep -v timed | egrep '​(Pi|Inside|Iterations|time)'​ 
 +</​code>​ 
 +  * On the second half of cores: 16 to 31<​code>​ 
 +ITERATIONS=10000000 ; PR=$(($(lscpu | grep '​^CPU(s):'​ | awk '{ print $NF }'​)*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/​$PR)) || echo $(($ITERATIONS/​$PR+1))) ; seq $PR | /​usr/​bin/​time hwloc-bind -p pu:16-31 xargs -I '​{}'​ -P $PR /​tmp/​PiMC-$USER.sh $EACHJOB '​{}'​ 2>&1 | grep -v timed | egrep '​(Pi|Inside|Iterations|time)'​ 
 +</​code>​ 
+  * On the first two cores: 0 and 1<code>
 +ITERATIONS=10000000 ; PR=$(($(lscpu | grep '​^CPU(s):'​ | awk '{ print $NF }'​)*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/​$PR)) || echo $(($ITERATIONS/​$PR+1))) ; seq $PR | /​usr/​bin/​time hwloc-bind -p pu:0-1 xargs -I '​{}'​ -P $PR /​tmp/​PiMC-$USER.sh $EACHJOB '​{}'​ 2>&1 | grep -v timed | egrep '​(Pi|Inside|Iterations|time)'​ 
 +</​code>​ 
 +  * On the first of first half and first of second half of cores: 0 and 8<​code>​ 
 +ITERATIONS=10000000 ; PR=$(($(lscpu | grep '​^CPU(s):'​ | awk '{ print $NF }'​)*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/​$PR)) || echo $(($ITERATIONS/​$PR+1))) ; seq $PR | /​usr/​bin/​time hwloc-bind -p pu:0-0 pu:8-8 xargs -I '​{}'​ -P $PR /​tmp/​PiMC-$USER.sh $EACHJOB '​{}'​ 2>&1 | grep -v timed | egrep '​(Pi|Inside|Iterations|time)' 
 +</​code>​
  
-HT Effect why so much people desactivate...
+=== Exercise #17 : from exploration to law estimation ===
 
-Scalability exploration from PR=1 to PR=2x CPU
+  * explore with the previous program from ''PR=1'' to ''PR=<2x CPU>'', 10x for each
+  * store the results in a file
 
+Solution:
 <​code>​ <​code>​
 ITERATIONS=1000000 ; ITERATIONS=1000000 ;
Line 841: Line 949:
 </​code>​ </​code>​
  
-Examples of codes
+=== Exercise #18 : plot & fit with Amdahl and Mylq laws ===
+
+  * plot the different values with your favorite plotter, focusing on the median one !
+  * fit with an Amdahl law ''T=s+p/N'' where ''N'' is ''PR''
+  * fit with a Mylq law ''T=s+c*N+p/N''
+  * which law matches best ?
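Before fitting, it can help to evaluate both laws by hand for a few values of ''N''. A minimal sketch of the two formulas above; the function names ''amdahl'' and ''mylq'' and the parameter values in the usage comments are hypothetical, chosen only to make the arithmetic easy to follow:

```bash
#!/bin/bash
# amdahl S P N
# Prints T = s + p/N (serial part s, parallel part p, N processes).
amdahl() {
    awk -v s="$1" -v p="$2" -v n="$3" 'BEGIN { printf "%.3f\n", s + p / n }'
}

# mylq S C P N
# Prints T = s + c*N + p/N (same as above, plus a cost c per process).
mylq() {
    awk -v s="$1" -v c="$2" -v p="$3" -v n="$4" 'BEGIN { printf "%.3f\n", s + c * n + p / n }'
}

# Usage with illustrative parameters:
#   amdahl 1 9 3       prints 4.000   (1 + 9/3)
#   mylq 1 0.5 9 3     prints 5.500   (1 + 0.5*3 + 9/3)
```

The ''c*N'' term grows with ''N'', so under the Mylq law the time stops decreasing beyond some number of processes, whereas under the Amdahl law it keeps decreasing towards ''s''.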
 + 
+Example of a set of gnuplot commands to do the job. Adapt them to your file and ''PR''...
 +<​code>​ 
 +Ta(x)=T1*(1-Pa+Pa/​x) 
 +fit [x=1:16] Ta(x) '​PiMC_1_64.dat'​ using 1:4 via T1,Pa 
 +Tm(x)=Sm+Cm*x+Pm/​x 
 +fit [x=1:16] Tm(x) '​PiMC_1_64.dat'​ using 1:4 via Sm,Cm,Pm 
 +set xlabel '​Parallel Rate'​ 
 +set xrange [1:64] 
 +set ylabel "​Speedup Factor"​ 
 +set title "PiMC : parallel execution with Bash for distributed iterations"​ 
 +plot    '​PiMC_1_64.dat'​ using ($1):​(Tm(1)/​$4) title '​Mesures'​ with points,\ 
 + Tm(1)/Tm(x) title "Mylq Law" with lines,\ 
 +  Ta(1)/Ta(x) title "​Amdahl Law" with lines 
 +</​code>​
  
-xGEMM
-NBody.py
-PiXPU.py
+{{ :formation:pimc_1_64.png?600 |}}
+==== Other sample codes (used for courses) ====
 
-Choose your prefered parallel code
+In the folder ''/scratch/AstroSim2017'', you will find the following executables:
+  * ''PiXPU.py'' : Pi Monte Carlo Dart Dash in PyOpenCL
+  * ''NBody.py'' : N-Body in PyOpenCL
+  * ''xGEMM_DP_openblas'' : Matrix-Matrix multiplication with the multithreaded OpenBLAS library in double precision
+  * ''xGEMM_SP_openblas'' : Matrix-Matrix multiplication with the multithreaded OpenBLAS library in single precision
+  * ''xGEMM_DP_clblas'' : Matrix-Matrix multiplication with the OpenCL library in double precision
+  * ''xGEMM_SP_clblas'' : Matrix-Matrix multiplication with the OpenCL library in single precision
+  * ''xGEMM_DP_cublas'' : Matrix-Matrix multiplication with the CUDA library in double precision
+  * ''xGEMM_SP_cublas'' : Matrix-Matrix multiplication with the CUDA library in single precision
  
-Improvment of statistics
+=== Exercise #19 : select a parallelized program and explore scalability ===
 
-Scalability law
+  * launch one of the codes above with ''PR'' from ''1'' to 2 times the number of CPUs
+  * draw the scalability curve
+  * estimate the parameters with the Amdahl law and the Mylq law
 
-Amdahl Law
+==== Your preferred software ====
 
-Mylq Law
+=== Exercise #20 : select a parallelized program and explore scalability ===
 
+  * launch your MPI code with ''PR'' from ''1'' to 2 times the number of CPUs
+  * draw the scalability curve
+  * estimate the parameters with the Amdahl law and the Mylq law
  
 + --- //​[[emmanuel.quemener@ens-lyon.fr|Emmanuel Quemener]] 2017/06/30 14:26//
formation/astrosim2017para4dummies.1498798373.txt.gz · Last modified: 2017/06/30 06:52 by equemene