  * **How much ?** Nothing, Blaise Pascal Center provides workstations & cluster nodes
  * **Where ?** On workstations, cluster nodes, a (well configured) laptop, inside terminals
  * **Who ?** For people who want to open the hood
  * **How ?** By applying some simple commands (essentially shell ones)

===== Session Goal =====
=== Prerequisite for hardware ===

  * If using CBP resources, nothing... Just log in...
  * If NOT using CBP resources, a relatively recent machine with a multi-core CPU

=== Prerequisite for software ===

  * Open a graphical session on a workstation, several terminals and your favorite browser
  * If NOT using CBP resources, a well configured GNU/Linux operating system

=== Prerequisite for humanware ===

  * An allergy to the command line will severely restrict the range of this practical session.
  * Some practice of shell scripts would be an asset, but you will improve it in this session!
===== Investigate Hardware =====
  * Input and Output Devices

The first property of hardware is limited resources.

In POSIX systems, everything is a file, so you can retrieve information (or set configurations) with classical file commands inside a terminal. For example, ''cat /proc/cpuinfo'' returns information about the processor.
</code>
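Each logical CPU contributes one ''processor'' stanza to this file, so a quick sanity check is to count them; a minimal sketch, assuming a Linux host:

```shell
# One "processor" line per logical CPU in /proc/cpuinfo (Linux assumption)
grep -c '^processor' /proc/cpuinfo
```

''nproc'' or ''lscpu'' should report the same figure more directly.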
=== Exercise #1: get this information on your host with ''cat /proc/cpuinfo'' and compare to the one above ===

  * How many lines of information ?

=== Exercise #2: get the information on your host with the ''lscpu'' command ===

  * What new information appears in the output ?
  * How many CPUs ? Threads per core ? Cores per socket ? Sockets ?
  * How many cache levels ?
  * How many "flags" ? What do they represent ?
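The flags can also be counted non-interactively; a hedged sketch, assuming an x86 Linux host where the first ''flags'' line of ''/proc/cpuinfo'' lists one capability keyword per field (ARM kernels expose ''Features'' instead):

```shell
# "flags : fpu vme ..." -> drop the leading "flags" and ":" fields
# to count the capability keywords
grep -m1 '^flags' /proc/cpuinfo | awk '{ print NF-2 }'
```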
==== Exploration ====
{{ :formation:lstopo_035.png?400 |hwloc-ls}}
=== Exercise #3: get a graphical representation of the hardware with the ''hwloc-ls'' command ===

  * Locate and identify the elements provided by the ''lscpu'' command
</code>
=== Exercise #4: list the PCI peripherals with the ''lspci'' command ===

  * How many devices do you get ?
  * Can you identify the devices listed in the graphical representation ?
  * What keywords in the graphical representation define the VGA device ?
==== Exploring the dynamic system ====
As when you drive a car, it's useful to get information about the running system while it works. The commands ''top'' and ''htop'' provide such a live view.

=== Exercise #5: open ''htop'' and ''top'' in two terminals ===

  * What do you see first ?
  * How much memory do you have ?
  * How much swap ?
  * How many tasks are launched ? How many threads ?
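The memory and swap figures shown at the top of ''htop'' can also be read non-interactively; a sketch, assuming a Linux host with the usual ''procps'' tools:

```shell
# Total RAM and swap in megabytes, as also displayed by top/htop
free -m
# The raw values behind them live in /proc/meminfo
grep -E '^(MemTotal|SwapTotal)' /proc/meminfo
```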
==== Tiny metrology with ''/usr/bin/time'' ====
<note important>Be careful, there is a difference between ''time'' included as a command in shells and ''time'' as a standalone program. To avoid difficulties, the program ''time'' has to be requested as ''/usr/bin/time''!</note>

Here is the difference between the ''time'' built-in command and the ''time'' standalone program.
TIME Exit status: 0
</code>
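A quick way to see the difference is to ask the shell what ''time'' resolves to; a sketch, assuming ''bash'' is present:

```shell
# In bash, "time" is a reserved word handled by the shell itself
bash -c 'type -t time'     # prints: keyword
# ...whereas /usr/bin/time is (when installed) an ordinary executable on disk
command -v /usr/bin/time || echo '/usr/bin/time is not installed here'
```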
=== Exercise #6: exploration of ''/usr/bin/time'' on several Unix commands or your small programs ===
==== Statistics on the fly ! Pentacle of statistics ====
</code>
To evaluate the variability of the MemCopy memory test of the ''mbw'' tool over 10 launches with a size of 1GB, the command is:
<code>
mbw -a -t 0 -n 10 1000
</code>
Here is an example of output:
<code>
Long uses 8 bytes. Allocating 2*131072000 elements = 2097152000 bytes of memory.
</code>
Here is an example of output:
<code>
5595.783 5673.179 5624.503 5625.749 21.81671 0.003878869
</code>
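If R (and thus ''Rmmmms-$USER.r'') is not available, a rough ''awk'' substitute can compute the mean and a population standard deviation from one value per line; a sketch, not the session's script:

```shell
# Mean and population stdev of values read on stdin (one per line);
# fed here with the four mbw figures shown above
printf '5595.783\n5673.179\n5624.503\n5625.749\n' | \
awk '{ s+=$1 ; ss+=$1*$1 ; n++ } END { m=s/n ; printf "%f %f\n", m, sqrt(ss/n-m*m) }'
```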
=== Exercise #7: practice ''Rmmmms-$USER.r'' and investigate variability ===

  * Launch the previous command for 10000, 1000 and 100 launches with sizes of 10, 100 and 1000 respectively
  * Have a look at the statistics estimators: what typical variability do you reach ?

This will be very useful to extract and provide statistics on times.
===== An illustrative example: Pi Dart Dash =====

==== Principle, inputs & outputs ====

The most common example of a Monte Carlo program: estimate the number Pi from the fraction of uniformly distributed random points that fall inside a quarter circle. It needs:
  * Output: an integer, the number of points inside the quarter circle
  * Output (bis): an estimation of the number Pi (a very inefficient method, but the result is well known, so easily checked)
  * Output (ter): the total amount of iterations (just as a reminder)

The following implementation is a ''bash'' shell script. The ''RANDOM'' shell variable provides a random number between 0 and 32767, so the frontier is located at ''32767*32767''.
A program named ''PiMC-$USER.sh'', located in ''/tmp'' where ''$USER'' is your login, is created and ready to use.
=== Exercise #8: launch the ''PiMC'' program with several numbers of iterations: from 100 to 1000000 ===

  * What is the typical precision of the result ?

=== Exercise #9: launch the ''PiMC'' program prefixed by ''/usr/bin/time'' with several numbers of iterations: from 100 to 1000000 ===

  * Grep the ''Elapsed'' and ''Iterations'' lines and manually estimate the **ITOPS** (ITerative Operations Per Second) for this program implementation
  * Improve the test to estimate the ITOPS //on the fly//: apply it to different amounts of iterations, several times
One solution:<code>
echo $(/usr/bin/time /tmp/PiMC-$USER.sh 100000 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l
</code>

For 100000 iterations, 10 times:
<code>
31250.00000000000000000000
31645.56962025316455696202
28248.58757062146892655367
30864.19753086419753086419
31847.13375796178343949044
32362.45954692556634304207
32467.53246753246753246753
31545.74132492113564668769
32573.28990228013029315960
32362.45954692556634304207
</code>

Example of code for the previous results:<code>
for i in $(seq 10) ; do echo $(/usr/bin/time /tmp/PiMC-$USER.sh 100000 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l ; done</code>

From 1000 to 1000000, 1 time:
<code>
1000 20000.00000000000000000000
10000 26315.78947368421052631578
100000 32154.34083601286173633440
1000000 31685.67807351077313054499
</code>

Example of code for the previous results:<code>
for POWER in $(seq 3 1 6); do ITERATIONS=$((10**$POWER)) ; echo -ne $ITERATIONS'\t' ; echo $(/usr/bin/time /tmp/PiMC-$USER.sh $ITERATIONS 2>&1 | egrep '(Elapsed|Iterations)' | awk '{ print $NF }' | tr '\n' '/')1 | bc -l ; done</code>
==== Split the execution in equal parts ====
The following command line divides the job to do (10000000 iterations) into ''PR'' equal jobs.
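The per-job share is a ceiling division; the formula used throughout this page can be checked in isolation with toy values (''PR=3'' here is only an illustration):

```shell
# Ceiling division: each of the PR jobs receives ITERATIONS/PR rounded up,
# so that PR jobs always cover at least ITERATIONS iterations
ITERATIONS=10000000 ; PR=3
EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1)))
echo $EACHJOB   # 3333334
```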
On the previous launch, the User time represents 99.6% of the Elapsed time; internal system operations only 0.4%.
=== Exercise #10: identification of the cost of the splitting process ===

  * Explore the values of the ''User'', ''System'' and ''Elapsed'' times for different numbers of iterations
  * Estimate the ratio between the ''User time'' and the ''Elapsed time''
  * Estimate the ratio between the ''System time'' and the ''Elapsed time''
  * What can you conclude ?
Replace the ''PR'' set to ''1'' by the number of CPUs detected with the ''lscpu'' command.
In this example, we see that the User time represents 98.52% of the Elapsed time. The total Elapsed time is up to 10% greater than the unsplit one: splitting has a cost. The System time represents 0.4% of the Elapsed time.
=== Exercise #11: identification of the cost of the splitting process ===

  * Explore the values of the ''User'', ''System'' and ''Elapsed'' times for different numbers of iterations
  * Estimate the ratio between the ''User time'' and the ''Elapsed time''
  * Estimate the ratio between the ''System time'' and the ''Elapsed time''
  * What can you conclude ?

=== Exercise #12: merging results & improving metrology ===

  * Append to the program to extract the total amount of //Inside// iterations
  * Set timers inside the command lines to estimate the total Elapsed time

Solution: the timers used are based on the ''date'' command
<code>
ITERATIONS=1000000
START=$(date '+%s.%N')
PR=$(lscpu | grep '^CPU(s):' | awk '{ print $NF }')
EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1)))
seq $PR | /usr/bin/time xargs -I '{}' /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep ^Inside | awk '{ sum+=$2 } END { printf "Insides %i", sum }' ; echo
STOP=$(date '+%s.%N')
echo Total Elapsed time: $(echo $STOP-$START | bc -l)
</code>
==== After splitting, finally the parallelization ====

In this illustrative case, each job is independent of the others, so they can be distributed over all the available computing resources. The ''xargs'' command line builder does it for you with ''-P <ConcurrentProcess>''.
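The effect is easy to see on a toy workload; a sketch where four one-second sleeps are dispatched to two concurrent workers, so the wall-clock time is about two seconds instead of four:

```shell
# Four one-second jobs dispatched on 2 concurrent processes with xargs -P
START=$(date +%s)
seq 4 | xargs -I '{}' -P 2 sh -c 'sleep 1'
STOP=$(date +%s)
echo "Elapsed: $((STOP-START)) s"
```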
In conclusion, splitting a huge job into small jobs has an Operating System cost. But distributing the jobs over the system can be very efficient to reduce the Elapsed time.
=== Exercise #13: launch with ''-P'' set to the number of CPUs detected ===

  * Examine the ''Elapsed time'': does it decrease or not ?
  * Examine the ''User time'': does it increase or not ?
  * Examine the ''System time'': does it increase or not ?

=== Exercise #14: append to the program to improve statistics ===

  * Add an iterator to redo the program 10 times
  * Store the ''time'' estimators inside an output file defined as: ''/tmp/PiMC-$USER_YYYYmmddHHMM.log''
  * Parse the output file and extract statistics on the 3 time estimators
  * Estimate the speedup between ''PR=1'' and ''PR=<NumberOfCPU>''
  * Multiply the number of iterations by 10 and estimate the speedup

Solution:
<code>
ITERATIONS=1000000
</code>
Example of output file:
<code>
TIME User time (seconds): 59.81
</code>
Examples of statistics on the estimators:

With the //magic// ''Rmmmms-$USER.r'' command, we can extract statistics on the different times:
  * for //Elapsed time//: ''cat /tmp/PiMC-$USER_201706291231.log | grep Elapsed | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>1.96 2.02 1.985 1.987 0.01888562 0.009514167</code>
  * for //System time//: ''cat /tmp/PiMC-$USER_201706291231.log | grep System | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>0.09 0.22 0.14 0.139 0.03665151 0.2617965</code>
  * for //User time//: ''cat /tmp/PiMC-$USER_201706291231.log | grep User | awk '{ print $NF }' | /tmp/Rmmmms-$USER.r'':<code>59.12 59.81 59.375 59.436 0.2179297 0.003670394</code>

The previous results show that, in this case, the variability is below 1% for the Elapsed and User times, but around 26% for the System time.
</code>
==== Select the execution cores ====

With the ''hwloc-bind'' command, it is possible to select the cores on which you would like to execute your program. You just have to specify the physical units with the format //from//''-''//to//. For example, to execute the parallelized application MyParallelApplication only on the first two cores of a machine with 8 cores (numbered from ''0'' to ''7''):<code>
hwloc-bind -p pu:0-1 ./MyParallelApplication
</code>
If you want to select only one atomic core, the last one for example:<code>
hwloc-bind -p pu:7-7 ./MyParallelApplication
</code>
If you want to select several non-adjacent cores, the first and the last ones for example:<code>
hwloc-bind -p pu:0-0 pu:7-7 ./MyParallelApplication
</code>

<note important>You can check the selection by watching the ''htop'' activity of the cores in another terminal</note>
=== Exercise #15: launch the previous program on a slice of the machine ===

  * Identify and launch the program on only the first core
  * Identify and launch the program on the first half of the cores
  * Identify and launch the program on the second half of the cores
  * Identify and launch it on the first two cores
  * Identify and launch it on the first core of the first half and the first core of the second half
  * Why is there such a great difference between the elapsed times ?

Watch ''htop'' in another terminal to check the right distribution of tasks.
Solutions for a 32-core workstation:
  * On the first core: 0 <code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')/2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:0-0 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
  * On the first half of cores: 0 to 15<code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')/2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:0-15 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
  * On the second half of cores: 16 to 31<code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:16-31 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
  * On the first two cores: 0 and 1<code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:0-1 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
  * On the first core of the first half and the first core of the second half: 0 and 8<code>
ITERATIONS=10000000 ; PR=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')*2)) ; EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ; seq $PR | /usr/bin/time hwloc-bind -p pu:0-0 pu:8-8 xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep -v timed | egrep '(Pi|Inside|Iterations|time)'
</code>
=== Exercise #17: from exploration to law estimation ===

  * Explore with the previous program from ''PR=1'' to ''PR=<2x CPU>'', 10 times each
  * Store the results in a file

Solution:
<code>
ITERATIONS=1000000 ;
REDO=10 ;
PR_START=1 ;
PR_STOP=$(($(lscpu | grep '^CPU(s):' | awk '{ print $NF }')*2)) ;
OUTPUT=/tmp/$(basename /tmp/PiMC-$USER.sh .sh)_${PR_START}_${PR_STOP}_$(date "+%Y%m%d%H%M").dat
seq $PR_START 1 $PR_STOP | while read PR ;
do
    echo -ne "$PR\t" ;
    EACHJOB=$([ $(($ITERATIONS % $PR)) == 0 ] && echo $(($ITERATIONS/$PR)) || echo $(($ITERATIONS/$PR+1))) ;
    seq $REDO | while read STEP ;
    do
        seq $PR | /usr/bin/time xargs -I '{}' -P $PR /tmp/PiMC-$USER.sh $EACHJOB '{}' 2>&1 | grep Elapsed | awk '{ print $NF }'
    done | /tmp/Rmmmms-$USER.r
done > $OUTPUT
echo Results in $OUTPUT file
</code>
As an example, on a 32 HT-core workstation, we got:
<code>
# PR MIN MAX AVG MED STDEV Variability
1 29.94 35.16 30.56 30.99 1.54438 0.05053601
2 15.09 16.73 15.445 15.531 0.4647449 0.03009031
3 10.3 12.02 10.555 10.795 0.6131567 0.05809158
4 7.78 8.21 7.97 7.975 0.1269514 0.01592866
5 6.31 6.53 6.435 6.416 0.07366591 0.01144769
6 5.27 5.57 5.41 5.415 0.09778093 0.01807411
7 4.61 5.67 4.74 4.901 0.3989277 0.08416197
8 4.03 4.35 4.115 4.146 0.09800227 0.02381586
9 3.66 3.92 3.71 3.718 0.07420692 0.02000186
10 3.32 4.29 3.36 3.453 0.295524 0.08795358
11 3.01 4.45 3.08 3.229 0.4330114 0.1405881
12 2.77 4.29 2.86 3.019 0.4609519 0.161172
13 2.61 2.89 2.68 2.707 0.08602971 0.03210064
14 2.51 4.03 2.615 2.842 0.4982369 0.1905304
15 2.31 3.42 2.41 2.565 0.3422231 0.1420013
16 2.31 3.03 2.66 2.675 0.2382459 0.08956613
17 2.42 3.11 2.7 2.722 0.2395737 0.088731
18 2.42 2.8 2.67 2.627 0.1477272 0.05532855
19 2.52 2.72 2.605 2.615 0.06114645 0.02347273
20 2.43 2.91 2.54 2.579 0.136337 0.05367598
21 2.37 2.91 2.49 2.509 0.1540166 0.06185405
22 2.28 2.73 2.37 2.407 0.1271963 0.05366931
23 2.3 2.54 2.35 2.37 0.06879922 0.02927627
24 2.25 2.37 2.285 2.287 0.03368151 0.01474027
25 2.19 2.37 2.225 2.246 0.06022181 0.02706598
26 2.1 2.32 2.18 2.191 0.05606544 0.02571809
27 2.14 2.27 2.205 2.198 0.04516636 0.02048361
28 2.07 2.21 2.14 2.134 0.04273952 0.01997174
29 2.02 2.11 2.07 2.065 0.02758824 0.01332765
30 2 2.13 2.035 2.036 0.03806427 0.0187048
31 1.98 2.07 1.99 2.002 0.02820559 0.01417367
32 1.97 2.02 1.99 1.993 0.01766981 0.008879302
33 2.05 2.25 2.12 2.129 0.06402257 0.03019932
34 2.08 2.23 2.15 2.155 0.0457651 0.02128609
35 2.08 2.25 2.16 2.156 0.05853774 0.0271008
36 2.02 2.21 2.13 2.129 0.05782156 0.02714627
37 2.08 2.2 2.15 2.147 0.03560587 0.01656087
38 2.01 2.19 2.125 2.119 0.05384133 0.0253371
39 2.05 2.2 2.105 2.111 0.05108816 0.02426991
40 2.06 2.2 2.11 2.124 0.04526465 0.02145244
41 2.07 2.18 2.09 2.102 0.03425395 0.01638945
42 2.04 2.13 2.095 2.092 0.0265832 0.01268888
43 2.03 2.12 2.08 2.076 0.03025815 0.01454719
44 2.04 2.14 2.085 2.086 0.03204164 0.01536769
45 2.02 2.13 2.08 2.082 0.03392803 0.01631155
46 2.05 2.12 2.075 2.081 0.0218327 0.01052178
47 1.98 2.15 2.08 2.073 0.05250397 0.02524229
48 1.99 2.14 2.085 2.081 0.04557046 0.02185633
49 2.04 2.18 2.085 2.087 0.04321779 0.02072796
50 2.06 2.17 2.12 2.116 0.03657564 0.01725266
51 2.02 2.16 2.09 2.086 0.03864367 0.01848979
52 2.03 2.13 2.08 2.075 0.02915476 0.01401671
53 2.03 2.14 2.095 2.093 0.03465705 0.01654274
54 2 2.11 2.075 2.069 0.03212822 0.01548348
55 2.02 2.15 2.095 2.085 0.04062019 0.01938911
56 2.05 2.11 2.09 2.081 0.02078995 0.009947347
57 2.03 2.09 2.065 2.065 0.01840894 0.008914739
58 2.06 2.11 2.07 2.082 0.02250926 0.01087404
59 2.02 2.11 2.07 2.067 0.02451757 0.01184424
60 2.02 2.1 2.055 2.057 0.02406011 0.01170808
61 2.03 2.15 2.065 2.07 0.03333333 0.01614205
62 2.01 2.13 2.06 2.059 0.03842742 0.01865409
63 2.01 2.09 2.07 2.06 0.03018462 0.01458194
64 2.02 2.11 2.075 2.077 0.02945807 0.01419666
</code>
=== Exercise #18: plot & fit with the Amdahl and Mylq laws ===

  * Plot the different values with your favorite plotter; focus on the median ones !
  * Fit with an Amdahl law ''T=s+p/N'', where ''N'' is ''PR''
  * Fit with a Mylq law ''T=s+c*N+p/N''
  * Which law matches best ?
Examples of gnuplot commands to do the job. Adapt them to your file and ''PR''...
<code>
Ta(x)=T1*(1-Pa+Pa/x)
fit [x=1:16] Ta(x) 'PiMC_1_64.dat' using 1:4 via T1,Pa
Tm(x)=Sm+Cm*x+Pm/x
fit [x=1:16] Tm(x) 'PiMC_1_64.dat' using 1:4 via Sm,Cm,Pm
set xlabel 'Parallel Rate'
set xrange [1:64]
set ylabel "Speedup Factor"
set title "PiMC : parallel execution with Bash for distributed iterations"
plot 'PiMC_1_64.dat' using ($1):(Tm(1)/$4) title 'Measures' with points,\
Tm(1)/Tm(x) title "Mylq Law" with lines,\
Ta(1)/Ta(x) title "Amdahl Law" with lines
</code>
{{ :formation:pimc_1_64.png?600 |}}

==== Other sample codes (used for courses) ====

In the folder ''/scratch/AstroSim2017'', you will find the following executables:
  * ''PiXPU.py'': Pi Monte Carlo Dart Dash in PyOpenCL
  * ''NBody.py'': N-Body in PyOpenCL
  * ''xGEMM_DP_openblas'': Matrix-Matrix multiplication with the multithreaded OpenBLAS library, in double precision
  * ''xGEMM_SP_openblas'': Matrix-Matrix multiplication with the multithreaded OpenBLAS library, in single precision
  * ''xGEMM_DP_clblas'': Matrix-Matrix multiplication with the OpenCL BLAS library, in double precision
  * ''xGEMM_SP_clblas'': Matrix-Matrix multiplication with the OpenCL BLAS library, in single precision
  * ''xGEMM_DP_cublas'': Matrix-Matrix multiplication with the CUDA BLAS library, in double precision
  * ''xGEMM_SP_cublas'': Matrix-Matrix multiplication with the CUDA BLAS library, in single precision
=== Exercise #19: select a parallelized program and explore its scalability ===

  * Launch one of the above codes with ''PR'' from ''1'' to 2 times the number of CPUs
  * Draw the scalability curve
  * Estimate the parameters with the Amdahl and Mylq laws

==== Your preferred software ====

=== Exercise #20: take your preferred parallelized program and explore its scalability ===

  * Launch your MPI code with ''PR'' from ''1'' to 2 times the number of CPUs
  * Draw the scalability curve
  * Estimate the parameters with the Amdahl and Mylq laws
+ | --- //[[emmanuel.quemener@ens-lyon.fr|Emmanuel Quemener]] 2017/06/30 14:26// |