Speed test using spinpack (under construction 2003/01/20)

Here you can find information on how to measure the speed of your machine running spinpack.

First you have to download spinpack-2.15.tgz (or a newer version). Uncompress and untar the file, configure the Makefile, compile the sources, and run the executable. Here is an example:

       gunzip -c spinpack-2.15.tgz | tar -xf -
       cd spinpack

       # --- small speed test --- (1CPU, MEM=113MB, DISK=840MB, nud=30,10)
       ./configure --nozlib
       make speed_test 
       sh -c "( cd ./exe; time ./spin ) 2>&1 | tee speed_test_small" 

       # --- big speed test --- (16CPUs, MEM=735MB, DISK=6GB, nud=28,12)
       ./configure --mpt --nozlib
       make speed_test; grep -v small exe/daten.i1 >exe/daten.i
       sh -c "( cd ./exe; time ./spin ) 2>&1 | tee speed_test_big"
   

Send me the output files together with the characteristic data of your computer for comparison. Please also add the output of grep FLAGS= Makefile and cpu.log if you have it.
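For example, the requested files could be collected like this (a minimal sketch; the file names come from the commands above, and cpu.log is only included if present):

       grep FLAGS= Makefile > speed_test_info
       cat cpu.log >> speed_test_info 2>/dev/null   # skipped if cpu.log is missing
       tar -cf speed_test_results.tar speed_test_small speed_test_big speed_test_info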

Computation time

The next table gives an overview of the computation time for an N=40 site system (used for the speed test). The first column gives the numbers of up- and down-spins (nud) as set in daten.i. The other columns list the time needed for writing the matrix (SH) and for the first 40 iterations (i=40), as shown by the output. The star (*) marks the default configuration when using make speed_test (see above). The double star (**) marks an example of the big speed test (see above).

   nud     SH-time  i=40-time CPUs machine + compiler + options  (times in [hh:]mm:ss(+-ss); dflt: v2.15 -O2)
  -------+---------+---------+----+--------------------------------------
   32,8      3:32     9:20    1  Via-C3-1GHz-64k-gcc-3.3 v2.19 -O2 -msse -march=i586 lt=245s          (-T 255MB/s)
   32,8      1:25     3:10    1  Celeron-1GHz-gcc_2.95.3 v2.18 -O2                   lt=59s (rl: cache=256kB disk=26MB/s dskcache=168MB/s)
   32,8      0:53     1:56    1  Centrino-1.4GHz-gcc-3.3 v2.19 -O2 -msse -march=i586 lt=40s   3s/5It  (-T 986MB/s)
   32,8      2:02     4:10    1  Centrino-600MHz-gcc-3.3 v2.19 -O2 -msse -march=i586 lt=94s   4s/5It  (-T 858MB/s) speed-step
   32,8      2:11     4:18    1  Centrino-600MHz-gcc-3.3 v2.19 -O2 -msse -march=i686 lt=95s   4s/5It  (-T 858MB/s) speed-step
   32,8      0:50     2:04    1  Pentium4-2.5GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse B_NL2=4
   32,8      1:03     2:21    1  AthlonXP-1.7GHz-gcc-3.2 v2.17 -O2 -march=athlon-xp -m3dnow (hda=55MB/s)
   32,8      1:14     4:32    1  Xeon-2GHz-v2.18-gcc-3.2 -O2 -march=i686 -msse 4x4 lt=1:00
   32,8      1:21     3:13    1  GS160-Alpha-731MHz-cxx v2.17 -fast -g3 -pg Compaq C++ V6.3-008
   32,8      0:59     2:16    1  ES45-Alpha-1250MHz-cxx-6.3 -fast v2.18 lt=0:59 2x2
   32,8      3:16     7:04    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA  B_NL2=2  
   32,8      3:11     6:39    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA  B_NL2=0  
   32,8      3:13     5:56    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lpthread
   32,8      1:59     4:54    2  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lpthread
   32,8      1:23     4:36    4  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lpthread
   ----------------------------  n1=5.3e6
   30,10      23m      76m    1  Pentium-1.7GHz-gcc v2.15 -lgz
   30,10      21m      50m    1  Pentium-1.7GHz-gcc v2.15
   30,10    15:15    38:29    1  AthlonXP-1.7GHz-gcc-3.2 v2.17 -O2 -march=athlon-xp -m3dnow      (lt=7m29s hda=55MB/s, cat  40x800MB=15m, 40%idle)
   30,10    15:34    45:28    1  AthlonXP-1.7GHz-gcc-3.2 v2.18 -O2 -march=athlon-xp -m3dnow -lgz (lt=7m29s hda=55MB/s, zcat 40x450MB=13m,  1%idle)
   30,10    11:51    26:31    1  Pentium4-2.5GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse B_NL2=4
   30,10    15:59    40:09    1  Xeon-2GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse -g -pg
   30,10    14:26    34:08    1  Xeon-2GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse
   30,10     8:16    25:09    2  Xeon-2GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse (slow nfs-disk)
   30,10    14:40    32:26    1  Xeon-2GHz-gcc-3.2 v2.18 -O2 -march=i686 -msse 4x4 lt=10:34
   30,10     8:14    29:52    4  SunFire-880-SparcIII-750MHz-CC-5.3  v2.17 -fast -lz (sun4u) lt=4:14  4 threads
   30,10    19:31  1:03:28    1  SunFire-880-SparcIII-750MHz-CC-5.3  v2.17 -fast     (sun4u) lt=9:50 16 threads   2048s/40*168e6=0.30us
   30,10    27:28  1:14:28    1  SunFire-880-SparcIII-750MHz-g++2.95 v2.17 -mv8 -O2  (sun4u) lt=22:32 4 threads
   30,10     7:52    21:40    4  SunFire-880-SparcIII-750MHz-CC-5.3  v2.19 -fast     (sun4u) lt=6:11  4 threads  (55s/5It) vbuf=16M
   30,10     7:24    26:45    4  SunFire-880-SparcIII-750MHz-CC-5.3  v2.17 -fast     (sun4u) lt=4:11  4 threads  4*910s/40*168e6=0.54us
   30,10     7:12    26:28    4  SunFire-880-SparcIII-750MHz-CC-5.3  v2.17 -fast -O4 (sun4u) lt=4:05  4 threads  4*911s/40*168e6=0.54us
   30,10     3:44    16:58    8  SunFire-880-SparcIII-750MHz-CC-5.3  v2.17 -fast     (sun4u) lt=4:23 16 threads  8*532s/40*168e6=0.63us
   30,10        -        -    -  SunFire-880-SparcIII-750MHz-CC-5.3 -fast -xtarget=ultra -xarch=v9 (64bit)
   30,10    14:25    26:09    1  ES45-Alpha-1250MHz-gcc-3.2.3 -O2 v2.18 lt=6:09 2x2
   30,10    12:13    22:15    1  ES45-Alpha-1250MHz-cxx-6.3 -fast v2.18 lt=4:14 2x2 (ev56)
 * 30,10      24m      64m    1  GS160-Alpha-731MHz-cxx-6.3 v2.15
   30,10    19:00    48:14    1  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast -g3 -pg (42% geth_block, 27% b_smallest, 16% ifsmallest3)
   30,10    21:12    50:37    1  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast
   30,10    19:36    59:44    1  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast 16 threads
   30,10    12:15    36:16    2  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast 16 threads
   30,10     8:24    24:17    3  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast 16 threads
   30,10     8:21    25:24    3  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast -pthread 16 threads
   30,10     7:40    26:36    3  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast -pthread  4 threads
   30,10     7:48    53:00    10 GS160-Alpha-731MHz-cxx-6.3 v2.15
   30,10     3:50    18:23    16 GS160-Alpha-731MHz-cxx-6.3 v2.15 ( 64 threads)
   30,10     3:33    15:19    16 GS160-Alpha-731MHz-cxx-6.3 v2.15 (128 threads)
   30,10    21:20    43:55    1  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=06:50 4m/10It
   30,10    19:44    46:16    2  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=05:46 (1m59s..2m54s)/5It (work load, home)
   30,10    12:18    34:11 a2 2  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=05:38 (20s..22s)/a2It (work load, home, a2=53It/34m)
   30,10     5:35    12:41    16 GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=02:51  1m/10It  (640%CPU)
   30,10    12:55    23:15    1  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=04:33 1m26s/10It = 10*840MB/1m26s=98MB/s                  10*hnz/86s/1=20e6eps/cpu 50ns (max.80ns)
   30,10     2:19     5:59    8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=02:11 22s/10It (14%user+5%sys+81%idle (0%dsk) von 32CPUs) 10*hnz/22s/8=10e6eps/cpu
   30,10     1:38     4:11    16 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=01:47 12s/10It
   30,10     1:46     3:48    32 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=01:25  9s/10It
   30,10  1:01:10  4:25:28    1  O2100-IP27-250MHz-CC-7.30 v2.15 -O3 -lz
   30,10    50:06  3:12:22    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lz
   30,10    30:14  2:00:42    2  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lz
   30,10    41:50  1:35:44    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA
   30,10    54:00  1:45:15    1  O2100-IP27-250MHz-CC-7.30 v2.17v3 ssrun -64 -O2 -IPA lt=00:20:33 geth_bl=2200s latency?=2030s/60*168e6=0.20us (XY_NEW+sortH)
   30,10    47:06  1:36:56    1  O2100-IP27-250MHz-CC-7.30 v2.17v3 ssrun -64 -O2 -IPA lt=00:20:40 geth_bl=2090s latency?=1928s/60*168e6=0.19us (XY_NEW)
   30,10    26:52  1:14:28    2  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA (HBLen=1024 about same)
   30,10    16:50  1:13:51    8  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA
   30,10    19:23  1:16:17    4  O2100-IP27-250MHz-CC-7.30 v2.18 -64 -O2         4x4 hnz+15% lt=00:11:33
   30,10    19:08  0:59:29    4  O2100-IP27-250MHz-CC-7.30 v2.18 -64 -Ofast -IPA 4x4 hnz+15% lt=00:13:00
   30,10    44:22  2:11:25    2  MIPS--IP25-194MHz-CC-7.21 v2.19 -64 -Ofast -IPA 2x2 lt=23m (8m00s)/5It CFLOAT
   30,10    44:14  2:12:54 a2 2  MIPS--IP25-194MHz-CC-7.21 v2.19 -64 -Ofast -IPA 2x2 lt=25m (2m05s)/a2It CFLOAT i45=2h23m
   30,10    22:08  2:03:54    4  MIPS--IP25-194MHz-CC-7.21 v2.19 -64 -Ofast -IPA                   4x4 lt=47m (6m49s)/5It 
   30,10    22:04  2:12:33 a2 4  MIPS--IP25-194MHz-CC-7.21 v2.19 -64 -Ofast -IPA                   4x4 lt=47m (2m09s)/a2It  i51=2h36m
   30,10    23:13  1:45:24    4  MIPS--IP25-194MHz-gcc-323 v2.19 -O2 -mips4 -mabi=64 -mcpu=orion   4x4 lt=21m (7m35s)/5It  read=20k 
   30,10    20:22  1:32:46    4  MIPS--IP25-194MHz-CC-7.30 v2.19 -64 -Ofast -IPA                   4x4 lt=14m (7m15s)/5It 
   ----------------------------  n1=35e6
   28,12 10:40:39 20:14:07    1  MIPS--IP25-194MHz-CC-7.21 v2.18 -64 -Ofast -IPA 1x1 lt=2h51m 50m/5It (dd_301720*20k=354s dd*5=30m /tmp1 cat=6GB/352s=17MB/s)
   28,12  6:04:55 16:03:42    2  MIPS--IP25-194MHz-CC-7.21 v2.18 -64 -Ofast -IPA 2x2 lt=3h00m 52m/5It (ToDo: check time-diffs It0..It20?)
   28,12  5:41:14 17:33:04    2  MIPS--IP25-194MHz-CC-7.21 v2.18 -64 -Ofast -IPA 2x2 lt=2h54m 58m/5It      FLOAT npri=40  ts=100 MaxSym=170 
   28,12  6:20:39 19:30:18?   2  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 2x2 lt=1h51m 85m/5It ZLIB FLOAT npri=40? ts=100 MaxSym=170 dplace -data_pagesize 64k, read=4096 (gunzip=3.2GB/1110s=5.4MB/s t*5=47m /tmp1 2gunzip=3.2GB/570s=11MB/s)
   28,12  5:42:37 14:06:01    2  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 2x2 lt=1h49m 49m/5It      FLOAT npri=40  ts=100 MaxSym=170 dplace -data_pagesize 64k, write=20480 read=? (2cat=6GB/451s 5*6GB=38m)
   28,12  5:40:49 14:05:49    2  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 2x2 lt=1h49m 49m/5It      FLOAT npri=40 MaxSym=170, write=20480 read=? (2cat=6GB/451s 5*6GB=38m)
   28,12  3:14:01 10:09:10    4  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 4x4 lt=1h26m 41m/5It      FLOAT npri=40 (was resetted?) (4cat=6.6GB/469s)
   28,12  3:25:09 11:04:01    4  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 4x4 lt=1h32m 45m46s/5It  CFLOAT npri=40
   28,12  3:23:50 14:14:25    4  MIPS--IP25-194MHz-CC-7.21 v2.18 -64 -Ofast -IPA 4x4 lt=5h07m 42m/5It (parallel_cat=6GB/435s 5It=36m)
   28,12  3:43:02 13:31:30    4  MIPS--IP25-194MHz-gcc-323 v2.19 -O2 -mips4 -mabi=64 -mcpu=orion   4x4 lt=2h11m (57m)/5It  read=20k 
   28,12  3:42:38 13:07:44 a2 4  MIPS--IP25-194MHz-gcc-323 v2.19 -O2 -mips4 -mabi=64 -mcpu=orion   4x4 lt=2h22m (16m)/a2It  read=20k  i55=17h
   28,12  3:14:22 12:28:31    4  MIPS--IP25-194MHz-CC-7.30 v2.19 -64 -Ofast -IPA                   4x4 lt=1h25m (59m)/5It 
   28,12  3:15:36 12:00:46 a2 4  MIPS--IP25-194MHz-CC-7.30 v2.19 -64 -Ofast -IPA                   4x4 lt=1h25m (16m)/a2It            i54=15h51m
   28,12      24h      40h    1  O2100-IP27-250MHz-CC-7.30 v1.4
   28,12  3:42:27 10:32:52    2  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA
   28,12     171m       7h    1  Pentium-1.7GHz-gcc   v2.15
   28,12       5h      10h    1  GS160-Alpha-731MHz-cxx-6.3 v2.15
** 28,12    57:39  5:29:57    16 GS160-Alpha-731MHz-cxx-6.3 v2.15 (16  threads)
   28,12    59:22  2:51:54    16 GS160-Alpha-731MHz-cxx-6.3 v2.15 (128 threads) .
   28,12  3:03:00 10:04:03    1  GS160-Alpha-731MHz-cxx-6.3 v2.17pre -fast
   28,12  1:13:27  5:45:12    3  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast -pthread 16
   28,12  1:49:31  4:29:09    4  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=25m home 10It/32..77m 7.5GB/635s=12MB/s(392s,254s,81s) tmp3=160s,40s,33s tmp3_parallel=166s,138s
   28,12    52:57  2:17:00    8  GS160-Alpha-731MHz-cxx-6.5 v2.19 -fast lt=24m  13m30s/10It 
   28,12    42:19   :  :      16 GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=17m home  2It/6..10m  7.5GB/45s=166MB/s
   28,12  2:00:56  4:08:31    1  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=53:23  17m17s/10It = 10*6GB/17m17s=58MB/s (3GB_local+3GB_far)          12e6eps/cpu
   28,12  1:12:02  2:18:26    2  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=20:39  11m08s/10It = 10*6GB/11m08s=90MB/s                               9e6eps/cpu
   28,12    40:36  1:21:40    4  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=16:06   6m13s/10It
   28,12    23:20    50:20    8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=13:08   3m26s/10It
   28,12    21:35    53:10    8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=13:13   (2m04s..4m41s)/10It HBlen=409600 10*6GB/2m=492MB/s hnz*10/2m/8=10e6eps/cpu
   28,12    14:01    32:17    16 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=10:46   1m51s/10It
   28,12    13:09    27:50    32 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=08:37   1m29s/10It
   28,12    15:41    30:57    32 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=08:42   1m24s/10It 70%user+7%sys+23%idle(0%io) 1user                    4e6eps/cpu
   28,12  3:19:39  7:02:48    1  SunFire-880-SparcIII-750MHz-CC-5.3  v2.18 -fast     (sun4u) lt=1h05m  1 threads (19m40s/5It)
   28,12  1:48:28  4:29:24    2  SunFire-880-SparcIII-750MHz-CC-5.3  v2.18 -fast     (sun4u) lt=47:17  2 threads (14m08s/5It)
   28,12    58:41  2:42:08    4  SunFire-880-SparcIII-750MHz-CC-5.3  v2.18 -fast     (sun4u) lt=36:36  4 threads (8m/5It, 4cat=6GB/0.5s)
   28,12  1:00:27  2:44:18    4  SunFire-880-SparcIII-750MHz-CC-5.3  v2.18 -fast     (sun4u) lt=35:46  4 threads (8m19s/5It) 2nd try v2.19
   28,12  1:00:45  2:38:41 a2 4  SunFire-880-SparcIII-750MHz-CC-5.3  v2.18 -fast     (sun4u) lt=39:25  4 threads (1m57s/1a2) 2nd try v2.19a2 i51=3h incl. EV
   28,12    59:16  2:37:09    4  SunFire-880-SparcIII-750MHz-CC-5.3  v2.18 -fast     (sun4u) lt=38:48  4 threads (7m17s/5It) FLOAT
   28,12    35:59  1:47:38    8  SunFire-880-SparcIII-750MHz-CC-5.3  v2.18 -fast     (sun4u) lt=29:39  8 threads (5m/5It)
   ----------------------------
   27,13  7:41:55 29:08:50    4  MIPS--IP25-194MHz-CC-7.30 v2.19 -64 -Ofast -IPA                   4x4 lt=3h10m (2h16m)/5It 
   27,13  2:21:06  6:24:27    4  SunFire-880-SparcIII-750MHz-CC-5.3  v2.19 -fast -xtarget=ultra -xarch=v9 -g -xipo -xO5 lt=73:15  (21m14s/5It)
   27,13  1:48:39 10:35:00    8  GS160-Alpha-731MHz-cxx-6.5   v2.19 -fast  lt=45m  56m/5It 
   27,13    54:36 14:23:16    8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=27m    (13m..1h32m)/5It
   27,13    57:59  2:18:59    8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=28m    (6m38s)/5It   HBLen=409600
   27,13    32:15  4:26:40    16 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=22:35  (4m43s..1h38m)/10It
   27,13    46:03  1:43:21    16 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=25:44  (3m50s..3m57s)/5It mfs-disk + vbuf=16MB + sah's
   27,13    29:18  1:01:25    32 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=18:03  (1m43s..1h38m)/5It  (2stripe-Platte=150MB/s) 60%user+5%sys+35%idle(0%disk) 1user
   ----------------------------
   26,14     107h     212h    1  O2100-IP27-250MHz-CC-7.30 v1.4
   25,15  6:21:08 22:13:42    4  ES45-Alpha-1GHz-CC-6.5 -fast -lz v2.17    lt=01:31:58
   26,14  3:30:24 12:09:12    4  ES45-Alpha-1GHz-CC-6.5 -fast -lz v2.17    lt=01:02:21
   26,14    45:51  1:45:48    16 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=27:07  (4m00s...4m12s)/5It mfs-disk vbuf=16M + spike (optimization after linking)
   26,14    47:18  1:50:45    16 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=27:08  (3m48s...5m16s)/5It mfs-disk + spike (optimization after linking)
   26,14  1:31:49  3:31:01    16 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=48:34  (7m57s..10m01s)/5It mfs-disk vbuf=16M
   26,14  1:08:45 16:19:00    32 GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=33:17  (30m..5h)/10It HBLen=409600
   25,15  3:03:41 15:54:51    8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=1h24m  (1h18m..1h26m)/5It HBLen=409600
   24,16 04:58:56 25:21:48    8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast  lt=2h08m  (2h16m)/5It    HBLen=409600
   24,16 10:17:31   :  :      4  ES45-Alpha-1GHz-CC-6.5 -fast -lz v2.17    lt=02:41:03 99%stored (+4h i40ca54h)
   23,17 17:19:51 51:02:31    4  ES45-Alpha-1GHz-CC-6.5 -fast -lz v2.18    lt=04:11:14 latency=29h30m/40*63GB=42ns cat=2229s(28MB/s) zcat=5906s(11MB/s)

The next figure shows the computing time for different older program versions and computers (I update it as soon as I can). The computing time depends nearly linearly on the matrix size n1 (time is proportional to n1^1.07; n1 is named n in the figure). A small numerical example follows the figure.

4kB png image of computing time
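
As a quick plausibility check of this scaling law, one can estimate the expected slowdown between the small speed test (n1=5.3e6) and the big speed test (n1=35e6); this is a sketch only, the exponent 1.07 is the empirical value from above:

   awk 'BEGIN{ print exp(1.07*log(35e6/5.3e6)) }'   # prints about 7.5 (times slower)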

Memory and disk usage

Memory usage depends on the matrix dimension n1. For the N=40 sample, two double vectors and one 5-byte vector are stored in memory, so we need n1*21 bytes, where n1 is approximately (N!/(nu!*nd!))/(4N). Disk usage is mainly the number of nonzero matrix elements hnz times 5 bytes (the disk space for tmp_l1.dat is 5*n1 bytes and is not included here). The number of nonzero matrix elements hnz depends on n1 via hnz=11.5(10)*n1^1.064(4), which was found empirically. Here are some examples (a small script evaluating these formulas follows the table):

  nu,nd     n1   memory     hnz    disk  (zip)  (n1*21=memory, hnz*5=disk)
  -----+---------------+----------------------
  34,6    24e3    432kB   526e3   2.6MB 1.3MB 
  32,8   482e3     11MB    13e6    66MB  34MB
  30,10  5.3e6    113MB   168e6   840MB 444MB   small speed test
  28,12   35e6    735MB   1.2e9     6GB 3.6GB   big speed test
  27,13   75e6    1.4GB   2.8e9    14GB         # n1=75214468
  26,14  145e6    2.6GB   5.5e9    28GB 
  25,15  251e6    5.3GB   9.9e9    50GB
  24,16  393e6    8.3GB  15.8e9    79GB
  23,17  555e6   11.7GB    23e9   115GB  63GB
  22,18  708e6   14.9GB    ...     ...
  20,20  431e6    7.8GB    18e9    90GB 
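
The following small script evaluates these formulas (a rough sketch only, not part of spinpack; it reproduces the table values approximately):

   #!/bin/sh
   # estimate n1, memory and disk usage for a N=nu+nd site system
   nu=${1:-30}; nd=${2:-10}
   awk -v nu=$nu -v nd=$nd 'BEGIN{
     N=nu+nd; lg=0;
     for(i=1;i<=N;i++)  lg+=log(i);    # log(N!)
     for(i=1;i<=nu;i++) lg-=log(i);    # ... /nu!
     for(i=1;i<=nd;i++) lg-=log(i);    # ... /nd!
     n1=exp(lg)/(4*N);                 # n1 ~ (N!/(nu!*nd!))/(4N)
     hnz=11.5*exp(1.064*log(n1));      # hnz ~ 11.5*n1^1.064
     printf "n1=%.2e memory=%.0fMB hnz=%.2e disk=%.2fGB\n",
            n1, n1*21/1e6, hnz, hnz*5/1e9
   }'

For example, sh estimate.sh 28 12 gives n1~3.5e7, memory~730MB and disk~6GB, close to the table above (estimate.sh is a hypothetical file name).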
   

CPU load

A typical CPU load for an N=40 site system looks like this:

4kB png image of cpu-load

The data are generated using the following tiny script:

   #!/bin/sh
   # sample the process statistics every 30s as long as the process runs;
   # grep -v CPU strips the repeated ps header lines from the output
   while ps -o pid,pcpu,time,etime,cpu,user,args -p 115877;\
     do sleep 30; done | grep -v CPU
   

115877 is the PID of the process; you have to replace it with the PID of your run. Alternatively, you can activate a monitoring script via daten.i (edit it). The machine was used by 5 users, therefore the peak load is only about 12 CPUs. 735MB of memory and 6GB of disk space were used. You can see the initialization (20min), the matrix generation (57min) and the first 4 iterations (4x8min). The matrix generation depends mostly on CPU power. The iteration time depends mainly on the disk speed and on the speed of random memory access (see the disk read test below). For example, a GS1280-1GHz needs a disk bandwidth of 60MB/s per CPU to avoid a bottleneck. You can improve the disk speed by using striped disks or files (AdvFS) and by putting every H-block on a different disk. The maximum number of threads was limited to 16, but this can be changed (see src/config.h).
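
The disk read test as a ready-to-run line (run it after the matrix has been written; exe/tmp/ht* are the stored H-blocks):

   time cat exe/tmp/ht* >/dev/null   # sequential read rate of the stored matrix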

Why is multi-processor scaling so bad for v2.15?

During the iterations the multi-processor scaling is poor on most machines -- why? I guess this is caused by the random read access to the vector a (see the picture below). I thought a shared-memory computer should not have such scaling problems here, but probably I am wrong. I will try to solve this problem in the future.

6kB png image of dataflow. The figure shows the dataflow during the iterations for 2 CPUs.