Here you can find information on how to measure the speed of your machine running spinpack.
First download spinpack-2.15.tgz (or a newer version). Uncompress and untar the file, configure the Makefile, compile the sources and run the executable. Here is an example:
gunzip -c spinpack-2.15.tgz | tar -xf -
cd spinpack
# --- small speed test --- (1CPU, MEM=113MB, DISK=840MB, nud=30,10)
./configure --nozlib
make speed_test
sh -c "( cd ./exe; time ./spin ) 2>&1 | tee speed_test_small"
# --- big speed test --- (16CPUs, MEM=735MB, DISK=6GB, nud=28,12)
./configure --mpt --nozlib
make speed_test; grep -v small exe/daten.i1 >exe/daten.i
sh -c "( cd ./exe; time ./spin ) 2>&1 | tee speed_test_big"
Send me the output files together with the characteristic data
of your computer for comparison. Please also add the output of
grep FLAGS= Makefile and the file cpu.log if available.
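On Linux, the requested data could be collected like this (a sketch; machine_info.txt is just a hypothetical file name and /proc/cpuinfo is Linux-specific):
grep FLAGS= Makefile > machine_info.txt    # compiler flags actually used
cat /proc/cpuinfo   >> machine_info.txt    # CPU type, clock and cache sizes
# mail machine_info.txt, cpu.log (if present) and the speed_test_* files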
The next table gives an overview of the computation time for an N=40 site system (used for the speed test). The first column lists the numbers of up- and down-spins (nud) as given in daten.i. The other columns list the time needed for writing the matrix (SH) and for the first 40 iterations (i=40), as shown in the output. The star (*) marks the default configuration when using make speed_test (see above). The double star (**) marks an example of the big speed test (see above).
nud   SH-time  i=40-time CPUs machine  (time=[hh:]mm:ss(+-ss), default: v2.15 -O2)
------+--------+---------+----+---------------------------------------
32,8     3:32     9:20    1  Via-C3-1GHz-64k-gcc-3.3 v2.19 -O2 -msse -march=i586 lt=245s (-T 255MB/s)
32,8     1:25     3:10    1  Celeron-1GHz-gcc_2.95.3 v2.18 -O2 lt=59s (rl: cache=256kB disk=26MB/s dskcache=168MB/s)
32,8     0:53     1:56    1  Centrino-1.4GHz-gcc-3.3 v2.19 -O2 -msse -march=i586 lt=40s 3s/5It (-T 986MB/s)
32,8     2:02     4:10    1  Centrino-600MHz-gcc-3.3 v2.19 -O2 -msse -march=i586 lt=94s 4s/5It (-T 858MB/s) speed-step
32,8     2:11     4:18    1  Centrino-600MHz-gcc-3.3 v2.19 -O2 -msse -march=i686 lt=95s 4s/5It (-T 858MB/s) speed-step
32,8     0:50     2:04    1  Pentium4-2.5GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse B_NL2=4
32,8     1:03     2:21    1  AthlonXP-1.7GHz-gcc-3.2 v2.17 -O2 -march=athlon-xp -m3dnow (hda=55MB/s)
32,8     0:55     2:21    1  AthlonXP-1.7GHz-gcc-3.3 v2.21 -O4 -march=athlon-xp -m3dnow lt=39s 6s/5It (-T 408MB/s hda=48MB/s) i65=2m32s+18s 66MB/48MB*s*65=89s
32,8     1:14     4:32    1  Xeon-2GHz-v2.18-gcc-3.2 -O2 -march=i686 -msse 4x4 lt=1:00
32,8     1:21     3:13    1  GS160-Alpha-731MHz-cxx v2.17 -fast -g3 -pg Compaq C++ V6.3-008
32,8     0:59     2:16    1  ES45-Alpha-1250MHz-cxx-6.3 -fast v2.18 lt=0:59 2x2
32,8     3:16     7:04    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA B_NL2=2
32,8     3:11     6:39    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA B_NL2=0
32,8     3:13     5:56    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lpthread
32,8     1:59     4:54    2  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lpthread
32,8     1:23     4:36    4  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lpthread
---------------------------- n1=5.3e6
30,10     23m      76m    1  Pentium-1.7GHz-gcc v2.15 -lgz
30,10     21m      50m    1  Pentium-1.7GHz-gcc v2.15
30,10   12:58    44:01    1  AthlonXP-1.7GHz-gcc-3.3 v2.21 -O4 -march=athlon-xp -m3dnow (lt=6m44s hda=48MB/s, cat 40x800MB=15m, 48%idle) 3m/5It i65:r=60m,u=34m,s=4m (also pthread)
30,10   15:15    38:29    1  AthlonXP-1.7GHz-gcc-3.2 v2.17 -O2 -march=athlon-xp -m3dnow (lt=7m29s hda=55MB/s, cat 40x800MB=15m, 40%idle)
30,10   15:34    45:28    1  AthlonXP-1.7GHz-gcc-3.2 v2.18 -O2 -march=athlon-xp -m3dnow -lgz (lt=7m29s hda=55MB/s, zcat 40x450MB=13m, 1%idle)
30,10   11:51    26:31    1  Pentium4-2.5GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse B_NL2=4
30,10   15:59    40:09    1  Xeon-2GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse -g -pg
30,10   14:26    34:08    1  Xeon-2GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse
30,10    8:16    25:09    2  Xeon-2GHz-gcc-3.2 v2.17 -O2 -march=i686 -msse (slow nfs-disk)
30,10   14:40    32:26    1  Xeon-2GHz-gcc-3.2 v2.18 -O2 -march=i686 -msse 4x4 lt=10:34
30,10    8:14    29:52    4  SunFire-880-SparcIII-750MHz-CC-5.3 v2.17 -fast -lz (sun4u) lt=4:14 4 threads
30,10   19:31  1:03:28    1  SunFire-880-SparcIII-750MHz-CC-5.3 v2.17 -fast (sun4u) lt=9:50 16 threads 2048s/40*168e6=0.30us
30,10   27:28  1:14:28    1  SunFire-880-SparcIII-750MHz-g++2.95 v2.17 -mv8 -O2 (sun4u) lt=22:32 4 threads
30,10    7:52    21:40    4  SunFire-880-SparcIII-750MHz-CC-5.3 v2.19 -fast (sun4u) lt=6:11 4 threads (55s/5It) vbuf=16M
30,10    7:24    26:45    4  SunFire-880-SparcIII-750MHz-CC-5.3 v2.17 -fast (sun4u) lt=4:11 4 threads 4*910s/40*168e6=0.54us
30,10    7:12    26:28    4  SunFire-880-SparcIII-750MHz-CC-5.3 v2.17 -fast -O4 (sun4u) lt=4:05 4 threads 4*911s/40*168e6=0.54us
30,10    3:44    16:58    8  SunFire-880-SparcIII-750MHz-CC-5.3 v2.17 -fast (sun4u) lt=4:23 16 threads 8*532s/40*168e6=0.63us
30,10       -        -    -  SunFire-880-SparcIII-750MHz-CC-5.3 -fast -xtarget=ultra -xarch=v9 (64bit)
30,10   14:25    26:09    1  ES45-Alpha-1250MHz-gcc-3.2.3 -O2 v2.18 lt=6:09 2x2
30,10   12:13    22:15    1  ES45-Alpha-1250MHz-cxx-6.3 -fast v2.18 lt=4:14 2x2 (ev56)
* 30,10   24m      64m    1  GS160-Alpha-731MHz-cxx-6.3 v2.15
30,10   19:00    48:14    1  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast -g3 -pg (42% geth_block, 27% b_smallest, 16% ifsmallest3)
30,10   21:12    50:37    1  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast
30,10   19:36    59:44    1  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast 16 threads
30,10   12:15    36:16    2  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast 16 threads
30,10    8:24    24:17    3  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast 16 threads
30,10    8:21    25:24    3  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast -pthread 16 threads
30,10    7:40    26:36    3  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast -pthread 4 threads
30,10    7:48    53:00   10  GS160-Alpha-731MHz-cxx-6.3 v2.15
30,10    3:50    18:23   16  GS160-Alpha-731MHz-cxx-6.3 v2.15 ( 64 threads)
30,10    3:33    15:19   16  GS160-Alpha-731MHz-cxx-6.3 v2.15 (128 threads)
30,10   21:20    43:55    1  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=06:50 4m/10It
30,10   19:44    46:16    2  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=05:46 (1m59s..2m54s)/5It (work load, home)
30,10   12:18    34:11 a2 2  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=05:38 (20s..22s)/a2It (work load, home, a2=53It/34m)
30,10    5:35    12:41   16  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=02:51 1m/10It (640%CPU)
30,10   12:55    23:15    1  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=04:33 1m26s/10It = 10*840MB/1m26s=98MB/s 10*hnz/86s/1=20e6eps/cpu 50ns (max.80ns)
30,10    2:19     5:59    8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=02:11 22s/10It (14%user+5%sys+81%idle (0%dsk) of 32 CPUs) 10*hnz/22s/8=10e6eps/cpu
30,10    1:38     4:11   16  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=01:47 12s/10It
30,10    1:46     3:48   32  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=01:25 9s/10It
30,10  1:01:10 4:25:28    1  O2100-IP27-250MHz-CC-7.30 v2.15 -O3 -lz
30,10   50:06  3:12:22    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lz
30,10   30:14  2:00:42    2  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA -lz
30,10   41:50  1:35:44    1  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA
30,10   54:00  1:45:15    1  O2100-IP27-250MHz-CC-7.30 v2.17v3 ssrun -64 -O2 -IPA lt=00:20:33 geth_bl=2200s latency?=2030s/60*168e6=0.20us (XY_NEW+sortH)
30,10   47:06  1:36:56    1  O2100-IP27-250MHz-CC-7.30 v2.17v3 ssrun -64 -O2 -IPA lt=00:20:40 geth_bl=2090s latency?=1928s/60*168e6=0.19us (XY_NEW)
30,10   26:52  1:14:28    2  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA (HBLen=1024 about same)
30,10   16:50  1:13:51    8  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA
30,10   19:23  1:16:17    4  O2100-IP27-250MHz-CC-7.30 v2.18 -64 -O2 4x4 hnz+15% lt=00:11:33
30,10   19:08  0:59:29    4  O2100-IP27-250MHz-CC-7.30 v2.18 -64 -Ofast -IPA 4x4 hnz+15% lt=00:13:00
30,10   44:22  2:11:25    2  MIPS--IP25-194MHz-CC-7.21 v2.19 -64 -Ofast -IPA 2x2 lt=23m (8m00s)/5It CFLOAT
30,10   44:14  2:12:54 a2 2  MIPS--IP25-194MHz-CC-7.21 v2.19 -64 -Ofast -IPA 2x2 lt=25m (2m05s)/a2It CFLOAT i45=2h23m
30,10   22:08  2:03:54    4  MIPS--IP25-194MHz-CC-7.21 v2.19 -64 -Ofast -IPA 4x4 lt=47m (6m49s)/5It
30,10   22:04  2:12:33 a2 4  MIPS--IP25-194MHz-CC-7.21 v2.19 -64 -Ofast -IPA 4x4 lt=47m (2m09s)/a2It i51=2h36m
30,10   23:13  1:45:24    4  MIPS--IP25-194MHz-gcc-323 v2.19 -O2 -mips4 -mabi=64 -mcpu=orion 4x4 lt=21m (7m35s)/5It read=20k
30,10   20:22  1:32:46    4  MIPS--IP25-194MHz-CC-7.30 v2.19 -64 -Ofast -IPA 4x4 lt=14m (7m15s)/5It
---------------------------- n1=35e6
28,12 10:40:39 20:14:07   1  MIPS--IP25-194MHz-CC-7.21 v2.18 -64 -Ofast -IPA 1x1 lt=2h51m 50m/5It (dd_301720*20k=354s dd*5=30m /tmp1 cat=6GB/352s=17MB/s)
28,12  6:04:55 16:03:42   2  MIPS--IP25-194MHz-CC-7.21 v2.18 -64 -Ofast -IPA 2x2 lt=3h00m 52m/5It (ToDo: check time-diffs It0..It20?)
28,12  5:41:14 17:33:04   2  MIPS--IP25-194MHz-CC-7.21 v2.18 -64 -Ofast -IPA 2x2 lt=2h54m 58m/5It FLOAT npri=40 ts=100 MaxSym=170
28,12  6:20:39 19:30:18?  2  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 2x2 lt=1h51m 85m/5It ZLIB FLOAT npri=40? ts=100 MaxSym=170 dplace -data_pagesize 64k, read=4096 (gunzip=3.2GB/1110s=5.4MB/s t*5=47m /tmp1 2gunzip=3.2GB/570s=11MB/s)
28,12  5:42:37 14:06:01   2  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 2x2 lt=1h49m 49m/5It FLOAT npri=40 ts=100 MaxSym=170 dplace -data_pagesize 64k, write=20480 read=? (2cat=6GB/451s 5*6GB=38m)
28,12  5:40:49 14:05:49   2  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 2x2 lt=1h49m 49m/5It FLOAT npri=40 MaxSym=170, write=20480 read=? (2cat=6GB/451s 5*6GB=38m)
28,12  3:14:01 10:09:10   4  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 4x4 lt=1h26m 41m/5It FLOAT npri=40 (was reset?) (4cat=6.6GB/469s)
28,12  3:25:09 11:04:01   4  MIPS--IP25-194MHz-CC-7.30 v2.18 -64 -Ofast -IPA 4x4 lt=1h32m 45m46s/5It CFLOAT npri=40
28,12  3:23:50 14:14:25   4  MIPS--IP25-194MHz-CC-7.21 v2.18 -64 -Ofast -IPA 4x4 lt=5h07m 42m/5It (parallel_cat=6GB/435s 5It=36m)
28,12  3:43:02 13:31:30   4  MIPS--IP25-194MHz-gcc-323 v2.19 -O2 -mips4 -mabi=64 -mcpu=orion 4x4 lt=2h11m (57m)/5It read=20k
28,12  3:42:38 13:07:44 a2 4 MIPS--IP25-194MHz-gcc-323 v2.19 -O2 -mips4 -mabi=64 -mcpu=orion 4x4 lt=2h22m (16m)/a2It read=20k i55=17h
28,12  3:14:22 12:28:31   4  MIPS--IP25-194MHz-CC-7.30 v2.19 -64 -Ofast -IPA 4x4 lt=1h25m (59m)/5It
28,12  3:15:36 12:00:46 a2 4 MIPS--IP25-194MHz-CC-7.30 v2.19 -64 -Ofast -IPA 4x4 lt=1h25m (16m)/a2It i54=15h51m
28,12      24h      40h   1  O2100-IP27-250MHz-CC-7.30 v1.4
28,12  3:42:27 10:32:52   2  O2100-IP27-250MHz-CC-7.30 v2.17 -64 -Ofast -IPA
28,12     171m       7h   1  Pentium-1.7GHz-gcc v2.15
** 28,12    5h      10h   1  GS160-Alpha-731MHz-cxx-6.3 v2.15
28,12    57:39  5:29:57  16  GS160-Alpha-731MHz-cxx-6.3 v2.15 (16 threads)
28,12    59:22  2:51:54  16  GS160-Alpha-731MHz-cxx-6.3 v2.15 (128 threads)
28,12  3:03:00 10:04:03   1  GS160-Alpha-731MHz-cxx-6.3 v2.17pre -fast
28,12  1:13:27  5:45:12   3  GS160-Alpha-731MHz-cxx-6.3 v2.17 -fast -pthread 16
28,12  1:49:31  4:29:09   4  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=25m home 10It/32..77m 7.5GB/635s=12MB/s(392s,254s,81s) tmp3=160s,40s,33s tmp3_parallel=166s,138s
28,12    52:57  2:17:00   8  GS160-Alpha-731MHz-cxx-6.5 v2.19 -fast lt=24m 13m30s/10It
28,12    42:19    : :    16  GS160-Alpha-731MHz-cxx-6.3 v2.18 -fast lt=17m home 2It/6..10m 7.5GB/45s=166MB/s
28,12  2:00:56  4:08:31   1  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=53:23 17m17s/10It = 10*6GB/17m17s=58MB/s (3GB_local+3GB_far) 12e6eps/cpu
28,12  1:12:02  2:18:26   2  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=20:39 11m08s/10It = 10*6GB/11m08s=90MB/s 9e6eps/cpu
28,12    40:36  1:21:40   4  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=16:06 6m13s/10It
28,12    23:20    50:20   8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=13:08 3m26s/10It
28,12    21:35    53:10   8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=13:13 (2m04s..4m41s)/10It HBlen=409600 10*6GB/2m=492MB/s hnz*10/2m/8=10e6eps/cpu
28,12    14:01    32:17  16  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=10:46 1m51s/10It
28,12    13:09    27:50  32  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=08:37 1m29s/10It
28,12    15:41    30:57  32  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=08:42 1m24s/10It 70%user+7%sys+23%idle(0%io) 1user 4e6eps/cpu
28,12  3:19:39  7:02:48   1  SunFire-880-SparcIII-750MHz-CC-5.3 v2.18 -fast (sun4u) lt=1h05m 1 thread (19m40s/5It)
28,12  1:48:28  4:29:24   2  SunFire-880-SparcIII-750MHz-CC-5.3 v2.18 -fast (sun4u) lt=47:17 2 threads (14m08s/5It)
28,12    58:41  2:42:08   4  SunFire-880-SparcIII-750MHz-CC-5.3 v2.18 -fast (sun4u) lt=36:36 4 threads (8m/5It, 4cat=6GB/0.5s)
28,12  1:00:27  2:44:18   4  SunFire-880-SparcIII-750MHz-CC-5.3 v2.18 -fast (sun4u) lt=35:46 4 threads (8m19s/5It) 2nd try v2.19
28,12  1:00:45  2:38:41 a2 4 SunFire-880-SparcIII-750MHz-CC-5.3 v2.18 -fast (sun4u) lt=39:25 4 threads (1m57s/1a2) 2nd try v2.19a2 i51=3h incl. EV
28,12    59:16  2:37:09   4  SunFire-880-SparcIII-750MHz-CC-5.3 v2.18 -fast (sun4u) lt=38:48 4 threads (7m17s/5It) FLOAT
28,12    35:59  1:47:38   8  SunFire-880-SparcIII-750MHz-CC-5.3 v2.18 -fast (sun4u) lt=29:39 8 threads (5m/5It)
----------------------------
27,13  7:41:55 29:08:50   4  MIPS--IP25-194MHz-CC-7.30 v2.19 -64 -Ofast -IPA 4x4 lt=3h10m (2h16m)/5It
27,13  2:21:06  6:24:27   4  SunFire-880-SparcIII-750MHz-CC-5.3 v2.19 -fast -xtarget=ultra -xarch=v9 -g -xipo -xO5 lt=73:15 (21m14s/5It)
27,13  1:48:39 10:35:00   8  GS160-Alpha-731MHz-cxx-6.5 v2.19 -fast lt=45m 56m/5It
27,13    54:36 14:23:16   8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=27m (13m..1h32m)/5It
27,13    57:59  2:18:59   8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=28m (6m38s)/5It HBLen=409600
27,13    32:15  4:26:40  16  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=22:35 (4m43s..1h38m)/10It
27,13    46:03  1:43:21  16  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=25:44 (3m50s..3m57s)/5It mfs-disk + vbuf=16MB + sah's
27,13    29:18  1:01:25  32  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=18:03 (1m43s..1h38m)/5It (2stripe-disk=150MB/s) 60%user+5%sys+35%idle(0%disk) 1user
----------------------------
26,14     107h     212h   1  O2100-IP27-250MHz-CC-7.30 v1.4
25,15  6:21:08 22:13:42   4  ES45-Alpha-1GHz-CC-6.5 -fast -lz v2.17 lt=01:31:58
26,14  3:30:24 12:09:12   4  ES45-Alpha-1GHz-CC-6.5 -fast -lz v2.17 lt=01:02:21
26,14    45:51  1:45:48  16  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=27:07 (4m00s...4m12s)/5It mfs-disk vbuf=16M + spike (optimization after linking)
26,14    47:18  1:50:45  16  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=27:08 (3m48s...5m16s)/5It mfs-disk + spike (optimization after linking)
26,14  1:31:49  3:31:01  16  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=48:34 (7m57s..10m01s)/5It mfs-disk vbuf=16M
26,14  1:08:45 16:19:00  32  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=33:17 (30m..5h)/10It HBLen=409600
25,15  3:03:41 15:54:51   8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=1h24m (1h18m..1h26m)/5It HBLen=409600
24,16  4:58:56 25:21:48   8  GS1280-Alpha-1150MHz-cxx-6.5 v2.19 -fast lt=2h08m (2h16m)/5It HBLen=409600
24,16 10:17:31    : :     4  ES45-Alpha-1GHz-CC-6.5 -fast -lz v2.17 lt=02:41:03 99%stored (+4h i40ca54h)
23,17 17:19:51 51:02:31   4  ES45-Alpha-1GHz-CC-6.5 -fast -lz v2.18 lt=04:11:14 latency=29h30m/40*63GB=42ns cat=2229s(28MB/s) zcat=5906s(11MB/s)
The next figure shows the computing time for different older program versions and computers (I update it as soon as I can). The computing time depends nearly linearly on the matrix size n1; it is proportional to n1^1.07 (n1 is labeled n in the figure). For example, doubling n1 increases the computing time by a factor of about 2^1.07 ≈ 2.1.
Memory usage depends on the matrix dimension n1. For the N=40 sample, two double vectors and one 5-byte vector are stored in memory, so we need n1*21 bytes, where n1 is approximately (N!/(nu!*nd!))/(4N). Disk usage is mainly the number of nonzero matrix elements hnz times 5 bytes (the disk space for tmp_l1.dat is 5*n1 bytes and is not included here). The number of nonzero matrix elements hnz depends on n1 as hnz=11.5(10)*n1^1.064(4), which was found empirically. Here are some examples:
nu,nd  n1     memory  hnz     disk   (zip)   (n1*21=memory, hnz*5=disk)
-----+------+-------+-------+------+-------+----------------------
34,6   24e3   432kB   526e3   2.6MB  1.3MB
32,8   482e3  11MB    13e6    66MB   34MB
30,10  5.3e6  113MB   168e6   840MB  444MB   small speed test
28,12  35e6   735MB   1.2e9   6GB    3.6GB   big speed test
27,13  75e6   1.4GB   2.8e9   14GB           # n1=75214468
26,14  145e6  2.6GB   5.5e9   28GB
25,15  251e6  5.3GB   9.9e9   50GB
24,16  393e6  8.3GB   15.8e9  79GB
23,17  555e6  11.7GB  23e9    115GB  63GB
22,18  708e6  14.9GB  ...     ...
20,20  431e6  7.8GB   18e9    90GB
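As a cross-check of the table above, the estimates can be computed directly from the formulas; here is a minimal awk sketch for the small speed test case (N=40, nu=30, nd=10; the constants 21, 5, 11.5 and 1.064 are the values quoted above):
awk 'BEGIN{ N=40; nu=30; nd=10;
  lb=0; for(i=nu+1;i<=N;i++) lb+=log(i);   # log(N!/nu!)
  for(i=2;i<=nd;i++) lb-=log(i);           # subtract log(nd!) -> log(N!/(nu!*nd!))
  n1=exp(lb)/(4*N);                        # n1 ~ (N!/(nu!*nd!))/(4N)
  hnz=11.5*exp(1.064*log(n1));             # hnz ~ 11.5*n1^1.064
  printf "n1=%.2g memory=%.0fMB hnz=%.2g disk=%.0fMB\n", n1, n1*21/1e6, hnz, hnz*5/1e6 }'
# prints approximately: n1=5.3e+06 memory=111MB hnz=1.6e+08 disk=821MB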
A typical CPU load for an N=40 site system looks like this:
The data are generated using the following tiny script:
#!/bin/sh
# sample the process statistics every 30 seconds until the process exits;
# grep -v CPU removes the repeated ps header lines
while ps -o pid,pcpu,time,etime,cpu,user,args -p 115877;\
do sleep 30; done | grep -v CPU
115877 is the PID of the spin process; you have to replace it by the PID on your system.
Alternatively, you can activate a monitoring script via daten.i (edit it).
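For example, if the script above is saved under the hypothetical name cpu_load.sh, it could be started like this ($! is the shell's PID of the last background job):
cd exe
./spin &                       # start the computation in the background
echo $!                        # the PID to put into cpu_load.sh instead of 115877
sh ../cpu_load.sh > cpu.log &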
The machine was shared by 5 users, therefore the peak load is only
about 12 CPUs; 735MB of memory and 6GB of disk space were used.
You can see the initialization (20min),
the matrix generation (57min) and the first 4 iterations (4x8min).
The matrix generation depends mostly on CPU power.
The iteration time depends mainly on the disk speed (try: time cat exe/tmp/ht* >/dev/null) and on the
speed of random memory access. For example, a GS1280-1GHz needs a
disk bandwidth of 60MB/s per CPU to avoid a bottleneck.
You can improve
the disk speed by using striped disks or files (AdvFS) and by putting every
H-block on a different disk. The maximum number
of threads was limited to 16, but this can be changed (see src/config.h).
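To check whether disk reads are the bottleneck on your machine, time the read-back of the stored H-blocks (the same test as mentioned above); the total size of the files divided by the elapsed real time gives the effective bandwidth:
cd exe
du -c tmp/ht* | tail -1        # total size of the stored H-blocks
time cat tmp/ht* >/dev/null    # size/real-time = effective read bandwidth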
The next figure shows the data flow during the iterations for 2 CPUs.