Performance

A preliminary performance test was conducted on the Nestum cluster to estimate its parallel performance. The theoretical peak performance Rpeak was calculated as follows: Rpeak = number of nodes × number of cores per node × AVX2 base frequency (GHz) × number of double-precision operations per cycle = 24 × 32 × 1.9 × 16 = 23347 Gflops. The standard LINPACK test from the HPL-2.2 package was performed using Intel Compiler XE 2017 Developer Edition and OpenMPI-1.10.3, with the following parameters:

...
615936       Ns
1            # of NBs
192          NBs
...
1            # of process grids (P x Q)
24           Ps
32           Qs
...

The test measured Rmax = 19001 Gflops, which corresponds to a parallel efficiency of Rmax / Rpeak = 81.4 %. These results place Nestum as the second-fastest supercomputer in Bulgaria.
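
The arithmetic behind these figures can be reproduced with a short script. The following is a minimal Python sketch, not part of the original test setup: all function and variable names are illustrative assumptions, and the only inputs are the numbers quoted in this section. It recomputes the theoretical peak and the parallel efficiency, and checks two properties of the HPL parameters: that the process grid P x Q matches the total core count (which would be consistent with one MPI process per core, although the text does not state this), and that the problem size N is a whole multiple of the block size NB.

```python
# Illustrative sketch: all names below are assumptions made for this example.
# The numeric inputs are the values quoted in the text above.

NODES = 24                  # number of compute nodes
CORES_PER_NODE = 32         # cores per node
AVX2_BASE_GHZ = 1.9         # AVX2 base frequency in GHz
DP_FLOPS_PER_CYCLE = 16     # double-precision operations per cycle per core
MEASURED_GFLOPS = 19001.0   # LINPACK result reported in the text

# HPL.dat values listed above
N = 615936                  # problem size (Ns)
NB = 192                    # block size (NBs)
P, Q = 24, 32               # process grid (Ps, Qs)


def theoretical_peak_gflops(nodes, cores_per_node, ghz, flops_per_cycle):
    """Peak = nodes * cores per node * frequency (GHz) * flops per cycle."""
    return nodes * cores_per_node * ghz * flops_per_cycle


def parallel_efficiency(measured_gflops, peak_gflops):
    """Ratio of measured LINPACK performance to theoretical peak."""
    return measured_gflops / peak_gflops


def matrix_bytes(n, bytes_per_element=8):
    """Memory needed for an n x n matrix of double-precision values."""
    return n * n * bytes_per_element


peak = theoretical_peak_gflops(
    NODES, CORES_PER_NODE, AVX2_BASE_GHZ, DP_FLOPS_PER_CYCLE
)
efficiency = parallel_efficiency(MEASURED_GFLOPS, peak)
total_cores = NODES * CORES_PER_NODE
gib_per_node = matrix_bytes(N) / NODES / 2**30

print(f"Theoretical peak:     {peak:.1f} Gflops")
print(f"Parallel efficiency:  {efficiency * 100:.1f} %")
print(f"Process grid P x Q:   {P * Q} (total cores: {total_cores})")
print(f"N divisible by NB:    {N % NB == 0}")
print(f"Matrix size per node: {gib_per_node:.1f} GiB")
```

The last line estimates how much memory the N x N coefficient matrix occupies on each node when it is spread evenly over all nodes. This is a common way to judge whether the chosen N makes good use of the available memory, but the text does not give the memory per node, so no conclusion is drawn here.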