Performance

A preliminary performance test was conducted on the Nestum cluster to estimate its parallel performance. The theoretical peak performance $R_{peak}$ was calculated as follows: $R_{peak}$ = Number of nodes $\times$ Number of cores per node $\times$ AVX2 base frequency (GHz) $\times$ Number of DP operations per cycle = $24 \times 32 \times 1.9 \times 16$ = 23347 Gflops. The standard LINPACK test from the HPL-2.2 package was run with Intel oneAPI 2022 and OpenMPI-1.10.3, using the following parameters:

...
615936 Ns
1 # of NBs
192 NBs
...
1 # of process grids (P x Q)
24 Ps
32 Qs
...

The test measured $R_{max}$ = 18931 Gflops, corresponding to a parallel efficiency ($R_{max} / R_{peak}$) of 80.8 %. These results place Nestum as the third fastest supercomputer in Bulgaria.
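
The arithmetic above can be reproduced with a short Python sketch. It recomputes the theoretical peak from the four quoted factors and checks that the HPL parameters are mutually consistent: the P x Q process grid should match the total core count, and the problem size Ns should be a multiple of the block size NBs. All variable and function names are illustrative assumptions and are not part of HPL. The efficiency is computed from the rounded figures quoted in the text, so it may differ slightly from the reported value.

```python
# Sanity check of the quoted figures. All names are illustrative.

NODES = 24                 # number of nodes
CORES_PER_NODE = 32        # cores per node
AVX2_BASE_GHZ = 1.9        # AVX2 base frequency in GHz
DP_FLOPS_PER_CYCLE = 16    # double-precision operations per cycle per core

N = 615936                 # HPL problem size (Ns)
NB = 192                   # HPL block size (NBs)
P, Q = 24, 32              # HPL process grid (Ps, Qs)

MEASURED_GFLOPS = 18931.0  # LINPACK result quoted in the text


def peak_gflops(nodes, cores_per_node, ghz, flops_per_cycle):
    """Theoretical peak in Gflops: cores x GHz x operations per cycle."""
    return nodes * cores_per_node * ghz * flops_per_cycle


def efficiency(measured, peak):
    """Ratio of measured to theoretical peak performance."""
    return measured / peak


if __name__ == "__main__":
    peak = peak_gflops(NODES, CORES_PER_NODE, AVX2_BASE_GHZ, DP_FLOPS_PER_CYCLE)
    print(f"theoretical peak: {peak:.1f} Gflops")
    print(f"MPI ranks (P x Q): {P * Q}, total cores: {NODES * CORES_PER_NODE}")
    print(f"N is a multiple of NB: {N % NB == 0}")
    print(f"efficiency from quoted figures: {100 * efficiency(MEASURED_GFLOPS, peak):.1f} %")
```

With a 24 x 32 grid, one MPI rank is assigned to each of the 768 cores, which is why the process count and the core count are compared directly.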