The computing cluster at FAI received another significant upgrade, funded by grants (AP14871603, AP19676713, AP14869395, AP14870501) and program-targeted funding (BR21881880, BR20381077). The following were purchased: additional server racks SHIP 601S.6842.54.100, Synology RX1217/RX1217RP expansion units, Eaton EBM 9SXEBM180RT battery modules, a Supermicro GPU A+ Server AS-4125GS-TNRT2 compute server with two 96-core AMD EPYC 9654 processors and 768 GB of DDR5 RAM, GIGABYTE GeForce RTX 4090 WINDFORCE V2 24G graphics cards, an Eaton ATS16N automatic transfer switch, and additional network cards for the service servers, along with additional network and power cables. The cluster equipment was redistributed across 5 racks into a more efficient configuration.
As a result, the cluster's FP32 computing power increased by 21 teraflops (by 21%) on the CPU and by 164 teraflops (by 6%) on the GPU, reaching a total of 94.4 teraflops on the CPU and 1461 teraflops on the GPU.
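Peak FP32 figures of this kind are commonly estimated as the number of execution units times the clock rate times the floating-point operations each unit completes per cycle. The short Python sketch below applies that formula to the new server's two 96-core processors. The clock rate and the per-cycle throughput are assumptions chosen for illustration, not values taken from this report, so the result is only a rough cross-check of the CPU increase quoted above.

```python
def peak_tflops(units: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak: units x clock (GHz) x FLOPs per unit per cycle, in TFLOPS."""
    return units * clock_ghz * flops_per_cycle / 1000.0  # GFLOPS -> TFLOPS


# From the text: two processors with 96 cores each.
SOCKETS = 2
CORES_PER_SOCKET = 96

# Assumptions (not stated in the report): an all-core clock of 3.55 GHz and
# 32 FP32 operations per core per cycle (two 256-bit FMA units).
ASSUMED_CLOCK_GHZ = 3.55
ASSUMED_FP32_FLOPS_PER_CYCLE = 32

new_server_cpu_tflops = peak_tflops(
    SOCKETS * CORES_PER_SOCKET,
    ASSUMED_CLOCK_GHZ,
    ASSUMED_FP32_FLOPS_PER_CYCLE,
)

if __name__ == "__main__":
    print(f"Estimated CPU peak of the new server: {new_server_cpu_tflops:.1f} TFLOPS")
```

Under these assumptions the estimate comes out at about 21.8 teraflops. A different assumed clock shifts the result proportionally, and sustained performance in real workloads is normally lower than any theoretical peak.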
The actual storage capacity of the cluster was expanded by 48 terabytes, to a total of 280 terabytes. At the same time, room was created for a further expansion of 338 terabytes (using disks with a nominal capacity of 18 terabytes). Note that each of these figures was achieved twice over, because the expansion applies to the SHA (Synology High Availability) cluster, which consists of two identical NAS servers (Synology RS4021xs+).
The purchase of additional battery modules made it possible to increase the battery runtime of all cluster systems from 2-3 minutes to 8-10 minutes, and that of critical systems to 10-11 hours.
The cable management of the cluster has also been significantly improved, and color coding has been introduced. Category 5e network cables were replaced with shielded category 6 cables with an LSZH sheath, which made it possible to provide a stable 10 gigabit link to every server. In addition, every server was connected with a second network cable for greater fault tolerance and for link aggregation (LAG with LACP), which raised the bandwidth of shared access to the servers to up to 20 gigabits per second (important when using MPI parallelization across two or more hosts).
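With link aggregation, traffic is typically distributed per flow: each flow is hashed onto one member link so that its packets stay in order. The 20 gigabit figure is therefore an aggregate ceiling across several concurrent flows, while any single flow remains within one 10 gigabit link. The Python sketch below illustrates the idea. The CRC32-based hash, the addresses and the port numbers are hypothetical stand-ins, not the hash policy actually configured on the cluster's switches.

```python
import zlib

LINK_GBIT = 10     # speed of one link, from the text
MEMBER_LINKS = 2   # two cables per server, from the text


def pick_link(src_ip: str, dst_ip: str, src_port: int, dst_port: int,
              n_links: int = MEMBER_LINKS) -> int:
    """Map a flow to one member link (hypothetical hash, for illustration only)."""
    key = f"{src_ip}|{dst_ip}|{src_port}|{dst_port}".encode()
    return zlib.crc32(key) % n_links


def per_flow_ceiling_gbit() -> int:
    """A single flow uses one member link, so it is capped at that link's speed."""
    return LINK_GBIT


def aggregate_ceiling_gbit(n_links: int = MEMBER_LINKS) -> int:
    """Several flows spread over all member links can use their combined speed."""
    return LINK_GBIT * n_links


if __name__ == "__main__":
    # Hypothetical flows between two hosts, differing only in source port.
    for port in range(40000, 40004):
        link = pick_link("10.0.0.1", "10.0.0.2", port, 5000)
        print(f"source port {port} -> member link {link}")
    print("per-flow ceiling:", per_flow_ceiling_gbit(), "Gbit/s")
    print("aggregate ceiling:", aggregate_ceiling_gbit(), "Gbit/s")
```

For MPI jobs, this suggests that the full aggregate bandwidth is more likely to be reached when several ranks or connections communicate between hosts at the same time than with a single stream.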
The cluster is intended primarily for computational studies of astrophysical and cosmological objects, such as star clusters, galaxies, galaxy clusters and the large-scale structure of the Universe, and for the processing and reliable long-term storage of the computational and observational data obtained.
The computing cluster is FAI’s own initiative and is being expanded incrementally using basic, grant and program-targeted funding.