HPC for Scientists: A Guide to High-Performance Computing Infrastructure

Updated June 2026
High-performance computing (HPC) refers to the use of supercomputers and computing clusters to solve scientific problems that exceed the capacity of standard desktop machines. HPC systems provide the raw computational power, high-speed networks, and massive storage that scientists need to run simulations at scales that reveal new physical insights. Understanding HPC infrastructure helps researchers make effective use of these shared resources and write code that performs well on parallel architectures.

Supercomputer Architecture

A modern supercomputer is a collection of thousands of compute nodes connected by a high-speed network. Each node is essentially a powerful server containing multi-core CPUs, local memory, and, on most current systems, GPU accelerators. The nodes are housed in rows of cabinets in a data center with industrial cooling systems and dedicated power supplies.

As of 2026, the most powerful supercomputers have crossed the exascale barrier, achieving performance above one exaflop (10^18 floating-point operations per second). Systems like Frontier at Oak Ridge National Laboratory and Aurora at Argonne National Laboratory each combine roughly ten thousand nodes and tens of thousands of GPUs, connected by networks whose aggregate bandwidth reaches hundreds of terabytes per second. These machines draw 20 to 40 megawatts of electrical power, comparable to a small town.

The compute nodes in modern HPC systems are heterogeneous, combining general-purpose CPUs with specialized accelerators. NVIDIA A100 and H100 GPUs, AMD MI250X and MI300 accelerators, and Intel Data Center GPU Max series are common accelerator choices. Each accelerator provides much higher arithmetic throughput than the CPU but requires data to be transferred to its dedicated memory and programs to be written using specialized frameworks.

The interconnect links the nodes and determines how fast they can exchange data. Networks such as HPE Cray's Slingshot and InfiniBand (HDR and NDR) provide bandwidths of hundreds of gigabits per second per node with microsecond-scale latencies. Network topology, whether fat-tree, dragonfly, or torus, affects the communication performance of different parallel algorithms.
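The effect of latency and bandwidth on a single message can be sketched with the simple latency-bandwidth model, in which transfer time is a fixed startup cost plus the message size divided by the link bandwidth. This is a rough illustration, not a description of any particular network: the function name and the example values (2 microseconds, 200 Gb/s) are assumptions chosen only to match the orders of magnitude mentioned above.

```python
def message_time_seconds(message_bytes, latency_s, bandwidth_gbit_per_s):
    """Estimate point-to-point transfer time: latency + size / bandwidth."""
    # Convert gigabits per second to bytes per second.
    bandwidth_bytes_per_s = bandwidth_gbit_per_s * 1e9 / 8
    return latency_s + message_bytes / bandwidth_bytes_per_s


if __name__ == "__main__":
    # Illustrative values only: 2 microseconds latency, 200 Gb/s link.
    for size in (8, 64 * 1024, 64 * 1024 * 1024):
        t = message_time_seconds(size, 2e-6, 200.0)
        print(f"{size:>10d} bytes: {t * 1e6:12.2f} microseconds")
```

In this model small messages are dominated by latency and large messages by bandwidth, which is one reason codes often try to combine many small messages into fewer large ones.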

Job Scheduling and Resource Management

HPC systems are shared resources used by many researchers simultaneously. Job schedulers like Slurm, PBS Pro, and LSF manage the allocation of nodes to users. Researchers submit job scripts that specify the number of nodes required, the expected runtime, and the commands to execute. The scheduler queues these jobs and dispatches them to available nodes when sufficient resources become free.
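As a minimal sketch of what such a job script contains, the helper below assembles a Slurm batch script from the three ingredients just listed. The function name, the job name, and the executable ./my_simulation are hypothetical; the #SBATCH options shown (--job-name, --nodes, --time) are standard Slurm options, but most centers require additional ones such as an account or partition, so the local documentation takes precedence.

```python
def render_slurm_script(job_name, nodes, walltime, command):
    """Return the text of a minimal Slurm batch script."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --nodes={nodes}",      # number of nodes required
        f"#SBATCH --time={walltime}",    # expected runtime, HH:MM:SS
        "",
        command,                         # commands to execute
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "./my_simulation" is a placeholder executable name.
    print(render_slurm_script("demo", 4, "02:00:00", "srun ./my_simulation"))
```

The resulting text would typically be saved to a file and submitted with sbatch.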

Understanding job scheduling is essential for efficient use of HPC resources. Backfill scheduling allows smaller jobs to run in gaps left by large jobs waiting for enough nodes, so requesting only the resources actually needed (rather than overestimating) can dramatically reduce queue wait times. Job arrays simplify submitting many similar jobs, such as parameter sweeps or Monte Carlo ensembles. Interactive sessions allow debugging and testing but should be used sparingly on busy systems.
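For a parameter sweep run as a job array, each array task usually needs to work out which parameter combination it owns. The sketch below assumes Slurm, which exposes the task index in the SLURM_ARRAY_TASK_ID environment variable; the parameter names (temperatures, seeds) and their values are hypothetical.

```python
import os


def parameters_for_task(task_id, temperatures, seeds):
    """Map a flat array index onto one (temperature, seed) combination."""
    total = len(temperatures) * len(seeds)
    if not 0 <= task_id < total:
        raise ValueError(f"task_id {task_id} outside 0..{total - 1}")
    # Seeds vary fastest; temperatures vary slowest.
    t_index, s_index = divmod(task_id, len(seeds))
    return temperatures[t_index], seeds[s_index]


if __name__ == "__main__":
    # Falls back to task 0 when run outside a Slurm job array.
    task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", "0"))
    temperature, seed = parameters_for_task(task_id, [1.0, 2.0, 3.0], [10, 20])
    print(f"task {task_id}: temperature={temperature}, seed={seed}")
```

With three temperatures and two seeds, the array would need six tasks, indexed 0 through 5.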

Most HPC centers use an allocation system where research groups receive computing time measured in core-hours or node-hours. A job using 100 nodes for 10 hours consumes 1,000 node-hours. Researchers must budget their allocations and optimize their codes to use resources efficiently. Poor parallel scalability or wasteful memory usage directly translates to fewer results for the same allocation.
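The accounting arithmetic is simple enough to script when planning a campaign. The sketch below reproduces the 100-node, 10-hour example; the function names, the 64 cores per node, and the 50,000 node-hour allocation are assumptions for illustration, and actual charging rules (for example, GPU multipliers or queue discounts) vary by center.

```python
def node_hours(nodes, hours):
    """Node-hours charged for a job occupying `nodes` nodes for `hours` hours."""
    return nodes * hours


def core_hours(nodes, hours, cores_per_node):
    """Core-hours for the same job on nodes with `cores_per_node` cores each."""
    return nodes * hours * cores_per_node


def jobs_within_allocation(allocation_node_hours, nodes, hours):
    """How many identical jobs fit into an allocation of node-hours."""
    return allocation_node_hours // node_hours(nodes, hours)


if __name__ == "__main__":
    print(node_hours(100, 10))                     # the example from the text
    print(core_hours(100, 10, 64))                 # assuming 64 cores per node
    print(jobs_within_allocation(50_000, 100, 10)) # assuming a 50,000 node-hour award
```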

Storage and Data Management

HPC systems provide multiple tiers of storage, each with different characteristics. Parallel file systems like Lustre, GPFS (Spectrum Scale), and BeeGFS provide high-bandwidth shared storage accessible from all compute nodes. These file systems stripe data across hundreds of storage servers to achieve aggregate bandwidths of terabytes per second, enabling simulations that generate large outputs to write data without becoming I/O bottlenecked.
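Striping can be pictured as dealing fixed-size chunks of a file round-robin across a set of storage targets. The sketch below is a simplified model of that idea, not the exact layout logic of Lustre, GPFS, or BeeGFS; the function name and the 1 MiB stripe size and stripe count of 4 are illustrative assumptions.

```python
MIB = 1024 * 1024


def stripe_target(offset_bytes, stripe_size_bytes, stripe_count):
    """Index of the storage target holding the byte at `offset_bytes`.

    Simplified round-robin model: chunk k of the file lives on
    target k modulo stripe_count.
    """
    chunk_index = offset_bytes // stripe_size_bytes
    return chunk_index % stripe_count


if __name__ == "__main__":
    # Show which target each of the first eight 1 MiB chunks lands on.
    for chunk in range(8):
        target = stripe_target(chunk * MIB, MIB, 4)
        print(f"chunk {chunk} -> target {target}")
```

Because consecutive chunks land on different targets, a large sequential read or write can draw on several storage servers at once.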

Scratch storage provides temporary high-performance space for active simulations. Files on scratch are typically purged after a period of inactivity (commonly 30 to 90 days). Researchers must transfer important results to more permanent storage before they are deleted. Home directories provide smaller, backed-up storage for source code, scripts, and configuration files. Archive storage, often tape-based, provides long-term storage for large datasets at lower cost.

Data management is a significant challenge at HPC scale. A single climate simulation can produce petabytes of output. Storing, organizing, and transferring this data requires planning. Tools like Globus facilitate large-scale data transfers between institutions. Data compression, selective output (writing only the variables and time steps needed for analysis), and in-situ analysis and visualization (processing data while the simulation is running, without writing it to disk) are strategies for managing data volume.
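A back-of-the-envelope estimate shows how much selective output can save. The sketch below multiplies grid points, variables, snapshots, and bytes per value; the function name and all the numbers (a billion grid points, 20 variables, 1,000 snapshots, 8-byte values) are hypothetical and not taken from any real simulation.

```python
def output_bytes(grid_points, variables, snapshots, bytes_per_value=8):
    """Uncompressed output size for dense snapshots of a gridded simulation."""
    return grid_points * variables * snapshots * bytes_per_value


if __name__ == "__main__":
    TB = 10**12
    # Write everything: 20 variables at all 1,000 snapshots.
    full = output_bytes(10**9, 20, 1000)
    # Selective output: 5 variables at every 10th snapshot.
    selective = output_bytes(10**9, 5, 100)
    print(f"full output:      {full / TB:.0f} TB")
    print(f"selective output: {selective / TB:.0f} TB")
```

Under these assumptions, writing everything produces 160 TB, while keeping a quarter of the variables at a tenth of the time steps produces 4 TB.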

Performance Optimization

Getting good performance on HPC systems requires attention to several levels of optimization. Single-core performance depends on using the CPU cache effectively, avoiding unnecessary memory accesses, and enabling compiler optimizations. The gap between peak and achievable performance is often large because scientific codes are frequently limited by memory bandwidth rather than arithmetic throughput.
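One common way to reason about this limit is the roofline model, which is not named in the text above but captures the same idea: attainable throughput is the smaller of the arithmetic peak and the memory bandwidth multiplied by the kernel's arithmetic intensity (floating-point operations per byte moved). The sketch below uses made-up hardware numbers (3,000 GFLOP/s peak, 200 GB/s memory bandwidth) purely for illustration.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_per_s, flops_per_byte):
    """Roofline estimate: min(compute peak, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, bandwidth_gb_per_s * flops_per_byte)


if __name__ == "__main__":
    peak = 3000.0       # hypothetical peak, GFLOP/s
    bandwidth = 200.0   # hypothetical memory bandwidth, GB/s

    # A vector update y = a*x + y in double precision performs 2 flops per
    # element while reading x and y and writing y (24 bytes per element).
    axpy_intensity = 2 / 24
    print(f"axpy-like kernel: {attainable_gflops(peak, bandwidth, axpy_intensity):.1f} GFLOP/s")

    # A kernel with high data reuse, e.g. 100 flops per byte, hits the peak.
    print(f"high-reuse kernel: {attainable_gflops(peak, bandwidth, 100.0):.1f} GFLOP/s")
```

With these numbers the vector update is capped at roughly 17 GFLOP/s, under one percent of the arithmetic peak, which illustrates why memory-bound codes fall so far short of the advertised figure.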

Node-level optimization involves using all cores and accelerators within each node effectively. This means parallelizing across CPU cores with OpenMP, offloading work to GPUs with CUDA (or a comparable framework such as HIP or SYCL on AMD and Intel accelerators), and managing data transfers between CPU and GPU memory. Profiling tools like Intel VTune, NVIDIA Nsight, AMD uProf, and TAU help identify bottlenecks by measuring where the program spends its time and where performance is being lost.

Scaling optimization aims to keep performance improving in proportion to the number of nodes. Communication overhead, load imbalance, and synchronization costs all reduce parallel efficiency. Strong scaling (fixed total problem size, increasing processor count) exposes the serial fraction and communication overhead that limit speedup. Weak scaling (fixed problem size per processor, so the total problem grows with processor count) shows whether run time stays roughly constant as the machine and the problem grow together.
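These ideas can be made quantitative with Amdahl's law for strong scaling and with simple efficiency ratios computed from measured run times. The sketch below is illustrative: the function names are assumptions, and the timings in the example are invented rather than measured.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Amdahl's law: upper bound on speedup for a fixed problem size."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def strong_scaling_efficiency(time_on_one, time_on_p, processors):
    """Measured speedup divided by the ideal speedup (fixed problem size)."""
    return time_on_one / (processors * time_on_p)


def weak_scaling_efficiency(time_on_one, time_on_p):
    """Ratio of run times when the problem grows with the processor count."""
    return time_on_one / time_on_p


if __name__ == "__main__":
    # A code that is 95% parallel can never exceed a 20x speedup.
    for p in (8, 64, 1024):
        print(f"{p:>5d} processors: speedup <= {amdahl_speedup(0.95, p):.2f}")
    # Invented timings: 100 s on 1 processor, 25 s on 8 processors.
    print(f"strong-scaling efficiency: {strong_scaling_efficiency(100.0, 25.0, 8):.2f}")
    # Invented timings: 100 s on 1 processor, 125 s on 8 with 8x the data.
    print(f"weak-scaling efficiency:   {weak_scaling_efficiency(100.0, 125.0):.2f}")
```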

Software Environment

Nearly all HPC systems run Linux and provide a rich software environment through module systems (Environment Modules or Lmod). Researchers load specific versions of compilers, MPI implementations, and scientific libraries using module commands, allowing multiple versions to coexist without conflicts.

Common HPC software includes compilers (GCC, Intel oneAPI, NVIDIA HPC SDK, AMD AOCC), MPI implementations (Open MPI, MPICH, Intel MPI), numerical libraries (BLAS, LAPACK, ScaLAPACK, FFTW, PETSc), and domain-specific applications. Container technologies (Singularity/Apptainer) package software environments for portability and reproducibility, allowing researchers to run the same software stack across different HPC systems.

HPC centers typically provide user support, training workshops, and documentation. Taking advantage of these resources, particularly the center-specific documentation about hardware configuration, recommended compiler flags, and best practices for the particular job scheduler, can save significant time and improve computational efficiency.

Key Takeaway

Effective use of HPC infrastructure requires understanding not just the parallel programming models but also the job scheduling system, storage hierarchy, and performance characteristics of the specific hardware, because the gap between theoretical peak performance and actual achieved performance can be enormous.