Conveners
Software development and Machines: Software development and Machines I
- Mathias Wagner
Software development and Machines: Software development and Machines II
- Bartosz Kostrzewa (Univ. of Bonn, High Performance Computing & Analytics Lab)
Software development and Machines: Software development and Machines III
- Jacob Finkenrath
COLA is a software library for lattice QCD written in modern Fortran and NVIDIA CUDA. Intel and NVIDIA have dominated the HPC domain for a long time, but the status quo has been changed by the recent advent of AMD-based systems in the supercomputing TOP500. Setonix is a next-generation Cray AMD machine currently being installed at the Pawsey Supercomputing Centre in Perth, Australia....
In this talk I give an update on the status of the GPT software package (https://github.com/lehner/gpt).
Lyncs-API is a Python API for Lattice QCD. It aims to create a complete framework for easily running applications via Python. It implements low- and high-level tools, including interfaces to common LQCD libraries. Last year, at this conference, we presented the API to the community for the first time. In this talk we will give a status update on its development and show the potential of the API...
We present progress in interfacing the Hybrid Monte Carlo implementation in the tmLQCD software suite with the QUDA library and compare its performance to our top-of-the-line algorithms on CPU machines. We discuss the main challenges and overheads of our approach and scrutinize its fundamental architectural limitations before exploring ongoing improvements as well as current and future simulations.
In this talk we present work on extending the set of solvers for the inversion of the Dirac matrix for Wilson-Clover type fermions in Grid. Particular emphasis is put on the inexact deflation method put forward by Lüscher. Besides providing fast solves for configurations at the physical point, one of the method’s central advantages is that it can be included in the HMC algorithm at relatively...
Bandwidth and latencies are central performance limiters for Lattice QCD. One way to overcome bandwidth limiters is to reduce the number of bits needed, e.g., by using mixed-precision solvers. These provide great speedups but increase the relative importance of latency limiters. We discuss techniques that QUDA uses to reduce latencies from GPU-CPU and GPU-network transfers and their impact for...
MPI Job Manager (MPI_JM) is a "scheduler" designed to enable users to make maximum use of heterogeneous architectures, particularly when a "swarm" of independent MPI tasks is required for a complete calculation, such as lattice QCD calculations of correlation functions on pre-existing configurations. MPI_JM manages all these tasks through lightweight C++ code supported by Python3. ...
Implementations of measurement kernels in high-level Lattice QCD frameworks enable rapid prototyping, but can leave hardware capabilities significantly underutilized. This is an acceptable tradeoff if the time spent in unoptimized routines is generally small. The computational cost of modern spectroscopy projects, however, can be comparable to or even exceed the cost of generating gauge...
Adaptive multigrid methods have proven very successful in dealing with critical slowing down for the Wilson-Dirac solver in lattice gauge theory. New formulations for multigrid methods with staggered fermions are currently being tested on pre-exascale GPU supercomputers such as Summit and Crusher. In this talk, I will discuss our implementation of staggered multigrid codes on the Summit...
Lyncs-API is a Python API for lattice QCD. One of the goals of Lyncs-API is to provide a common framework for lattice QCD calculations on different HPC architectures, with and without accelerators, by utilizing different software packages. As such, it contains interfaces to c-lime, DDalphaAMG, tmLQCD, and QUDA. In this talk, we focus on the interface to QUDA, named lyncs-QUDA, and present a...
We give an overview of the mixed-precision Krylov strategies of QUDA. These have evolved over the past decade and utilize a variety of numerical techniques to stabilize the convergence of solvers such as Conjugate Gradient. We describe a recently developed bit packing technique to increase precision at fixed word size. This improvement in precision stabilizes the mixed-precision solvers as the...
As a fully computational discipline, Lattice Field Theory has the potential to give results that anyone with sufficient computational resources can reproduce, going from input parameters to published numbers and plots correct to the last byte. After briefly motivating and outlining some of the key steps in making lattice computations reproducible, I will present the results of a survey of all...
Open science aims to make scientific research processes, tools and results accessible to all scientific communities, creating trust in science and enabling digital competences to be realized in research, leading to increased innovation. It provides standard and transparent pathways to conducting research and fosters best practices for collecting, analysing, preserving, sharing and reusing...