Codes used within ETMC
tmLQCD linked with DD\(\alpha\)AMG and QPhiX for gauge generation
QUDA and DD\(\alpha\)AMG for propagator generation, linked to several packages (paramgen, plegma, Nissa, …)
contraction tools: cvc, plegma and many more
Currently, we have
GPU-based machines (mostly clusters and Piz Daint)
CPU-based machines (SuperMUC, JUWELS, …)
in the future
likely a large GPU installation in Jülich
a new, likely ARM-based, European architecture
complicated vector pipeline and many threads
peak performance of machines increasing further and further
however, code efficiency decreases unless a significant amount of programming and optimisation effort is invested
and optimisations need to be done again and again
one would like to have a code general enough to adapt easily to new architectures
If we don't adapt, we risk losing
the ability to generate state of the art gauge configs
our ability for innovations
our competitiveness
tmLQCD is becoming more and more difficult to adapt to new realities
it would require a completely new memory management
and a completely new data layout
likely not possible with tmLQCD, as it is a C-code
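To make "a completely new data layout" concrete, the sketch below contrasts two ways of linearising a spinor field index: array-of-structures (all components of a site contiguous) and structure-of-arrays (one component of all sites contiguous, which tends to suit wide vector units and GPUs). This is a generic illustration and not the actual layout of tmLQCD, QUDA or any other package; the function names and the 4 spin x 3 colour component count per site are assumptions made for the example.

```python
# Hypothetical sketch: two index mappings for a spinor field with
# N_SPIN * N_COLOR complex components per lattice site.
N_SPIN = 4   # Dirac spin components (assumed)
N_COLOR = 3  # SU(3) colour components (assumed)
N_COMP = N_SPIN * N_COLOR


def aos_index(site, spin, color):
    """Array-of-structures: all components of one site are contiguous."""
    return site * N_COMP + spin * N_COLOR + color


def soa_index(site, spin, color, volume):
    """Structure-of-arrays: one component of all sites is contiguous."""
    return (spin * N_COLOR + color) * volume + site


def aos_to_soa(field, volume):
    """Reorder a flat AoS field into SoA order and return a new list."""
    out = [None] * (volume * N_COMP)
    for site in range(volume):
        for spin in range(N_SPIN):
            for color in range(N_COLOR):
                src = aos_index(site, spin, color)
                dst = soa_index(site, spin, color, volume)
                out[dst] = field[src]
    return out
```

Kernels that reach field data only through an index function of this kind could, in principle, switch layout per architecture without being rewritten; a code base that hard-wires one layout throughout cannot.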
tmLQCD is the only package supporting dynamical 2+1+1 twisted mass simulations
propagator generation
on-the-fly contractions
reasonably optimised kernels for general architectures for all these tasks
else?
start a completely new code suite
extend Plegma or tmLQCD
else?
Any of these requires a dedicated effort and a lot of manpower!
my opinion: we need a good compromise between efficiency, flexibility and effort!