Description
Deep learning has achieved unprecedented performance in particle physics. In particular, transformers have shown great promise for jet identification.
However, since jets originate from QCD dynamics, a transformer architecture designed to incorporate the relevant physical inductive biases may match the performance of general-purpose transformers with significantly fewer model parameters.
Drawing on our research, we introduce several ideas along these lines, such as cross-attention between subjets (or jets) and the constituent particles within a jet, as well as attention matrices restricted to information from pairwise variables.
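As a rough illustration of these two mechanisms, the PyTorch sketch below shows (a) a self-attention layer whose attention logits are biased by learned functions of pairwise variables, and (b) a cross-attention layer in which subjet tokens query the constituent particles. All class names, layer sizes, and feature choices are illustrative assumptions, not the architecture described above.

```python
import torch
import torch.nn as nn


class PairwiseBiasedAttention(nn.Module):
    """Self-attention over jet constituents whose attention matrix is
    driven by pairwise variables (e.g. angular distance or pair mass).
    A minimal sketch, not the actual model."""

    def __init__(self, dim: int, n_heads: int, n_pair_feats: int):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = dim // n_heads
        self.qkv = nn.Linear(dim, 3 * dim)
        self.proj = nn.Linear(dim, dim)
        # Small MLP mapping pairwise variables to one bias per attention head.
        self.pair_mlp = nn.Sequential(
            nn.Linear(n_pair_feats, 64), nn.GELU(), nn.Linear(64, n_heads)
        )

    def forward(self, x: torch.Tensor, pair_feats: torch.Tensor) -> torch.Tensor:
        # x:          (batch, n_particles, dim)      constituent embeddings
        # pair_feats: (batch, n_particles, n_particles, n_pair_feats)
        B, N, _ = x.shape
        q, k, v = self.qkv(x).chunk(3, dim=-1)
        q = q.view(B, N, self.n_heads, self.head_dim).transpose(1, 2)
        k = k.view(B, N, self.n_heads, self.head_dim).transpose(1, 2)
        v = v.view(B, N, self.n_heads, self.head_dim).transpose(1, 2)
        # Pairwise bias added to the dot-product logits; a fully restricted
        # variant would use the bias alone as the attention matrix.
        bias = self.pair_mlp(pair_feats).permute(0, 3, 1, 2)  # (B, heads, N, N)
        logits = q @ k.transpose(-2, -1) / self.head_dim**0.5 + bias
        out = (logits.softmax(dim=-1) @ v).transpose(1, 2).reshape(B, N, -1)
        return self.proj(out)


class SubjetCrossAttention(nn.Module):
    """Cross-attention: subjet (or whole-jet) tokens query the constituents."""

    def __init__(self, dim: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)

    def forward(self, subjets: torch.Tensor, particles: torch.Tensor) -> torch.Tensor:
        # subjets:   (batch, n_subjets, dim)   queries
        # particles: (batch, n_particles, dim) keys and values
        out, _ = self.attn(subjets, particles, particles)
        return out
```

In this sketch the parameter savings come from replacing large learned query-key interactions with a small MLP over physically motivated pairwise inputs, and from letting a handful of subjet tokens summarize many constituents through cross-attention.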