Filtering, Reductions and Synchronization in the Anton 2 Network
Parallel implementations of molecular dynamics (MD) simulation require significant inter-node communication, but off-chip communication bandwidth is not scaling as quickly as on-chip logic density. We present three network features targeting this problem that have been implemented in Anton 2, a massively parallel special-purpose supercomputer for MD simulations. The first is a mechanism to dynamically identify packets that do not need to be delivered to all endpoints within a multicast tree; these packets are filtered to conserve network bandwidth. The second is hardware for in-network reductions that supports over a thousand concurrent neighbourhood reductions per node and fast all-to-all global reductions. The third is a light-weight synchronization mechanism for multicast-reduce communication patterns that can be used to efficiently detect the completion of reduction operations when the number of summands is difficult to predict. We use the combination of packet filtering, in-network reductions and light-weight synchronization to decrease the communication requirements of MD simulations by as much as 51% on Anton 2, yielding application-level performance improvements of up to 14%.