Hypergraph-Transformer (HGT) for Interaction Event Prediction in Laparoscopic and Robotic Surgery
Understanding and anticipating events and actions is critical for intraoperative assistance and decision-making during minimally invasive surgery. We propose a predictive neural network that understands and predicts critical interaction aspects of the surgical workflow from endoscopic, intracorporeal video data while flexibly leveraging surgical knowledge graphs. The approach uses a hypergraph-transformer (HGT) structure that encodes expert knowledge into the network design and predicts the graph's hidden embedding. We evaluate our approach on established surgical datasets and applications, including the prediction of action triplets and of the achievement of the Critical View of Safety (CVS), a key safety measure. Moreover, we address specific safety-related forecasts of surgical processes, such as predicting the clipping of the cystic duct or artery before the CVS has been achieved. Our results show that incorporating our approach improves the prediction of interaction events compared with unstructured alternatives.
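The abstract does not give the layer equations of the HGT. As a rough illustration only, the sketch below shows one generic way to combine a hypergraph with transformer-style attention. Node features are first pooled into hyperedge embeddings, and each node then attends only over the hyperedges it belongs to. The function names, the mean pooling, the single-head scaled dot-product attention, and the toy incidence matrix are all assumptions and do not describe the authors' architecture. The sketch also assumes that every node lies in at least one hyperedge and that every hyperedge is non-empty.

```python
import numpy as np

def masked_softmax(scores, mask):
    # Scores outside the mask become -inf, so they receive zero weight.
    scores = np.where(mask, scores, -np.inf)
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    return weights / weights.sum(axis=1, keepdims=True)

def hypergraph_attention(X, H, Wq, Wk, Wv):
    """Hypothetical single hypergraph-attention step (illustrative only).

    X: (n, d) node features.
    H: (n, m) binary incidence matrix; H[i, e] = 1 if node i is in hyperedge e.
    Assumes every row and every column of H contains at least one 1.
    """
    # Node -> hyperedge: average the features of each hyperedge's member nodes.
    deg_e = H.sum(axis=0, keepdims=True)      # (1, m) hyperedge sizes
    E = (H / deg_e).T @ X                     # (m, d) hyperedge embeddings
    # Hyperedge -> node: each node attends over its incident hyperedges only.
    Q, K, V = X @ Wq, E @ Wk, E @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[1])    # (n, m)
    A = masked_softmax(scores, H > 0)
    return A @ V, A

# Toy example: 4 hypothetical entity nodes (e.g. instruments and anatomy)
# grouped into 2 hypothetical hyperedges of a knowledge graph.
rng = np.random.default_rng(0)
n, m, d = 4, 2, 8
X = rng.normal(size=(n, d))
H = np.array([[1, 0],
              [1, 1],
              [0, 1],
              [1, 1]], dtype=float)
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out, A = hypergraph_attention(X, H, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

In this sketch, the incidence mask is the point where prior knowledge enters. Attention weights exist only between nodes and the hyperedges that the (assumed) knowledge graph links them to. This is one plausible reading of "encoding expert knowledge into the network design", but it is not confirmed by the abstract.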