Denoising pediatric cardiac photon-counting CT data using volumetric vision transformers and unpaired training data
The advent of clinical photon-counting x-ray CT (PCCT) has yielded gains in spatial resolution, noise performance, contrast resolution, and radiation dose management in numerous diagnostic applications. Pediatric populations especially stand to benefit from these gains, both because of their smaller size relative to adult patients and because of greater concerns over the lifetime cancer risk associated with ionizing radiation exposure. Nowhere is this more applicable than in patients with congenital heart disease, who are imaged repeatedly throughout their care. Despite these potential advantages, image quality in pediatric PCCT remains variable. Contributing factors include limited protocol optimization for pediatric patients, the heterogeneity of metallic implants in pediatric cardiac patients, and the trade-offs between image quality and dose made when lowering kV and mAs. Here, we adapt and demonstrate two deep-learning strategies for denoising pediatric cardiac PCCT data that do not require paired training data and that can be trained on data sets of varying image quality. (1) We train a CycleGAN to map CT images between high and low image quality, as gauged by signal-to-noise ratio measurements. The domain mappings are performed with vision transformers (ViTs), which preserve high spatial frequencies well and enable a 50% noise reduction across both high- and low-intensity structures (contrast-enhanced vasculature, fat). (2) We adapt prior masked-autoencoder work on 2D natural images to 3D CT data. Specifically, we propose a ViT network structure and cost function that yield robust interpolation of deleted 3D CT data, similarly reducing image noise by 50% in an assessment of validation data. Comparing the two approaches, the CycleGAN reduces noise by identifying and attenuating high-frequency features associated with noise, while the masked autoencoder effectively removes noise textures but visibly smooths high-contrast structures.
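To make strategy (2) concrete, the sketch below shows the data-side mechanics of a 3D masked autoencoder: splitting a CT volume into cubic patches, randomly deleting a fraction of them, and scoring the reconstruction only on the deleted patches. The abstract does not state the patch size, masking ratio, or the exact cost function used here. This is therefore a minimal sketch that assumes non-overlapping cubic patches, uniform random masking, and the mean-squared-error-on-masked-patches loss of the original 2D masked autoencoder. The function names (`patchify_3d`, `random_patch_mask`, `masked_mse`) are hypothetical, and the ViT encoder/decoder itself is omitted.

```python
import numpy as np


def patchify_3d(volume, p):
    """Split a (D, H, W) volume into non-overlapping p x p x p patches.

    Returns an array of shape (num_patches, p**3), with patches ordered
    depth-major, then height, then width. Assumes (hypothetically) that
    every dimension is divisible by p.
    """
    d, h, w = volume.shape
    if d % p or h % p or w % p:
        raise ValueError("volume dimensions must be divisible by patch size")
    x = volume.reshape(d // p, p, h // p, p, w // p, p)
    x = x.transpose(0, 2, 4, 1, 3, 5)  # (nd, nh, nw, p, p, p)
    return x.reshape(-1, p ** 3)


def random_patch_mask(num_patches, mask_ratio, rng):
    """Boolean mask that is True where a patch is deleted (hidden from the encoder)."""
    num_masked = int(round(mask_ratio * num_patches))
    mask = np.zeros(num_patches, dtype=bool)
    mask[rng.permutation(num_patches)[:num_masked]] = True
    return mask


def masked_mse(pred_patches, target_patches, mask):
    """MAE-style loss: mean squared error averaged over deleted patches only."""
    per_patch = ((pred_patches - target_patches) ** 2).mean(axis=1)
    return per_patch[mask].mean()


# Usage on a toy 32^3 "volume" of noise (not CT data).
rng = np.random.default_rng(0)
vol = rng.normal(size=(32, 32, 32))
patches = patchify_3d(vol, 8)                      # 64 patches of 512 voxels
mask = random_patch_mask(len(patches), 0.75, rng)  # 48 patches deleted
visible = patches[~mask]                           # what a ViT encoder would see
```

Restricting the loss to deleted patches forces the network to interpolate missing anatomy from its surroundings instead of copying visible voxels. Because uncorrelated noise cannot be predicted from neighboring patches, this offers one plausible explanation for why such a model tends to remove noise texture, and also for why it can smooth fine high-contrast detail.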