The Untapped Potential of Off-the-Shelf Convolutional Neural Networks
In recent years, myriad novel convolutional network architectures have been developed to advance state-of-the-art performance on challenging recognition tasks. As computational resources have grown, a great deal of effort has been devoted to efficiently scaling up existing designs and to generating new architectures with Neural Architecture Search (NAS) algorithms. While network topology has proven to be a critical factor in model performance, we show that significant gains are left on the table by keeping the topology static at inference time. Due to challenges such as scale variation, we should not expect a static model configured to perform well across a training dataset to be optimally configured for all test data. In this work, we expose the exciting potential of inference-time dynamic models. We show that by allowing just four layers to change configuration dynamically at inference time, off-the-shelf models such as ResNet-50 have an upper-bound accuracy of over 95% on ImageNet. This level of performance currently exceeds that of models with over 20x more parameters and significantly more complex training procedures. While this upper bound may be difficult to achieve in practice with a real dynamic model, it points to a significant source of untapped potential in current models.
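To make the notion of an oracle-style upper bound concrete, the sketch below evaluates a pretrained ResNet-50 under a small set of per-image input configurations and counts an image as correct if any configuration yields the right prediction. This is an illustrative assumption only: the abstract does not specify how the four layers are reconfigured, so the sketch stands in with input rescaling, and the scale set, dataset path, and variable names are hypothetical.

```python
# Illustrative sketch: estimate an oracle ("any configuration correct") upper
# bound for a fixed pretrained model. Per-image configurations are approximated
# here by input rescaling; this is NOT the paper's four-layer mechanism.
import torch
from torchvision.models import resnet50, ResNet50_Weights
from torchvision.datasets import ImageNet

device = "cuda" if torch.cuda.is_available() else "cpu"
weights = ResNet50_Weights.IMAGENET1K_V2
model = resnet50(weights=weights).to(device).eval()
preprocess = weights.transforms()  # standard ImageNet eval preprocessing

# Hypothetical configuration set: each entry rescales the input image before
# preprocessing, standing in for one inference-time network configuration.
scales = [0.75, 1.0, 1.25, 1.5]

dataset = ImageNet(root="/path/to/imagenet", split="val")  # placeholder path
oracle_correct, total = 0, 0

with torch.no_grad():
    for img, label in dataset:
        total += 1
        for s in scales:
            w, h = img.size
            scaled = img.resize((int(w * s), int(h * s)))
            x = preprocess(scaled).unsqueeze(0).to(device)
            pred = model(x).argmax(dim=1).item()
            if pred == label:  # oracle: one correct configuration suffices
                oracle_correct += 1
                break

print(f"Oracle upper-bound top-1 accuracy: {oracle_correct / total:.2%}")
```

Because the oracle picks the best configuration per image with knowledge of the label, the resulting number bounds what a real dynamic model could achieve with the same configuration set, rather than reporting achievable accuracy.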