Towards automated network management: Learning the optimal protocol selection
Today's Internet must support applications with increasingly dynamic and heterogeneous connectivity requirements, such as video streaming and the Internet of Things. Yet current network management practices generally rely on pre-specified flow configurations, which cannot cover all possible scenarios. In this work, we instead propose a model-free learning approach to automatically optimize the policies for heterogeneous network flows. This approach is attractive because no existing comprehensive models quantify how different policy choices affect flow performance under dynamically changing network conditions. We extend multi-armed bandit frameworks to propose new online learning algorithms for protocol selection, addressing the challenge that policy configurations affect the performance of multiple flows sharing the same network resources. This performance coupling limits the scalability and optimality of existing online learning algorithms. We theoretically prove that our algorithm achieves sublinear regret and demonstrate its optimality and scalability through data-driven simulations.
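To make the multi-armed bandit framing concrete, the sketch below casts protocol selection as a classic UCB1 bandit: each candidate protocol is an arm, and the (simulated) reward stands in for a measured flow-performance metric such as normalized throughput. This is a minimal illustration of the underlying bandit machinery, not the paper's algorithm; the protocol names, reward model, and parameters are all hypothetical, and the sketch ignores the cross-flow performance coupling that the paper's algorithms are designed to handle.

```python
import math
import random

random.seed(0)

# Hypothetical candidate protocols (the bandit's arms).
PROTOCOLS = ["cubic", "bbr", "reno"]

def simulated_reward(protocol):
    # Stand-in for a measured performance metric (e.g., normalized
    # throughput); real rewards would come from network observations.
    means = {"cubic": 0.5, "bbr": 0.7, "reno": 0.4}
    return min(1.0, max(0.0, random.gauss(means[protocol], 0.1)))

def ucb1(rounds=2000):
    counts = {p: 0 for p in PROTOCOLS}   # times each protocol was selected
    sums = {p: 0.0 for p in PROTOCOLS}   # cumulative reward per protocol
    for t in range(1, rounds + 1):
        untried = [p for p in PROTOCOLS if counts[p] == 0]
        if untried:
            # Try every protocol at least once before using UCB indices.
            arm = untried[0]
        else:
            # Select the protocol maximizing mean reward plus an
            # exploration bonus that shrinks as an arm is played more.
            arm = max(
                PROTOCOLS,
                key=lambda p: sums[p] / counts[p]
                + math.sqrt(2.0 * math.log(t) / counts[p]),
            )
        reward = simulated_reward(arm)
        counts[arm] += 1
        sums[arm] += reward
    return counts

counts = ucb1()
```

Under these assumed reward distributions, the exploration bonus ensures every protocol keeps being sampled occasionally, while selections concentrate on the best-performing one; UCB1's logarithmic per-arm exploration is what yields the sublinear regret guarantee in the single-flow setting.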