DistDNAS: Search Efficient Feature Interactions within 2 Hours
Search efficiency and serving efficiency are two major axes in building feature interactions and expediting model development in recommender systems. Searching for the optimal feature interaction design on large-scale benchmarks is costly because the search workflow runs sequentially over a large volume of data. In addition, fusing interactions of various sources, orders, and mathematical operations introduces potential conflicts and redundancy into recommender models, leading to sub-optimal trade-offs between performance and serving cost. This paper presents DistDNAS, a solution for fast and efficient feature interaction design. DistDNAS builds a supernet that incorporates interaction modules of varying orders and types as its search space. To optimize search efficiency, DistDNAS distributes the search across data from different dates and aggregates the resulting choices of optimal interaction modules, achieving a speed-up of over 25× and reducing the search cost from 2 days to 2 hours. To optimize serving efficiency, DistDNAS introduces a differentiable cost-aware loss that penalizes the selection of redundant interaction modules, making the discovered feature interactions more efficient to serve. We extensively evaluate the best models crafted by DistDNAS on the 1TB Criteo Terabyte dataset. Experimental evaluations demonstrate a 0.001 AUC improvement and 60% FLOPs savings over current state-of-the-art CTR models.
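The two efficiency mechanisms described above can be illustrated with a minimal sketch. It is not the DistDNAS implementation: the sigmoid gating of per-module architecture logits, the relative-FLOPs penalty, the averaging rule across days, the 0.5 threshold, and all function and parameter names (`cost_aware_loss`, `aggregate_choices`, `cost_weight`) are assumptions made for illustration only.

```python
import math


def sigmoid(x):
    """Logistic function mapping an architecture logit to a gate in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def cost_aware_loss(task_loss, arch_logits, module_flops, cost_weight):
    """Task loss plus a penalty on the expected relative FLOPs of selected modules.

    Assumption: each interaction module has one logit; its sigmoid acts as a
    soft selection probability, so the penalty is smooth in the logits.
    """
    gates = [sigmoid(a) for a in arch_logits]
    total_flops = sum(module_flops)
    expected_cost = sum(g * f for g, f in zip(gates, module_flops)) / total_flops
    return task_loss + cost_weight * expected_cost


def aggregate_choices(per_day_logits, threshold=0.5):
    """Combine independent per-day searches into one module selection.

    Assumption: gate probabilities are averaged over days, and a module is
    kept when its mean gate exceeds the threshold.
    """
    n_days = len(per_day_logits)
    n_modules = len(per_day_logits[0])
    mean_gates = [
        sum(sigmoid(day[m]) for day in per_day_logits) / n_days
        for m in range(n_modules)
    ]
    return [g > threshold for g in mean_gates]
```

Because each day's search runs independently in this sketch, the per-day logits could be produced in parallel and combined only at the end, which is the property that removes the sequential pass over the data. A larger `cost_weight` pushes the gates of expensive modules toward zero unless they reduce the task loss enough to justify their FLOPs.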