Preserving Accuracy While Stealing Watermarked Deep Neural Networks
The deployment of Deep Neural Networks (DNNs) as cloud services has accelerated significantly in recent years. Training an application-specific DNN for cloud deployment requires substantial computational resources, along with the costs of hyper-parameter tuning and model selection. To protect their Intellectual Property (IP) rights, model owners embed watermarks into publicly deployed DNNs, typically in the form of trigger inputs paired with trigger labels. These trigger inputs and labels are uniquely selected and embedded into the watermarked DNN by the model owner and remain undisclosed during deployment. If a watermarked DNN (target classifier) is stolen via white-box access and re-deployed by an adversary as a pirated classifier without securing the IP rights from the model owner, the model owner can identify their IP by querying the pirated classifier with trigger inputs and checking for the corresponding trigger labels. Typically, adversaries tamper with the model weights of the target classifier prior to deployment, which in turn reduces the utility of the well-trained DNN. The authors propose re-deploying the target classifier without altering its model weights, thereby preserving model utility, and training a Siamese neural network on a small sample of non-identical in-distribution inputs (drawn from the data used to train the target classifier) to evade watermark detection at the inference stage. Experimental evaluations on standard benchmark datasets (MNIST, Fashion-MNIST, CIFAR-10, and CIFAR-100) using ResNet architectures with varying triggers demonstrate that the proposed method achieves a zero false positive rate (fraction of clean testing inputs incorrectly labelled as trigger inputs) and a zero false negative rate (fraction of trigger inputs incorrectly labelled as clean in-distribution inputs) in nearly all cases, demonstrating its efficacy.
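The abstract only outlines the mechanism, so the following is a minimal illustrative sketch under assumed details (a PyTorch setting, a small convolutional embedder named SiameseEmbedder, a hypothetical reference_bank of embedded in-distribution samples, and a fixed distance_threshold, none of which are specified in the abstract). It shows how a Siamese-style detector could gate the pirated classifier's responses at inference time without touching the stolen model's weights.

```python
# Illustrative sketch only: the paper's exact architecture, training procedure,
# and threshold selection are not given in the abstract. SiameseEmbedder,
# reference_bank, and distance_threshold are hypothetical names.
import torch
import torch.nn as nn

class SiameseEmbedder(nn.Module):
    """Maps an input image to a normalized embedding; two inputs are compared
    by the distance between their embeddings (standard Siamese setup)."""
    def __init__(self, in_channels=3, embed_dim=64):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(64, embed_dim)

    def forward(self, x):
        h = self.features(x).flatten(1)
        return nn.functional.normalize(self.fc(h), dim=1)

def gated_prediction(query, target_classifier, embedder, reference_bank,
                     distance_threshold=0.5, num_classes=10):
    """Return the stolen (target) classifier's prediction unchanged for
    in-distribution queries, but deflect suspected trigger inputs.

    reference_bank: embeddings of a small sample of in-distribution
    training inputs, precomputed with the same embedder."""
    with torch.no_grad():
        q = embedder(query)                              # (1, embed_dim)
        dists = torch.cdist(q, reference_bank)           # (1, num_refs)
        is_trigger = dists.min() > distance_threshold    # far from every reference
        logits = target_classifier(query)
        pred = logits.argmax(dim=1)
        if is_trigger:
            # Evade watermark verification: return any label other than the
            # one the target classifier would produce for the trigger input.
            pred = (pred + torch.randint(1, num_classes, pred.shape)) % num_classes
        return pred.item(), bool(is_trigger)
```

In this sketch the target classifier's weights are never modified, which is what preserves clean-input accuracy; only the response to inputs flagged as suspected triggers is altered.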