Privacy-preserving genomic computation through program specialization
In this paper, we present a new approach to performing important classes of genomic computations (e.g., search for homologous genes) that makes a significant step towards privacy protection in this domain. Our approach leverages a key property of the human genome, namely that the vast majority of it is shared across humans (and hence public), and consequently relatively little of it is sensitive. Based on this observation, we propose a privacy-protection framework that partitions a genomic computation, distributing the part on sensitive data to the data provider and the part on the pubic data to the user of the data. Such a partition is achieved through program specialization that enables a biocomputing program to perform a concrete execution on public data and a symbolic execution on sensitive data. As a result, the program is simplified into an efficient query program that takes only sensitive genetic data as inputs. We prove the effectiveness of our techniques on a set of dynamic programming algorithms common in genomic computing. We develop a program transformation tool that automatically instruments a legacy program for specialization operations. We also demonstrate that our techniques can greatly facilitate secure multi-party computations on large biocomputing problems. Copyright 2009 ACM.