Error-resilient optimal data compression
The problem of communication and computation in the presence of errors is difficult, and general solutions can be time-consuming and inflexible (particularly when implemented with a prescribed error detection/correction scheme). A reasonable approach is to investigate reliable communication in carefully selected areas of fundamental interest where specific solutions may be more practical than general-purpose techniques. In this paper, we study the problem of error-resilient communication and computation in a particularly challenging area, adaptive lossless data compression, where the devastating effect of error propagation is a long-standing open problem that was posed in the papers of Lempel and Ziv in the late 1970s. In fact, the lack of error resilience in adaptive data compression has been a practical drawback to its use in many applications. Protocols that require the receiver to request retransmission from the sender when an error is detected can be impractical for many applications where such two-way communication is not possible or is self-defeating (e.g., with data compression, retransmission may be tantamount to losing the data that could have been transmitted in the meantime). In addition, bits of encoded data that are corrupted while the data is in storage will in general not be recoverable and may corrupt the entire decompressed file. By error resilience, we mean that even though errors may not be detected, there are strong guarantees that their effects will not propagate. Our main result is a provably error-resilient adaptive lossless data-compression algorithm that nevertheless maintains optimal compression over the usual input distributions (e.g., stationary ergodic sources). We state our result in the context of a more general model that we call dynamic dictionary communication, where a sender and receiver work in "lock-step" cooperation to maintain identical copies of a dictionary D that is constantly changing.
For lossless data compression, the dictionary stores a set of strings that have been seen in the past and data is compressed by sending only indices of strings over the channel. Other applications of our model include robotics (e.g., remote terrain mapping) and computational learning theory.
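To make the lock-step dictionary idea and the error-propagation problem concrete, the following is a minimal sketch of a textbook LZ78-style coder. It is not the error-resilient algorithm of this paper, only an illustration of the unprotected baseline. The function names (`lz78_encode`, `lz78_decode`, `corrupt_index`) are hypothetical and chosen for this example.

```python
# Illustrative sketch only: a plain LZ78-style coder with no error resilience.
# All names here are assumptions made for the example, not from the paper.

def lz78_encode(text):
    """Encode text as (dictionary index, next character) pairs."""
    dictionary = {"": 0}          # sender's copy of the dictionary D
    pairs = []
    phrase = ""
    for ch in text:
        if phrase + ch in dictionary:
            phrase += ch          # keep extending the current match
        else:
            pairs.append((dictionary[phrase], ch))
            dictionary[phrase + ch] = len(dictionary)  # grow D
            phrase = ""
    if phrase:
        # Flush a trailing match; its prefix is always in the dictionary.
        pairs.append((dictionary[phrase[:-1]], phrase[-1]))
    return pairs


def lz78_decode(pairs):
    """Rebuild the text while growing the receiver's copy of D in lock-step."""
    phrases = [""]                # receiver's copy of the dictionary D
    out = []
    for index, ch in pairs:
        phrase = phrases[index] + ch
        out.append(phrase)
        phrases.append(phrase)    # same update the sender made
    return "".join(out)


def corrupt_index(pairs, position, new_index):
    """Simulate a channel error that changes one transmitted index."""
    damaged = list(pairs)
    damaged[position] = (new_index, damaged[position][1])
    return damaged


if __name__ == "__main__":
    original = "abababababab"
    pairs = lz78_encode(original)
    print(lz78_decode(pairs) == original)
    # One wrong index makes the receiver store a wrong phrase, so later
    # pairs that refer to that entry also decode incorrectly.
    print(lz78_decode(corrupt_index(pairs, 2, 0)))
```

In this sketch a single corrupted index leaves the receiver's dictionary permanently out of step with the sender's, which is the propagation effect that the abstract describes as the obstacle to error resilience.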