Scholars@Duke publication: Optimal encoding of non-stationary sources

Optimal encoding of non-stationary sources

Publication , Journal Article

Reif, JH; Storer, JA

Published in: Information Sciences

June 1, 2001

The usual assumption for proofs of the optimality of lossless encoding is a stationary ergodic source. Dynamic sources with non-stationary probability distributions occur in many practical situations where the data source is formed from a composition of distinct sources, for example, a document with multiple authors, a multimedia document, or the composition of distinct packets sent over a communication channel. There is a vast literature of adaptive methods used to tailor the compression to dynamic sources. However, little is known about optimal or near optimal methods for lossless compression of strings generated by sources that are not stationary ergodic. Here, we do not assume the source is stationary. Instead, we assume that the source produces an infinite sequence of concatenated finite strings s 1...s n, where: (i) Each finite string s i is generated by a sampling of a (possibly distinct) stationary ergodic source S i, and (ii) the length of each of the s i is lower bounded by a function L(n) such that L(n)/log(n) grows unboundedly with the length n of all the text within s 1...s i. Thus each input string is a sequence of substrings generated by possibly distinct and unknown stationary ergodic sources. The optimal expected length of a compressed coding of a finite prefix s 1...s k is∑ i=1^kn iH i, where n i is the length of s i and H i is the entropy of S i. We give a window-based LZ77-type method for compression that we prove gives an encoding with asymptotically optimal expected length. We give another LZ77-type method for compression where the expected time for encoding and decoding is nearly linear (approaching arbitrarily close to linear O(n) for large n). We also prove that this later method gives an encoding with asymptotically optimal expected length. In addition, give a dictionary-based LZ78-type method for compression, which takes linear time with small constant factors. This final algorithm also gives an encoding with asymptotically optimal expected length, assuming the S i are stationary ergodic sources that satisfy certain mixing conditions and L(n) ≥ n ^ε for some ε > 0. © 2001 Elsevier Science Inc.

Duke Scholars

Author John H. Reif Computer Science

Published In

Information Sciences

DOI

10.1016/S0020-0255(01)00103-7

ISSN

0020-0255

Publication Date

June 1, 2001

Volume

135

Issue

1-2

Start / End Page

87 / 105

Related Subject Headings

Artificial Intelligence & Image Processing
49 Mathematical sciences
46 Information and computing sciences
40 Engineering
09 Engineering
08 Information and Computing Sciences
01 Mathematical Sciences

Citation

APA

Chicago

ICMJE

MLA

NLM

Reif, J. H., & Storer, J. A. (2001). Optimal encoding of non-stationary sources. Information Sciences, 135(1–2), 87–105. https://doi.org/10.1016/S0020-0255(01)00103-7

Reif, J. H., and J. A. Storer. “Optimal encoding of non-stationary sources.” Information Sciences 135, no. 1–2 (June 1, 2001): 87–105. https://doi.org/10.1016/S0020-0255(01)00103-7.

Reif JH, Storer JA. Optimal encoding of non-stationary sources. Information Sciences. 2001 Jun 1;135(1–2):87–105.

Reif, J. H., and J. A. Storer. “Optimal encoding of non-stationary sources.” Information Sciences, vol. 135, no. 1–2, June 2001, pp. 87–105. Scopus, doi:10.1016/S0020-0255(01)00103-7.

Reif JH, Storer JA. Optimal encoding of non-stationary sources. Information Sciences. 2001 Jun 1;135(1–2):87–105.

Published In

Information Sciences

DOI

10.1016/S0020-0255(01)00103-7

ISSN

0020-0255

Publication Date

June 1, 2001

Volume

135

Issue

1-2

Start / End Page

87 / 105

Related Subject Headings

Artificial Intelligence & Image Processing
49 Mathematical sciences
46 Information and computing sciences
40 Engineering
09 Engineering
08 Information and Computing Sciences
01 Mathematical Sciences