A modular CDF approach for the approximation of percentiles

Publication, Journal Article

Choudhury, KR; Tabirca, S

Published in: Communications in Statistics: Simulation and Computation

November 1, 2008

This article describes a method for computing approximate statistics for large data sets, when exact computations may not be feasible. Such situations arise in applications such as climatology, data mining, and information retrieval (search engines). The key to our approach is a modular approximation to the cumulative distribution function (cdf) of the data. Approximate percentiles (as well as many other statistics) can be computed from this approximate cdf. This enables the reduction of a potentially overwhelming computational exercise into smaller, manageable modules. We illustrate the properties of this algorithm using a simulated data set. We also examine the approximation characteristics of the approximate percentiles, using a von Mises functional type approach. In particular, it is shown that the maximum error between the approximate cdf and the actual cdf of the data is never more than 1% (or any other preset level). We also show that under assumptions of underlying smoothness of the cdf, the approximation error is much lower in an expected sense. Finally, we derive bounds for the approximation error of the percentiles themselves. Simulation experiments show that these bounds can be quite tight in certain circumstances.
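The modular idea in the abstract can be illustrated with a minimal Python sketch. This is not the authors' construction: it assumes a simple variant in which each module (chunk) of the data is reduced to histogram counts on a shared grid of bin edges, the counts are summed, and percentiles are read off the resulting piecewise-linear approximate cdf. The names `chunk_counts` and `approx_percentile`, the grid of 400 bins, and the simulated normal data are all hypothetical choices made for illustration.

```python
import numpy as np

def chunk_counts(chunk, edges):
    """Reduce one module (chunk) of the data to counts on a shared grid."""
    counts, _ = np.histogram(chunk, bins=edges)
    return counts

def approx_percentile(counts, edges, q):
    """Invert the piecewise-linear approximate cdf at level q (0 < q < 1)."""
    # The approximate cdf equals the empirical cdf at every bin edge.
    cdf = np.concatenate(([0.0], np.cumsum(counts) / counts.sum()))
    return float(np.interp(q, cdf, edges))

# Simulated data standing in for a large data set (assumption).
rng = np.random.default_rng(0)
data = rng.normal(size=200_000)

# For this demo the grid spans the observed range; in a real setting the
# range would have to be known in advance or found in a first pass.
edges = np.linspace(data.min(), data.max(), 401)

# Process the data module by module and accumulate the counts.
total = np.zeros(len(edges) - 1, dtype=np.int64)
for chunk in np.array_split(data, 20):
    total += chunk_counts(chunk, edges)

approx_median = approx_percentile(total, edges, 0.5)
exact_median = float(np.quantile(data, 0.5))

# Largest share of the data falling in a single bin.
max_bin_share = total.max() / total.sum()
```

In this sketch the approximate cdf matches the empirical cdf at every bin edge, so the error between the two at any point is at most the share of data in the bin containing it (`max_bin_share`). Refining the grid tightens that bound; the error analysis in the article itself is more detailed than this.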

Duke Scholars

Author: Kingshuk Roy Choudhury, Biostatistics & Bioinformatics, Division of Biostatistics

Published In

Communications in Statistics: Simulation and Computation

DOI

10.1080/03610910802296356

EISSN

1532-4141

ISSN

0361-0918

Publication Date

November 1, 2008

Volume

37

Issue

10

Start / End Page

1948 / 1965

Related Subject Headings

Statistics & Probability
08 Information and Computing Sciences
01 Mathematical Sciences

Citation

APA: Choudhury, K. R., & Tabirca, S. (2008). A modular CDF approach for the approximation of percentiles. Communications in Statistics: Simulation and Computation, 37(10), 1948–1965. https://doi.org/10.1080/03610910802296356

Chicago: Choudhury, K. R., and S. Tabirca. “A modular CDF approach for the approximation of percentiles.” Communications in Statistics: Simulation and Computation 37, no. 10 (November 1, 2008): 1948–65. https://doi.org/10.1080/03610910802296356.

ICMJE: Choudhury KR, Tabirca S. A modular CDF approach for the approximation of percentiles. Communications in Statistics: Simulation and Computation. 2008 Nov 1;37(10):1948–65.

MLA: Choudhury, K. R., and S. Tabirca. “A modular CDF approach for the approximation of percentiles.” Communications in Statistics: Simulation and Computation, vol. 37, no. 10, Nov. 2008, pp. 1948–65. Scopus, doi:10.1080/03610910802296356.

NLM: Choudhury KR, Tabirca S. A modular CDF approach for the approximation of percentiles. Communications in Statistics: Simulation and Computation. 2008 Nov 1;37(10):1948–1965.
