The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues.

Publication ,  Journal Article
Williams, CJ; Richardson, DC; Richardson, JS
Published in: Protein Sci
January 2022

We have curated a high-quality, "best-parts" reference dataset of about 3 million protein residues in about 15,000 PDB-format coordinate files, each containing only residues with good electron density support for a physically acceptable model conformation. The resulting prefiltered data typically contain the entire core of each chain, in quite long continuous fragments. Each reference file is a single protein chain, and the full set of files was selected for low redundancy, high resolution, good MolProbity score, and other chain-level criteria. Then each residue was critically tested for adequate local map quality to firmly support its conformation, which must also be free of serious clashes or covalent-geometry outliers. The resulting Top2018 prefiltered datasets have been released on the Zenodo online web service and are freely available for all uses under a Creative Commons license. Currently, one dataset is residue filtered on main chain plus Cβ atoms, and a second dataset is full-residue filtered; each is available at four different sequence-identity levels. Here, we present both statistics and examples that show the beneficial consequences of residue-level filtering. That process is necessary because even the best of structures contain a few highly disordered local regions with poor density and low-confidence conformations that should not be included in reference data. Therefore, the open distribution of these very large, prefiltered reference datasets constitutes a notable advance for structural bioinformatics and the fields that depend upon it.
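The residue-level filtering idea described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the `Residue` class, its boolean flags, and the `best_parts` function are hypothetical names invented for illustration, and the actual Top2018 density, clash, and geometry criteria are not reproduced here. The sketch only shows how dropping residues that fail any test splits one chain into continuous fragments.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Residue:
    """Hypothetical per-residue record; the flags stand in for real validation results."""
    index: int              # position along the chain
    density_ok: bool        # assumed flag: local map quality supports the conformation
    has_clash: bool         # assumed flag: serious steric clash
    geometry_outlier: bool  # assumed flag: covalent-geometry outlier


def passes_filter(res: Residue) -> bool:
    """Keep a residue only if it passes every per-residue test."""
    return res.density_ok and not res.has_clash and not res.geometry_outlier


def best_parts(chain: List[Residue]) -> List[List[int]]:
    """Return runs of consecutive residue indices that pass the filter."""
    fragments: List[List[int]] = []
    current: List[int] = []
    for res in chain:
        if passes_filter(res):
            # A jump in numbering also ends the current fragment.
            if current and res.index != current[-1] + 1:
                fragments.append(current)
                current = []
            current.append(res.index)
        elif current:
            # A rejected residue closes the fragment built so far.
            fragments.append(current)
            current = []
    if current:
        fragments.append(current)
    return fragments


if __name__ == "__main__":
    chain = [Residue(i, True, False, False) for i in range(1, 9)]
    chain[3].density_ok = False  # residue 4: poor density
    chain[6].has_clash = True    # residue 7: serious clash
    print(best_parts(chain))
```

In this toy chain of eight residues, rejecting residues 4 and 7 leaves three fragments; with real data, the abstract reports that the surviving fragments are typically quite long and cover the core of each chain.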

Published In

Protein Sci

DOI

10.1002/pro.4239

EISSN

1469-896X

Publication Date

January 2022

Volume

31

Issue

1

Start / End Page

290 / 300

Location

United States

Related Subject Headings

  • Software
  • Proteins
  • Protein Conformation
  • Models, Molecular
  • Databases, Protein
  • Crystallography, X-Ray
  • Computational Biology
  • Biophysics
  • Algorithms
  • 3404 Medicinal and biomolecular chemistry
 

Citation

APA: Williams, C. J., Richardson, D. C., & Richardson, J. S. (2022). The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues. Protein Sci, 31(1), 290–300. https://doi.org/10.1002/pro.4239
Chicago: Williams, Christopher J., David C. Richardson, and Jane S. Richardson. “The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues.” Protein Sci 31, no. 1 (January 2022): 290–300. https://doi.org/10.1002/pro.4239.
ICMJE: Williams CJ, Richardson DC, Richardson JS. The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues. Protein Sci. 2022 Jan;31(1):290–300.
MLA: Williams, Christopher J., et al. “The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues.” Protein Sci, vol. 31, no. 1, Jan. 2022, pp. 290–300. PubMed, doi:10.1002/pro.4239.
NLM: Williams CJ, Richardson DC, Richardson JS. The importance of residue-level filtering and the Top2018 best-parts dataset of high-quality protein residues. Protein Sci. 2022 Jan;31(1):290–300.