
Reducing noise in labels and features for a real world dataset: Application of NLP corpus annotation methods

Publication, Conference
Passonneau, RJ; Rudin, C; Radeva, A; Liu, ZA
Published in: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
July 21, 2009

This paper illustrates how a combination of information extraction, machine learning, and NLP corpus annotation practice was applied to the problem of ranking the vulnerability of structures (service boxes, manholes) in the Manhattan electrical grid. By adapting NLP corpus annotation methods to the task of knowledge transfer from domain experts, we compensated for the lack of operational definitions of components of the model, such as serious event. The machine learning depended on the ticket classes, but ticket classification itself was not the end goal. Rather, our rule-based document classification determined both the labels of examples and their feature representations. Changes in our classification of events led to improvements in our model, as reflected in the AUC scores for the full ranked list of over 51K structures. The improvements for the very top of the ranked list, which is of most importance for prioritizing work on the electrical grid, affected one in every four or five structures. © Springer-Verlag Berlin Heidelberg 2009.
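The abstract's central point is that one rule-based classification of ticket text decides both which examples are labeled positive and how every example is represented as features. It also says that model quality is measured by AUC over a ranked list. The following is a minimal sketch of both ideas. It rests on stated assumptions: the keyword rules, the past/future split of tickets, the scoring weights, and all names (`RULES`, `classify_ticket`, `label_and_features`, `auc`, the structure IDs, and the ticket texts) are hypothetical illustrations. They are not the authors' rules, data, or model.

```python
import re
from collections import Counter

# Hypothetical keyword rules mapping ticket text to an event class.
# Illustrative only: these are NOT the rules used in the paper.
RULES = [
    ("serious", re.compile(r"\b(fire|explosion|smoking)\b", re.IGNORECASE)),
    ("minor", re.compile(r"\b(flicker|dim|low voltage)\b", re.IGNORECASE)),
]


def classify_ticket(text: str) -> str:
    """Return the first event class whose rule matches, else 'other'."""
    for label, pattern in RULES:
        if pattern.search(text):
            return label
    return "other"


def label_and_features(past: list[str], future: list[str]) -> tuple[Counter, int]:
    """Assumed setup: past tickets give feature counts, future tickets give the label.

    Both go through classify_ticket, so editing one rule changes
    the labels and the features together.
    """
    features = Counter(classify_ticket(t) for t in past)
    label = int(any(classify_ticket(t) == "serious" for t in future))
    return features, label


def auc(scores: list[float], labels: list[int]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: structure id -> (past tickets, future tickets). Invented for illustration.
structures = {
    "SB-1": (["Smoking manhole reported"], ["Manhole fire"]),
    "SB-2": (["Customer reports dim lights"], []),
    "MH-3": ([], ["Lights flicker"]),
    "MH-4": (["Low voltage complaint"], ["Service box smoking"]),
}

scores, labels = [], []
for past, future in structures.values():
    feats, y = label_and_features(past, future)
    # Hypothetical scoring: weight serious past events twice as heavily as minor ones.
    scores.append(2 * feats["serious"] + feats["minor"])
    labels.append(y)

example_auc = auc(scores, labels)
print(example_auc)  # 0.875
```

Both the label and the features pass through `classify_ticket`. As a result, changing a single rule can move an example across the positive/negative boundary and change its feature vector at the same time. That coupling is why noise in the event classification affects both sides of the learning problem. The pairwise `auc` function is the standard Mann-Whitney formulation and is used here only for transparency. In practice, a library routine such as scikit-learn's `roc_auc_score` computes the same quantity.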


Published In

Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)

DOI

10.1007/978-3-642-00382-0_7

EISSN

1611-3349

ISSN

0302-9743

Publication Date

July 21, 2009

Volume

5449 LNCS

Start / End Page

86 / 97

Related Subject Headings

  • Artificial Intelligence & Image Processing
  • 46 Information and computing sciences
 

Citation

APA: Passonneau, R. J., Rudin, C., Radeva, A., & Liu, Z. A. (2009). Reducing noise in labels and features for a real world dataset: Application of NLP corpus annotation methods. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 5449 LNCS, pp. 86–97). https://doi.org/10.1007/978-3-642-00382-0_7

Chicago: Passonneau, R. J., C. Rudin, A. Radeva, and Z. A. Liu. “Reducing noise in labels and features for a real world dataset: Application of NLP corpus annotation methods.” In Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 5449 LNCS:86–97, 2009. https://doi.org/10.1007/978-3-642-00382-0_7.

ICMJE: Passonneau RJ, Rudin C, Radeva A, Liu ZA. Reducing noise in labels and features for a real world dataset: Application of NLP corpus annotation methods. In: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2009. p. 86–97.

MLA: Passonneau, R. J., et al. “Reducing noise in labels and features for a real world dataset: Application of NLP corpus annotation methods.” Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), vol. 5449 LNCS, 2009, pp. 86–97. Scopus, doi:10.1007/978-3-642-00382-0_7.

NLM: Passonneau RJ, Rudin C, Radeva A, Liu ZA. Reducing noise in labels and features for a real world dataset: Application of NLP corpus annotation methods. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics). 2009. p. 86–97.