How Well Do Large Language Models Understand Tables in Materials Science?
Advances in materials science require leveraging past findings and data from the vast published literature. While some materials data repositories are being built, they typically rely on newly created data in narrow domains, because extracting detailed data and metadata from the enormous wealth of publications is immensely challenging. The advent of large language models (LLMs) presents a new opportunity to rapidly and accurately extract data and insights from the published literature and transform them into structured formats for easy querying and reuse. In this paper, we build on initial strategies for using LLMs for rapid, autonomous data extraction from materials science articles in a format curatable by materials databases. We present the subdomain of polymer composites as our example use case and demonstrate the successes and challenges of LLMs in extracting tabular data. We explore different table representations for use with LLMs, finding that a multimodal model with image input yields the most promising results. This model achieved an accuracy of 0.910 for composition extraction and an F1 score of 0.863 for property-name extraction. Under the most conservative evaluation for property extraction, requiring an exact match on all details, we obtained an F1 score of 0.419; allowing varying degrees of flexibility in the evaluation raises the score to 0.769. We envision that the results and analysis from this study will motivate further research on information extraction strategies for materials information sources.
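The gap between the strict (0.419) and flexible (0.769) F1 scores hinges on how extracted fields are matched against the gold annotations. A minimal sketch of such set-based F1 scoring, with hypothetical (property name, value, unit) tuples and an illustrative case-insensitive relaxation standing in for the paper's actual matching rules:

```python
# Sketch of exact-vs-flexible F1 over extracted property tuples.
# The data and the normalization rule are illustrative assumptions,
# not the evaluation code used in the paper.

def f1(pred, gold, normalize=None):
    """F1 over set-valued extractions; `normalize` optionally relaxes matching."""
    norm = normalize or (lambda x: x)
    p = {tuple(norm(field) for field in t) for t in pred}
    g = {tuple(norm(field) for field in t) for t in gold}
    tp = len(p & g)  # true positives: tuples matched after normalization
    if not p or not g or not tp:
        return 0.0
    precision, recall = tp / len(p), tp / len(g)
    return 2 * precision * recall / (precision + recall)

gold = [("tensile strength", "55", "MPa"),
        ("glass transition temperature", "105", "C")]
pred = [("Tensile Strength", "55", "MPa"),          # casing differs from gold
        ("glass transition temperature", "105", "C")]

strict = f1(pred, gold)                        # exact string match -> 0.5
flexible = f1(pred, gold, normalize=str.lower) # case-insensitive -> 1.0
```

Each added layer of flexibility (casing, unit aliases, numeric tolerance) enlarges the set of true positives, which is how a score can climb from 0.419 toward 0.769 without changing the underlying extractions.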
Related Subject Headings
- 4016 Materials engineering