How Well Do Large Language Models Understand Tables in Materials Science?

Publication, Journal Article
Circi, D; Khalighinejad, G; Chen, A; Dhingra, B; Brinson, LC
Published in: Integrating Materials and Manufacturing Innovation
September 1, 2024

Advances in materials science require leveraging past findings and data from the vast published literature. While some materials data repositories are being built, they typically rely on newly created data in narrow domains because extracting detailed data and metadata from the enormous wealth of publications is immensely challenging. The advent of large language models (LLMs) presents a new opportunity to rapidly and accurately extract data and insights from the published literature and transform them into structured data formats for easy query and reuse. In this paper, we build on initial strategies for using LLMs for rapid and autonomous data extraction from materials science articles in a format curatable by materials databases. We present the subdomain of polymer composites as our example use case and demonstrate the successes and challenges of LLMs in extracting tabular data. We explore different table representations for use with LLMs and find that a multimodal model with an image input yields the most promising results. This model achieves an accuracy score of 0.910 for composition information extraction and an F1 score of 0.863 for property name information extraction. Under the most conservative evaluation of property extraction, which requires an exact match in all details, we obtain an F1 score of 0.419. Allowing varying degrees of flexibility in the evaluation raises the score to as much as 0.769. We envision that the results and analysis from this study will promote further research into information extraction strategies for materials information sources.
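
The gap between the strict and the flexible F1 scores reported in the abstract can be illustrated with a minimal sketch. The record layout (`PropertyRecord`), the two matching rules, and the sample values below are hypothetical and chosen only for illustration; this record does not describe the paper's actual evaluation criteria, and the data are not taken from the paper.

```python
from typing import Callable, Iterable, NamedTuple


class PropertyRecord(NamedTuple):
    """Hypothetical layout for one property value extracted from a table."""
    sample: str
    name: str
    value: float
    unit: str


def f1_score(predicted: Iterable[PropertyRecord],
             gold: Iterable[PropertyRecord],
             key: Callable[[PropertyRecord], tuple]) -> float:
    """F1 over records; `key` decides which fields must agree for a match."""
    pred_keys = {key(r) for r in predicted}
    gold_keys = {key(r) for r in gold}
    true_pos = len(pred_keys & gold_keys)
    if true_pos == 0:
        return 0.0
    precision = true_pos / len(pred_keys)
    recall = true_pos / len(gold_keys)
    return 2 * precision * recall / (precision + recall)


def exact_key(r: PropertyRecord) -> tuple:
    # Strict rule (assumed): every field must match exactly.
    return (r.sample, r.name, r.value, r.unit)


def relaxed_key(r: PropertyRecord) -> tuple:
    # Relaxed rule (assumed): ignore the unit and the case of the name.
    return (r.sample, r.name.lower(), r.value)


# Made-up gold and predicted records for a single sample.
gold = [
    PropertyRecord("S1", "Tensile strength", 52.0, "MPa"),
    PropertyRecord("S1", "Glass transition temperature", 105.0, "C"),
]
predicted = [
    PropertyRecord("S1", "tensile strength", 52.0, "MPa"),
    PropertyRecord("S1", "Glass transition temperature", 105.0, "C"),
]

exact_f1 = f1_score(predicted, gold, exact_key)
relaxed_f1 = f1_score(predicted, gold, relaxed_key)
print(f"exact-match F1: {exact_f1:.3f}")
print(f"relaxed F1:     {relaxed_f1:.3f}")
```

In this toy case the only disagreement is the capitalization of one property name, so the strict rule counts it as a miss while the relaxed rule accepts it. The same extraction can therefore score quite differently depending on how much detail the evaluation requires to match.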

Published In

Integrating Materials and Manufacturing Innovation

DOI

10.1007/s40192-024-00362-6

EISSN

2193-9772

ISSN

2193-9764

Publication Date

September 1, 2024

Volume

13

Issue

3

Start / End Page

669 / 687

Related Subject Headings

  • 4016 Materials engineering
 

Citation

APA

Circi, D., Khalighinejad, G., Chen, A., Dhingra, B., & Brinson, L. C. (2024). How Well Do Large Language Models Understand Tables in Materials Science? Integrating Materials and Manufacturing Innovation, 13(3), 669–687. https://doi.org/10.1007/s40192-024-00362-6

Chicago

Circi, D., G. Khalighinejad, A. Chen, B. Dhingra, and L. C. Brinson. “How Well Do Large Language Models Understand Tables in Materials Science?” Integrating Materials and Manufacturing Innovation 13, no. 3 (September 1, 2024): 669–87. https://doi.org/10.1007/s40192-024-00362-6.

ICMJE

Circi D, Khalighinejad G, Chen A, Dhingra B, Brinson LC. How Well Do Large Language Models Understand Tables in Materials Science? Integrating Materials and Manufacturing Innovation. 2024 Sep 1;13(3):669–87.

MLA

Circi, D., et al. “How Well Do Large Language Models Understand Tables in Materials Science?” Integrating Materials and Manufacturing Innovation, vol. 13, no. 3, Sept. 2024, pp. 669–87. Scopus, doi:10.1007/s40192-024-00362-6.

NLM

Circi D, Khalighinejad G, Chen A, Dhingra B, Brinson LC. How Well Do Large Language Models Understand Tables in Materials Science? Integrating Materials and Manufacturing Innovation. 2024 Sep 1;13(3):669–687.