
VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

Publication, Conference
Maruf, M; Daw, A; Mehrab, KS; Manogaran, HB; Neog, A; Sawhney, M; Khurana, M; Balhoff, JP; Bakış, Y; Altintas, B; Thompson, MJ; Campolongo, EG ...
Published in: Advances in Neural Information Processing Systems
January 1, 2024

Images are increasingly becoming the currency for documenting biodiversity on the planet, providing novel opportunities for accelerating scientific discoveries in the field of organismal biology, especially with the advent of large vision-language models (VLMs). We ask if pre-trained VLMs can aid scientists in answering a range of biologically relevant questions without any additional fine-tuning. In this paper, we evaluate the effectiveness of 12 state-of-the-art (SOTA) VLMs in the field of organismal biology using a novel dataset, VLM4Bio, consisting of 469K question-answer pairs involving 30K images from three groups of organisms: fishes, birds, and butterflies, covering five biologically relevant tasks. We also explore the effects of applying prompting techniques and tests for reasoning hallucination on the performance of VLMs, shedding new light on the capabilities of current SOTA VLMs in answering biologically relevant questions using images.


Published In

Advances in Neural Information Processing Systems

ISSN

1049-5258

Publication Date

January 1, 2024

Volume

37

Related Subject Headings

  • 4611 Machine learning
  • 1702 Cognitive Sciences
  • 1701 Psychology
 

Citation

APA
Maruf, M., Daw, A., Mehrab, K. S., Manogaran, H. B., Neog, A., Sawhney, M., … Karpatne, A. (2024). VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images. In Advances in Neural Information Processing Systems (Vol. 37).

Chicago
Maruf, M., A. Daw, K. S. Mehrab, H. B. Manogaran, A. Neog, M. Sawhney, M. Khurana, et al. “VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images.” In Advances in Neural Information Processing Systems, Vol. 37, 2024.

ICMJE
Maruf M, Daw A, Mehrab KS, Manogaran HB, Neog A, Sawhney M, et al. VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images. In: Advances in Neural Information Processing Systems. 2024.

MLA
Maruf, M., et al. “VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images.” Advances in Neural Information Processing Systems, vol. 37, 2024.

NLM
Maruf M, Daw A, Mehrab KS, Manogaran HB, Neog A, Sawhney M, Khurana M, Balhoff JP, Bakış Y, Altintas B, Thompson MJ, Campolongo EG, Uyeda JC, Lapp H, Bart HL, Mabee PM, Su Y, Chao WL, Stewart C, Berger-Wolf T, Dahdul W, Karpatne A. VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images. Advances in Neural Information Processing Systems. 2024.
