Bhuwan Dhingra

Assistant Professor of Computer Science
308 Research Drive, Durham, NC 27708

Selected Publications


How Well Do Large Language Models Understand Tables in Materials Science?

Journal Article · Integrating Materials and Manufacturing Innovation · September 1, 2024
Advances in materials science require leveraging past findings and data from the vast published literature. While some materials data repositories are being built, they typically rely on newly created data in narrow domains because extracting detailed data …

Development and validation of VaxConcerns: A taxonomy of vaccine concerns and misinformation with Crowdsource-Viability.

Journal Article · Vaccine · April 2024
We present VaxConcerns, a taxonomy for vaccine concerns and misinformation. VaxConcerns is an easy-to-teach taxonomy of concerns and misinformation commonly found among online anti-vaccination media and is evaluated to produce high-quality data annotations …

Sequence Reducible Holdout Loss for Language Model Pretraining

Conference · 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), Main Conference Proceedings · January 1, 2024
Data selection techniques, which adaptively select datapoints inside the training loop, have demonstrated empirical benefits in reducing the number of gradient steps to train neural models. However, these techniques have so far largely been applied to …

Tailoring Vaccine Messaging with Common-Ground Opinions

Conference · Findings of the Association for Computational Linguistics: NAACL 2024 · January 1, 2024
One way to personalize chatbot interactions is by establishing common ground with the intended reader. A domain where establishing mutual understanding could be particularly impactful is vaccine concerns and misinformation. Vaccine interventions are forms …

SumCSE: Summary as a transformation for Contrastive Learning

Conference · Findings of the Association for Computational Linguistics: NAACL 2024 · January 1, 2024
Sentence embedding models are typically trained using contrastive learning (CL), either using human annotations directly or by repurposing other annotated datasets. In this work, we explore the recently introduced paradigm of generating CL data using …

Hierarchical Multi-label Classification of Online Vaccine Concerns

Chapter · January 1, 2024
Vaccine concerns are an ever-evolving target and can shift quickly, as seen during the COVID-19 pandemic. Identifying longitudinal trends in vaccine concerns and misinformation might inform the healthcare space by helping public health efforts …

Extracting Polymer Nanocomposite Samples from Full-Length Documents

Conference · Proceedings of the Annual Meeting of the Association for Computational Linguistics · January 1, 2024
This paper investigates the use of large language models (LLMs) for extracting sample lists of polymer nanocomposites (PNCs) from full-length materials science research papers. The challenge lies in the complex nature of PNC samples, which have numerous …

Raccoon: Prompt Extraction Benchmark of LLM-Integrated Applications

Conference · Proceedings of the Annual Meeting of the Association for Computational Linguistics · January 1, 2024
With the proliferation of LLM-integrated applications such as GPTs, millions are deployed, offering valuable services through proprietary instruction prompts. These systems, however, are prone to prompt extraction attacks through meticulously designed …

Navigating the Ethical Landmines of ChatGPT: Implications of Intelligent Chatbots in Plastic Surgery Clinical Practice.

Journal Article · Plast Reconstr Surg Glob Open · September 2023
ChatGPT is a cutting-edge language model developed by OpenAI with the potential to impact all facets of plastic surgery, from research to clinical practice. New applications for ChatGPT are emerging at a rapid pace in both the scientific literature and …

Interface Design for Crowdsourcing Hierarchical Multi-Label Text Annotations

Conference · Conference on Human Factors in Computing Systems, Proceedings · April 19, 2023
Human data labeling is an important and expensive task at the heart of supervised learning systems. Hierarchies help humans understand and organize concepts. We ask whether and how concept hierarchies can inform the design of annotation interfaces to …

DIFFQG: Generating Questions to Summarize Factual Changes

Conference · EACL 2023, 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference · January 1, 2023
Identifying the difference between two versions of the same article is useful to update knowledge bases and to understand how articles evolve. Paired texts occur naturally in diverse situations: reporters write similar news stories and maintainers of …

Learning the Legibility of Visual Text Perturbations

Conference · EACL 2023, 17th Conference of the European Chapter of the Association for Computational Linguistics, Proceedings of the Conference · January 1, 2023
Many adversarial attacks in NLP perturb inputs to produce visually similar strings ('ergo' → 'εrgo') which are legible to humans but degrade model performance. Although preserving legibility is a necessary condition for text perturbation, little work has …

Hierarchical Multi-Instance Multi-Label Learning for Detecting Propaganda Techniques

Conference · Proceedings of the Annual Meeting of the Association for Computational Linguistics · January 1, 2023
Since the introduction of the SemEval 2020 Task 11 (Martino et al., 2020a), several approaches have been proposed in the literature for classifying propaganda based on the rhetorical techniques used to influence readers. These methods, however, classify …

Selectively Answering Ambiguous Questions

Conference · EMNLP 2023, 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings · January 1, 2023
Trustworthy language models should abstain from answering questions when they do not know the answer. However, the answer to a question can be unknown for a variety of reasons. Prior research has focused on the case in which the question is clear and the …

Time-Aware Language Models as Temporal Knowledge Bases

Journal Article · Transactions of the Association for Computational Linguistics · March 18, 2022
Many facts come with an expiration date, from the name of the President to the basketball team LeBron James plays for. However, most language models (LMs) are trained on snapshots of data collected …

Characterizing the Efficiency vs. Accuracy Trade-off for Long-Context NLP Models

Conference · NLP-Power 2022, 1st Workshop on Efficient Benchmarking in NLP, Proceedings of the Workshop · January 1, 2022
With many real-world applications of Natural Language Processing (NLP) involving long texts, there has been a rise in NLP benchmarks that measure the accuracy of models that can handle longer input sequences. However, these benchmarks do not consider …

ASQA: Factoid Questions Meet Long-Form Answers

Conference · Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP 2022) · January 1, 2022
An abundance of datasets and availability of reliable evaluation metrics have resulted in strong progress in factoid question answering (QA). This progress, however, does not easily transfer to the task of long-form QA, where the goal is to answer …

Siamese BERT for authorship verification

Conference · CEUR Workshop Proceedings · January 1, 2021
The PAN 2021 authorship verification (AV) challenge focuses on determining whether two texts are written by the same author, specifically when faced with new, unseen authors. In our approach, we construct a Siamese network initialized with pretrained BERT …