Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques.

Publication, Journal Article
Dixit, S; Mao, W; McDade, KK; Schäferhoff, M; Ogbuoji, O; Yamey, G
Published in: Frontiers in public health
January 2022

Tracking global health funding is a crucial but time-consuming and labor-intensive process. This study aimed to develop a framework to automate the tracking of global health spending using natural language processing (NLP) and machine learning (ML) algorithms. We used the global common goods for health (CGH) categories developed by Schäferhoff et al. to design and evaluate ML models.

To train and validate the ML models, we used data curated by Schäferhoff et al., which tracked official development assistance (ODA) disbursements to global CGH for 2013, 2015, and 2017. To process the raw text, we applied NLP techniques such as stop-word removal and lemmatization, and created synthetic text to balance the dataset. We trained and tested four supervised learning algorithms on the pre-coded dataset: random forest (RF), XGBoost, support vector machine (SVM), and multinomial naïve Bayes (MNB) (see Glossary). We then applied the best model to a dataset that had not been manually coded to predict the financing for CGH in 2019.

After training the models on the training dataset (n = 10,534), the weighted average F1-scores (a measure of an ML model's performance) on the testing dataset (n = 2,634) ranged from 0.79 to 0.83 across the four models, and the RF model performed best (F1-score = 0.83). The total donor support for CGH projects predicted by the RF model was $2.24 billion across 3 years, which was very close to the $2.25 billion derived from coding and classification by humans. Applying the trained RF model to the 2019 dataset, we predicted that total funding for global CGH was about $2.7 billion for 730 CGH projects.

We have demonstrated that NLP and ML can be a feasible and efficient way to classify health projects into different global CGH categories, and thus to track health funding for CGH routinely using data from publicly available databases.
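The weighted average F1-score reported in the abstract can be illustrated with a short sketch. The abstract does not say how the authors computed the metric, so the code below is only an assumption: it follows the common definition (per-class F1, weighted by each class's share of the true labels), which is the same definition scikit-learn uses for `f1_score(..., average="weighted")`. The function name `weighted_f1` and the labels `"cgh"` and `"other"` are hypothetical and are not taken from the study.

```python
from collections import Counter


def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights equal to each class's share of y_true."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and the same length")

    support = Counter(y_true)  # number of true examples per class
    total = len(y_true)
    score = 0.0

    for label, n_true in support.items():
        # True positives: predicted as `label` and actually `label`.
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        n_pred = sum(1 for p in y_pred if p == label)

        precision = tp / n_pred if n_pred else 0.0
        recall = tp / n_true
        if precision + recall:
            f1 = 2 * precision * recall / (precision + recall)
        else:
            f1 = 0.0

        # Weight each class by its share of the true labels.
        score += (n_true / total) * f1

    return score


# Hypothetical toy labels, for illustration only (not data from the study).
y_true = ["cgh", "cgh", "cgh", "cgh", "other", "other"]
y_pred = ["cgh", "cgh", "cgh", "other", "other", "cgh"]
print(round(weighted_f1(y_true, y_pred), 4))
```

In this toy case the "cgh" class has an F1 of 0.75 and the "other" class an F1 of 0.5, so the weighted average is (4/6)(0.75) + (2/6)(0.5), or about 0.667. Weighting by class size matters for an imbalanced dataset, because a plain unweighted average would let a small class pull the score as much as a large one.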

Published In

Frontiers in public health

DOI

10.3389/fpubh.2022.1031147

EISSN

2296-2565

ISSN

2296-2565

Publication Date

January 2022

Volume

10

Start / End Page

1031147

Related Subject Headings

  • Social Justice
  • Natural Language Processing
  • Machine Learning
  • Humans
  • Global Health
  • Bayes Theorem
  • 4206 Public health
  • 4203 Health services and systems
  • 1117 Public Health and Health Services
 

Citation

APA: Dixit, S., Mao, W., McDade, K. K., Schäferhoff, M., Ogbuoji, O., & Yamey, G. (2022). Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques. Frontiers in Public Health, 10, 1031147. https://doi.org/10.3389/fpubh.2022.1031147
Chicago: Dixit, Siddharth, Wenhui Mao, Kaci Kennedy McDade, Marco Schäferhoff, Osondu Ogbuoji, and Gavin Yamey. “Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques.” Frontiers in Public Health 10 (January 2022): 1031147. https://doi.org/10.3389/fpubh.2022.1031147.
ICMJE: Dixit S, Mao W, McDade KK, Schäferhoff M, Ogbuoji O, Yamey G. Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques. Frontiers in public health. 2022 Jan;10:1031147.
MLA: Dixit, Siddharth, et al. “Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques.” Frontiers in Public Health, vol. 10, Jan. 2022, p. 1031147. Epmc, doi:10.3389/fpubh.2022.1031147.
NLM: Dixit S, Mao W, McDade KK, Schäferhoff M, Ogbuoji O, Yamey G. Tracking financing for global common goods for health: A machine learning approach using natural language processing techniques. Frontiers in public health. 2022 Jan;10:1031147.
