Automatic identification of variables in epidemiological datasets using logic regression.

Publication, Journal Article
Lorenz, MW; Abdi, NA; Scheckenbach, F; Pflug, A; Bülbül, A; Catapano, AL; Agewall, S; Ezhov, M; Bots, ML; Kiechl, S; Orth, A; PROG-IMT study group
Published in: BMC Med Inform Decis Mak
April 13, 2017

BACKGROUND: For an individual participant data (IPD) meta-analysis, multiple datasets must be transformed into a consistent format, e.g. using uniform variable names. When large numbers of datasets have to be processed, this can be a time-consuming and error-prone task. Automated or semi-automated identification of variables can help to reduce the workload and improve the data quality. For semi-automation, high sensitivity in the recognition of matching variables is particularly important, because it allows creating software that, for a target variable, presents a choice of source variables from which a user can choose the matching one, with only a low risk of having missed a correct source variable.

METHODS: For each variable in a set of target variables, a number of simple rules were manually created. With logic regression, an optimal Boolean combination of these rules was searched for every target variable, using a random subset of a large database of epidemiological and clinical cohort data (construction subset). In a second subset of this database (validation subset), these optimal rule combinations were validated.

RESULTS: In the construction sample, 41 target variables were allocated on average with a positive predictive value (PPV) of 34%, and a negative predictive value (NPV) of 95%. In the validation sample, PPV was 33%, whereas NPV remained at 94%. In the construction sample, PPV was 50% or less in 63% of all variables, in the validation sample in 71% of all variables.

CONCLUSIONS: We demonstrated that the application of logic regression in a complex data management task in large epidemiological IPD meta-analyses is feasible. However, the performance of the algorithm is poor, which may require backup strategies.
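
As a rough illustration of the idea in METHODS and RESULTS, the sketch below combines a few hand-written Boolean rules for one target variable and scores each combination by PPV and NPV. Everything in it is an assumption made for illustration: the toy source variables, the rule names, and the `predictive_values`, `candidates`, and `best_combination` helpers are hypothetical and not taken from the study. It also swaps in an exhaustive search over single rules and AND/OR pairs, whereas logic regression searches a much larger space of Boolean expressions.

```python
from itertools import combinations

# Hypothetical source variables: (name, sample values, true target label).
SOURCE_VARS = [
    ("age", [54, 61, 47], "age"),
    ("AGE_YRS", [70, 66, 58], "age"),
    ("patage", [39, 81, 62], "age"),
    ("sbp", [100, 105, 110], "sbp"),
    ("stage", [1, 2, 3], "other"),
    ("sex", [0, 1, 1], "sex"),
    ("ldl", [110, 95, 160], "ldl"),
]

# Simple hand-written rules for the target "age" (illustrative only).
RULES = {
    "name_has_age": lambda name, values: "age" in name.lower(),
    "values_18_to_110": lambda name, values: all(18 <= v <= 110 for v in values),
    "name_short": lambda name, values: len(name) <= 4,
}


def predictive_values(predict, target="age"):
    """Return (PPV, NPV) of a Boolean predictor over SOURCE_VARS."""
    tp = fp = tn = fn = 0
    for name, values, label in SOURCE_VARS:
        predicted = predict(name, values)
        actual = label == target
        if predicted and actual:
            tp += 1
        elif predicted:
            fp += 1
        elif actual:
            fn += 1
        else:
            tn += 1
    # An undefined ratio (no predictions of that kind) is scored as 0.0.
    ppv = tp / (tp + fp) if tp + fp else 0.0
    npv = tn / (tn + fn) if tn + fn else 0.0
    return ppv, npv


def candidates():
    """Yield (description, predictor) for single rules and AND/OR pairs."""
    for name, rule in RULES.items():
        yield name, rule
    for (n1, r1), (n2, r2) in combinations(RULES.items(), 2):
        yield f"({n1} AND {n2})", lambda n, v, a=r1, b=r2: a(n, v) and b(n, v)
        yield f"({n1} OR {n2})", lambda n, v, a=r1, b=r2: a(n, v) or b(n, v)


def best_combination():
    """Pick the candidate with the highest PPV + NPV (an arbitrary toy score)."""
    return max(candidates(), key=lambda c: sum(predictive_values(c[1])))


if __name__ == "__main__":
    description, rule = best_combination()
    print(description, predictive_values(rule))
```

In this toy data the name rule alone also matches `stage`, and the value-range rule alone also matches `sbp`; only their conjunction separates the age variables cleanly. A real application would fit the combination on a construction subset and report PPV and NPV on a separate validation subset, as the abstract describes.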

Duke Scholars

Published In

BMC Med Inform Decis Mak

DOI

10.1186/s12911-017-0429-1

EISSN

1472-6947

Publication Date

April 13, 2017

Volume

17

Issue

1

Start / End Page

40

Location

England

Related Subject Headings

  • Prognosis
  • Predictive Value of Tests
  • Meta-Analysis as Topic
  • Medical Informatics Applications
  • Medical Informatics
  • Logistic Models
  • Humans
  • Epidemiologic Factors
  • Databases, Factual
  • Data Mining
 

Citation

APA: Lorenz, M. W., Abdi, N. A., Scheckenbach, F., Pflug, A., Bülbül, A., Catapano, A. L., … PROG-IMT study group. (2017). Automatic identification of variables in epidemiological datasets using logic regression. BMC Med Inform Decis Mak, 17(1), 40. https://doi.org/10.1186/s12911-017-0429-1
Chicago: Lorenz, Matthias W., Negin Ashtiani Abdi, Frank Scheckenbach, Anja Pflug, Alpaslan Bülbül, Alberico L. Catapano, Stefan Agewall, et al. “Automatic identification of variables in epidemiological datasets using logic regression.” BMC Med Inform Decis Mak 17, no. 1 (April 13, 2017): 40. https://doi.org/10.1186/s12911-017-0429-1.
ICMJE: Lorenz MW, Abdi NA, Scheckenbach F, Pflug A, Bülbül A, Catapano AL, et al. Automatic identification of variables in epidemiological datasets using logic regression. BMC Med Inform Decis Mak. 2017 Apr 13;17(1):40.
MLA: Lorenz, Matthias W., et al. “Automatic identification of variables in epidemiological datasets using logic regression.” BMC Med Inform Decis Mak, vol. 17, no. 1, Apr. 2017, p. 40. Pubmed, doi:10.1186/s12911-017-0429-1.
NLM: Lorenz MW, Abdi NA, Scheckenbach F, Pflug A, Bülbül A, Catapano AL, Agewall S, Ezhov M, Bots ML, Kiechl S, Orth A, PROG-IMT study group. Automatic identification of variables in epidemiological datasets using logic regression. BMC Med Inform Decis Mak. 2017 Apr 13;17(1):40.