Lin Lin

Associate Professor of Biostatistics & Bioinformatics
Biostatistics & Bioinformatics, Division of Integrative Genomics
Box 90251, Durham, NC 27708-0251
214 Old Chemistry, Durham, NC 27708

Selected Publications


GeM-LR: Discovering predictive biomarkers for small datasets in vaccine studies.

Journal Article PLoS Comput Biol · November 2024 Despite significant progress in vaccine research, the level of protection provided by vaccination can vary significantly across individuals. As a result, understanding immunologic variation across individuals in response to vaccination is important for dev ...

Statistical and machine learning methods for immunoprofiling based on single-cell data.

Journal Article Hum Vaccin Immunother · August 1, 2023 Immunoprofiling has become a crucial tool for understanding the complex interactions between the immune system and diseases or interventions, such as therapies and vaccinations. Immune response biomarkers are critical for understanding those relationships ...

Multisource single-cell data integration by MAW barycenter for Gaussian mixture models.

Journal Article Biometrics · June 2023 One key challenge encountered in single-cell data clustering is to combine clustering results of data sets acquired from multiple sources. We propose to represent the clustering result of each data set by a Gaussian mixture model (GMM) and produce an integ ...

Multi-view clustering by CPS-merge analysis with application to multimodal single-cell data.

Journal Article PLoS Comput Biol · April 2023 Multi-view data can be generated from diverse sources, by different technologies, and in multiple modalities. In various fields, integrating information from multi-view data has pushed the frontier of discovery. In this paper, we develop a new approach for ...

Probabilistic Model Incorporating Auxiliary Covariates to Control FDR

Conference International Conference on Information and Knowledge Management, Proceedings · October 17, 2022 Controlling False Discovery Rate (FDR) while leveraging the side information of multiple hypothesis testing is an emerging research topic in modern data science. Existing methods rely on the test-level covariates while ignoring metrics about test-level cov ...

Block-Wise Variable Selection for Clustering Via Latent States of Mixture Models

Journal Article Journal of Computational and Graphical Statistics · January 2, 2022

Mixture of Linear Models Co-supervised by Deep Neural Networks

Journal Article Journal of Computational and Graphical Statistics · January 1, 2022 Deep neural networks (DNN) have been demonstrated to achieve unparalleled prediction accuracy in a wide range of applications. Despite its strong performance, in certain areas, the usage of DNN has met resistance because of its black-box nature. In this ar ...

Interpretable Representation Learning from Temporal Multi-view Data

Conference Proceedings of Machine Learning Research · January 1, 2022 In many scientific problems such as video surveillance, modern genomics, and finance, data are often collected from diverse measurements across time that exhibit time-dependent heterogeneous properties. Thus, it is important to not only integrate data from ...

VtNet: A neural network with variable importance assessment

Journal Article Stat · December 2021 The architectures of many neural networks rely heavily on the underlying grid associated with the variables, for instance, the lattice of pixels in an image. For general biomedical data without a grid structure, the multi‐layer perceptron (MLP) and ...

Bayesian mixture models for cytometry data analysis

Journal Article Wiley Interdisciplinary Reviews: Computational Statistics · July 1, 2021 Bayesian mixture models are increasingly used for model-based clustering and the follow-up analysis on the clusters identified. As such, they are of particular interest for analyzing cytometry data where unsupervised clustering and association studies are ...

A Sample Covariance-Based Approach For Spatial Binary Data

Journal Article Journal of Agricultural, Biological and Environmental Statistics · June 2021

Curbing the COVID-19 pandemic with facility-based isolation of mild cases: a mathematical modeling study.

Journal Article J Travel Med · February 23, 2021 BACKGROUND: In many countries, patients with mild coronavirus disease 2019 (COVID-19) are told to self-isolate at home, but imperfect compliance and shared living space with uninfected people limit the effectiveness of home-based isolation. We examine the ...

Optimal Transport with Relaxed Marginal Constraints

Journal Article IEEE Access · January 1, 2021 Optimal transport (OT) is a principled approach for matching, having achieved success in diverse applications such as tracking and cluster alignment. It is also the core computation problem for solving the Wasserstein metric between probabilistic distribut ...

CPS analysis: self-contained validation of biomedical data clustering.

Journal Article Bioinformatics · June 1, 2020 MOTIVATION: Cluster analysis is widely used to identify interesting subgroups in biomedical data. Since true class labels are unknown in the unsupervised setting, it is challenging to validate any cluster obtained computationally, an important problem bare ...

Optimal transport, mean partition, and uncertainty assessment in cluster analysis

Journal Article Statistical Analysis and Data Mining: The ASA Data Science Journal · October 2019 In scientific data analysis, clusters identified computationally often substantiate existing hypotheses or motivate new ones. Yet the combinatorial nature of the clustering result, which is a partition rather than a ...

A computational framework to assess genome-wide distribution of polymorphic human endogenous retrovirus-K In human populations.

Journal Article PLoS Comput Biol · March 2019 Human Endogenous Retrovirus type K (HERV-K) is the only HERV known to be insertionally polymorphic; not all individuals have a retrovirus at a specific genomic location. It is possible that HERV-Ks contribute to human disease because people differ in both ...

Bayesian multidimensional scaling procedure with variable selection

Journal Article Computational Statistics and Data Analysis · January 1, 2019 Multidimensional scaling methods are frequently used by researchers and practitioners to project high dimensional data into a low dimensional space. However, it is a challenge to integrate side information which is available along with the dissimilarities ...

Defending Against Adversarial Samples Without Security through Obscurity

Conference 2018 IEEE International Conference on Data Mining (ICDM) · November 2018

Explaining Deep Learning Models - A Bayesian Non-parametric Approach

Conference Advances in Neural Information Processing Systems 31 (NIPS 2018) · 2018

Quantitative methods and Bayesian models for flow cytometry analysis in HIV/AIDS research

Chapter · January 1, 2017 Flow cytometry is a multiparameter single-cell assay ubiquitous in HIV/AIDS clinical and research settings for evaluating the immune response to the virus, therapy, and vaccination. In clinical practice, flow cytometry is used to monitor the CD4 and CD8 T ...

Baum–Welch algorithm on directed acyclic graph for mixtures with latent Bayesian networks

Journal Article Stat · January 2017 We consider a mixture model with latent Bayesian network (MLBN) for a set of random vectors ...

Clustering with Hidden Markov Model on Variable Blocks

Journal Article Journal of Machine Learning Research · 2017

From Physical to Cyber

Conference Proceedings of the 14th ACM Conference on Embedded Network Sensor Systems CD-ROM · November 14, 2016

Discriminative variable subsets in Bayesian classification with mixture models, with application in flow cytometry studies.

Journal Article Biostatistics · January 2016 We discuss the evaluation of subsets of variables for the discriminative evidence they provide in multivariate mixture modeling for classification. The novel development of Bayesian classification analysis presented is partly motivated by problems of desig ...

T Cell Responses against Mycobacterial Lipids and Proteins Are Poorly Correlated in South African Adolescents.

Journal Article J Immunol · November 15, 2015 Human T cells are activated by both peptide and nonpeptide Ags produced by Mycobacterium tuberculosis. T cells recognize cell wall lipids bound to CD1 molecules, but effector functions of CD1-reactive T cells have not been systematically assessed in M. tub ...

Identification and visualization of multidimensional antigen-specific T-cell populations in polychromatic cytometry data.

Journal Article Cytometry A · July 2015 An important aspect of immune monitoring for vaccine development, clinical trials, and research is the detection, measurement, and comparison of antigen-specific T-cells from subject samples under different conditions. Antigen-specific T-cells compose a ve ...

COMPASS identifies T-cell subsets correlated with clinical outcomes.

Journal Article Nat Biotechnol · June 2015 Advances in flow cytometry and other single-cell technologies have enabled high-dimensional, high-throughput measurements of individual cells as well as the interrogation of cell population heterogeneity. However, in many instances, computational tools to ...

Hierarchical Bayesian mixture modelling for antigen-specific T-cell subtyping in combinatorially encoded flow cytometry studies.

Journal Article Stat Appl Genet Mol Biol · June 2013 Novel uses of automated flow cytometry technology for measuring levels of protein markers on thousands to millions of cells are promoting increasing need for relevant, customized Bayesian mixture modelling approaches in many areas of biomedical research an ...

Hierarchical modeling for rare event detection and cell subset alignment across flow cytometry samples.

Journal Article PLoS Comput Biol · 2013 Flow cytometry is the prototypical assay for multi-parameter single cell analysis, and is essential in vaccine and biomarker research for the enumeration of antigen-specific lymphocytes that are often found in extremely low frequencies (0.1% or less). Stan ...

Optimization of a highly standardized carboxyfluorescein succinimidyl ester flow cytometry panel and gating strategy design using discriminative information measure evaluation.

Journal Article Cytometry A · December 2010 The design of a panel to identify target cell subsets in flow cytometry can be difficult when specific markers unique to each cell subset do not exist, and a combination of parameters must be used to identify target cells of interest and exclude irrelevant ...