Simon Mak

Assistant Professor of Statistical Science
Statistical Science
214 Old Chemistry, Box 90251, Durham, NC 27708-0251

Selected Publications


Hard jet substructure in a multistage approach

Journal Article Physical Review C · October 1, 2024 We present predictions and postdictions for a wide variety of hard jet-substructure observables using a multistage model within the JETSCAPE framework. The details of the multistage model and the various parameter choices are described in [Phys. Rev. C 107 ... Full text Cite

SentHYMNent: An Interpretable and Sentiment-Driven Model for Algorithmic Melody Harmonization

Conference Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining · August 25, 2024 Music composition and analysis is an inherently creative task, involving a combination of heart and mind. However, the vast majority of algorithmic music models completely ignore the "heart" component of music, resulting in output that often lacks the rich ... Full text Cite

A misfire-integrated Gaussian process (MInt-GP) emulator for energy-assisted compression ignition (EACI) engines with varying cetane number jet fuels

Journal Article International Journal of Engine Research · July 1, 2024 For energy-assisted compression ignition (EACI) engine propulsion at high-altitude operating conditions using sustainable jet fuels with varying cetane numbers, it is essential to develop an efficient engine control system for robust and optimal operation. ... Full text Cite

New metric improving Bayesian calibration of a multistage approach studying hadron and inclusive jet suppression

Journal Article Physical Review C · June 1, 2024 We study parton energy-momentum exchange with the quark gluon plasma (QGP) within a multistage approach composed of in-medium Dokshitzer-Gribov-Lipatov-Altarelli-Parisi evolution at high virtuality, and (linearized) Boltzmann transport formalism at lower v ... Full text Cite

eRPCA: Robust Principal Component Analysis for Exponential Family Distributions

Journal Article Statistical Analysis and Data Mining · April 1, 2024 Robust principal component analysis (RPCA) is a widely used method for recovering low-rank structure from data matrices corrupted by significant and sparse outliers. These corruptions may arise from occlusions, malicious tampering, or other causes for anom ... Full text Cite

Stacking Designs: Designing Multifidelity Computer Experiments with Target Predictive Accuracy

Journal Article SIAM-ASA Journal on Uncertainty Quantification · March 1, 2024 In an era where scientific experiments can be very costly, multifidelity emulators provide a useful tool for cost-efficient predictive scientific computing. For scientific applications, the experimenter is often limited by a tight computational budget, and ... Full text Cite

A Hierarchical Expected Improvement Method for Bayesian Optimization

Journal Article Journal of the American Statistical Association · January 1, 2024 The Expected Improvement (EI) method, proposed by Jones, Schonlau, and Welch, is a widely used Bayesian optimization method, which makes use of a fitted Gaussian process model for efficient black-box optimization. However, one key drawback of EI is that it ... Full text Cite

A Graphical Multi-Fidelity Gaussian Process Model, with Application to Emulation of Heavy-Ion Collisions

Journal Article Technometrics · January 1, 2024 With advances in scientific computing and mathematical modeling, complex scientific phenomena such as galaxy formations and rocket propulsion can now be reliably simulated. Such simulations can however be very time-intensive, requiring millions of CPU hour ... Full text Cite

Conglomerate Multi-fidelity Gaussian Process Modeling, with Application to Heavy-Ion Collisions

Journal Article SIAM-ASA Journal on Uncertainty Quantification · January 1, 2024 In an era where scientific experimentation is often costly, multi-fidelity emulation provides a powerful tool for predictive scientific computing. While there has been notable work on multi-fidelity modeling, existing models do not incorporate an important ... Full text Cite

Trigonometric Quadrature Fourier Features for Scalable Gaussian Process Regression

Conference Proceedings of Machine Learning Research · January 1, 2024 Fourier feature approximations have been successfully applied in the literature for scalable Gaussian Process (GP) regression. In particular, Quadrature Fourier Features (QFF) derived from Gaussian quadrature rules have gained popularity in recent years du ... Cite

Energy balancing of covariate distributions

Journal Article Journal of Causal Inference · January 1, 2024 Bias in causal comparisons has a correspondence with distributional imbalance of covariates between treatment groups. Weighting strategies such as inverse propensity score weighting attempt to mitigate bias by either modeling the treatment assignment mecha ... Full text Cite

Hierarchical Shrinkage Gaussian Processes: Applications to Computer Code Emulation and Dynamical System Recovery

Journal Article SIAM-ASA Journal on Uncertainty Quantification · January 1, 2024 In many areas of science and engineering, computer simulations are widely used as proxies for physical experiments, which can be infeasible or unethical. Such simulations are often computationally expensive, and an emulator can be trained to efficiently pr ... Full text Cite

MaLT: Machine-Learning-Guided Test Case Design and Fault Localization of Complex Software Systems

Conference Proceedings - 2024 22nd ACM-IEEE International Symposium on Formal Methods and Models for System Design, MEMOCODE 2024 · January 1, 2024 Software testing is essential for the reliable and robust development of complex software systems. This is particularly critical for cyber-physical systems (CPS), which require rigorous testing prior to deployment. The complexity of these systems limits th ... Full text Cite

An Interpretable, Flexible, and Interactive Probabilistic Framework for Melody Generation

Conference Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining · August 6, 2023 The fast-growing demand for algorithmic music generation is found throughout entertainment, art, education, etc. Unfortunately, most recent models are practically impossible to interpret or musically fine-tune, as they use deep neural networks with thousan ... Full text Cite

Multiscale evolution of charmed particles in a nuclear medium

Journal Article Physical Review C · May 1, 2023 Parton energy-momentum exchange with the quark gluon plasma (QGP) is a multiscale problem. In this work, we calculate the interaction of charm quarks with the QGP within the higher twist formalism at high virtuality and high energy using the Modular All Tw ... Full text Cite

Inclusive jet and hadron suppression in a multistage approach

Journal Article Physical Review C · March 1, 2023 We present a new study of jet interactions in the quark-gluon plasma created in high-energy heavy-ion collisions, using a multistage event generator within the JETSCAPE framework. We focus on medium-induced modifications in the rate of inclusive jets and h ... Full text Cite

Sequential Change-Point Detection for Mutually Exciting Point Processes

Journal Article Technometrics · January 1, 2023 We present a new CUSUM procedure for sequential change-point detection in self- and mutually-exciting point processes (specifically, Hawkes networks) using discrete events data. Hawkes networks have become a popular model in statistics and machine learning ... Full text Cite

PERCEPT: A New Online Change-Point Detection Method using Topological Data Analysis

Journal Article Technometrics · January 1, 2023 Topological data analysis (TDA) provides a set of data analysis tools for extracting embedded topological structures from complex high-dimensional datasets. In recent years, TDA has been a rapidly growing field which has found success in a wide range of ap ... Full text Cite

Bayesian Uncertainty Quantification for Low-Rank Matrix Completion

Journal Article Bayesian Analysis · January 1, 2023 We consider the problem of uncertainty quantification for an unknown low-rank matrix X, given a partial and noisy observation of its entries. This quantification of uncertainty is essential for many real-world problems, including image processing, satellit ... Full text Cite

Correction: Physics-integrated Segmented Gaussian Process (SegGP) learning for cost-efficient training of diesel engine control system with low cetane numbers (American Institute of Aeronautics and Astronautics Inc, AIAA)

Journal Article AIAA SciTech Forum and Exposition, 2023 · January 1, 2023 Correction Notice Please write out the details of your corrections here. Place any figure, image, or math updates as well. Be as specific as possible and refer to the original paper details. Please see an example of a correction here: https://arc.aiaa.org/ ... Full text Cite

Physics-integrated Segmented Gaussian Process (SegGP) learning for cost-efficient training of diesel engine control system with low cetane numbers

Conference AIAA SciTech Forum and Exposition, 2023 · January 1, 2023 Control model training is an essential step towards the development of an engine controls system. A robust controls strategy is required for engines to perform reliably and optimally under challenging conditions, such as using low cetane number fuels (vita ... Full text Cite

BayesFLo: Bayesian Fault Localization for Software Testing

Conference Proceedings - 2023 IEEE 23rd International Conference on Software Quality, Reliability, and Security Companion, QRS-C 2023 · January 1, 2023 Fault localization is a software testing activity that is critical when software failures occur. We propose a novel Bayesian fault localization method, yielding a principled and probabilistic ranking of suspicious input combinations for identifying the roo ... Full text Cite

Role of bulk viscosity in deuteron production in ultrarelativistic nuclear collisions

Journal Article Physical Review C · December 1, 2022 We use a Bayesian-calibrated multistage viscous hydrodynamic model to explore deuteron yield, mean transverse momentum and flow observables in Pb-Pb collisions at the Large Hadron Collider. We explore theoretical uncertainty in the production of deuterons, ... Full text Cite

Adaptive Design for Gaussian Process Regression under Censoring

Journal Article Annals of Applied Statistics · June 1, 2022 A key objective in engineering problems is to predict an unknown experimental surface over an input domain. In complex physical experiments this may be hampered by response censoring which results in a significant loss of information. For such problems, ex ... Full text Cite

Efficient emulation of relativistic heavy ion collisions with transfer learning

Journal Article Physical Review C · March 1, 2022 Measurements from the Large Hadron Collider (LHC) and the Relativistic Heavy Ion Collider (RHIC) can be used to study the properties of quark-gluon plasma. Systematic constraints on these properties must combine measurements from different collision system ... Full text Cite

Gaussian Process Subspace Prediction for Model Reduction

Journal Article SIAM Journal on Scientific Computing · January 1, 2022 Subspace-valued functions arise in a wide range of problems, including parametric reduced order modeling (PROM), parameter reduction, and subspace tracking. In PROM, each parameter point can be associated with a subspace, which is used for Petrov–Galerkin ... Full text Cite

TSEC: A Framework for Online Experimentation under Experimental Constraints

Journal Article Technometrics · January 1, 2022 Thompson sampling is a popular algorithm for tackling multi-armed bandit problems, and has been applied in a wide range of applications, from website design to portfolio optimization. In such applications, however, the number of choices (or arms) N can be ... Full text Cite

Population Quasi-Monte Carlo

Journal Article Journal of Computational and Graphical Statistics · January 1, 2022 Monte Carlo methods are widely used for approximating complicated, multidimensional integrals for Bayesian inference. Population Monte Carlo (PMC) is an important class of Monte Carlo methods, which adapts a population of proposals to generate weighted sam ... Full text Cite

Determining the jet transport coefficient $\hat{q}$ from inclusive hadron suppression measurements using Bayesian parameter estimation

Journal Article Physical Review C · August 1, 2021 We report a new determination of $\hat{q}$, the jet transport coefficient of the quark-gluon plasma. We use the JETSCAPE framework, which incorporates a novel multistage theoretical approach to in-medium jet evolution and Bayesian inference for parameter extractio ... Full text Cite

Supervised compression of big data

Journal Article Statistical Analysis and Data Mining · June 1, 2021 The phenomenon of big data has become ubiquitous in nearly all disciplines, from science to engineering. A key challenge is the use of such data for fitting statistical and machine learning models, which can incur high computational and storage costs. One ... Full text Cite

Function-on-Function Kriging, With Applications to Three-Dimensional Printing of Aortic Tissues

Journal Article Technometrics · January 1, 2021 Three-dimensional printed medical prototypes, which use synthetic metamaterials to mimic biological tissue, are becoming increasingly important in urgent surgical applications. However, the mimicking of tissue mechanical properties via three-dimensional pr ... Full text Cite

Multi-system Bayesian constraints on the transport coefficients of QCD matter

Journal Article · November 6, 2020 We study the properties of the strongly-coupled quark-gluon plasma with a multistage model of heavy ion collisions that combines the T$_\mathrm{R}$ENTo initial condition ansatz, free-streaming, viscous relativistic hydrodynamics, and a relativistic hadroni ... Link to item Cite

Phenomenological constraints on the transport properties of QCD matter with data-driven model averaging

Journal Article · October 8, 2020 Using combined data from the Relativistic Heavy Ion and Large Hadron Colliders, we constrain the shear and bulk viscosities of quark-gluon plasma (QGP) at temperatures of ${\sim\,}150{-}350$ MeV. We use Bayesian inference to translate experimental and theo ... Link to item Cite

Adaptive approximation for multivariate linear problems with inputs lying in a cone

Chapter · June 8, 2020 We study adaptive approximation algorithms for general multivariate linear problems, where the sets of input functions are nonconvex cones. Whereas it is known that adaptive algorithms perform essentially no better than nonadaptive algorithms for convex an ... Full text Cite

Uncertainty quantification for inferring Hawkes networks

Conference Advances in Neural Information Processing Systems · January 1, 2020 Multivariate Hawkes processes are commonly used to model streaming networked event data in a wide variety of applications. However, it remains a challenge to extract reliable inference from complex datasets with uncertainty quantification. Aiming towards t ... Cite

Analysis-of-Marginal-Tail-Means (ATM): A Robust Method for Discrete Black-Box Optimization

Journal Article Technometrics · October 2, 2019 We present a new method, called analysis-of-marginal-tail-means (ATM), for effective robust optimization of discrete black-box problems. ATM has important applications in many real-world engineering problems (e.g., manufacturing optimization, product desig ... Full text Cite

cmenet: A New Method for Bi-Level Variable Selection of Conditional Main Effects

Journal Article Journal of the American Statistical Association · April 3, 2019 This article introduces a novel method for selecting main effects and a set of reparameterized effects called conditional main effects (CMEs), which capture the conditional effect of a factor at a fixed level of another factor. CMEs represent interpretable ... Full text Cite

Kernel-smoothed proper orthogonal decomposition-based emulation for spatiotemporally evolving flow dynamics prediction

Journal Article AIAA Journal · January 1, 2019 This interdisciplinary study, which combines machine learning, statistical methodologies, high-fidelity simulations, projection-based model reduction, and flow physics, demonstrates a new process for building an efficient surrogate model to predict spatiot ... Full text Cite

An Efficient Surrogate Model for Emulation and Physics Extraction of Large Eddy Simulations

Journal Article Journal of the American Statistical Association · October 2, 2018 In the quest for advanced propulsion and power-generation systems, high-fidelity simulations are too computationally expensive to survey the desired design space, and a new design methodology is needed that combines engineering physics, computer simulation ... Full text Cite

Maximum entropy low-rank matrix recovery

Journal Article IEEE Journal on Selected Topics in Signal Processing · October 1, 2018 We propose a novel, information-theoretic method, called MaxEnt, for efficient data acquisition for low-rank matrix recovery. This proposed method has important applications to a wide range of problems, including image processing and text document indexing ... Full text Cite

Maximum Entropy Low-Rank Matrix Recovery

Conference IEEE International Symposium on Information Theory - Proceedings · August 15, 2018 We propose a novel, information-theoretic mask construction method, called MaxEnt, for efficient data acquisition for low-rank matrix recovery. Fundamental to this design approach is the maximum entropy principle, which states that the measurement masks wh ... Full text Cite

Minimax and Minimax Projection Designs Using Clustering

Journal Article Journal of Computational and Graphical Statistics · January 2, 2018 Minimax designs provide a uniform coverage of a design space X ⊆ R^p by minimizing the maximum distance from any point in this space to its nearest design point. Although minimax designs have many useful applications, for example, for optimal sensor allocat ... Full text Cite

Uncertainty quantification of flame transfer function under a Bayesian framework

Conference AIAA Aerospace Sciences Meeting, 2018 · January 1, 2018 Combustion instability identification techniques have received increasing attention in the design and development of modern propulsion systems, and one of the most popular approaches is the flame transfer function (FTF). This work proposes a novel method ... Full text Cite

Support points

Journal Article Annals of Statistics · January 1, 2018 This paper introduces a new way to compact a continuous probability distribution F into a set of representative points called support points. These points are obtained by minimizing the energy distance, a statistical potential measure initially proposed by ... Full text Cite

Common proper orthogonal decomposition-based spatiotemporal emulator for design exploration

Journal Article AIAA Journal · January 1, 2018 The present study develops a data-driven framework trained with high-fidelity simulation results to facilitate decision making for combustor designs. Its core is a surrogate model employing a machine-learning technique called kriging, which is uniquely com ... Full text Cite

A two-stage transfer function identification methodology and its applications to bi-swirl injectors

Conference 53rd AIAA/SAE/ASEE Joint Propulsion Conference, 2017 · January 1, 2017 Thermo-acoustic instability identification techniques have received increasing attention in modern propulsion systems, and one of the most popular approaches is the flame transfer function. Despite the prominent role it plays in instability analysis, the ... Full text Cite

A regional compound Poisson process for hurricane and tropical storm damage

Journal Article Journal of the Royal Statistical Society. Series C: Applied Statistics · November 1, 2016 In light of intense hurricane activity along the US Atlantic coast, attention has turned to understanding both the economic effect and the behaviour of these storms. The compound Poisson–log-normal process has been proposed as a model for aggregate storm d ... Full text Cite