Skip to main content
construction release_alert
Scholars@Duke will be undergoing maintenance April 11-15. Some features may be unavailable during this time.
cancel

Pan Xu

Assistant Professor of Biostatistics & Bioinformatics
Biostatistics & Bioinformatics, Division of Integrative Genomics

Selected Publications


Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Conference SIGMETRICS 2023 - Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems · June 19, 2023 We study a multi-Agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-Term rewards. To overcome the curse of dimensiona ... Full text Cite

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Journal Article Performance Evaluation Review · June 19, 2023 We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensiona ... Full text Cite

Queer In AI: A Case Study in Community-Led Participatory AI

Conference ACM International Conference Proceeding Series · June 12, 2023 Queerness and queer people face an uncertain future in the face of ever more widely deployed and invasive artificial intelligence (AI). These technologies have caused numerous harms to queer people, including privacy violations, censoring and downranking q ... Full text Cite

Multiple models for outbreak decision support in the face of uncertainty.

Journal Article Proc Natl Acad Sci U S A · May 2, 2023 Policymakers must make management decisions despite incomplete knowledge and conflicting model projections. Little guidance exists for the rapid, representative, and unbiased collection of policy-relevant scientific input from independent modeling teams. I ... Full text Link to item Cite

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Journal Article Proceedings of the ACM on Measurement and Analysis of Computing Systems · February 28, 2023 We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensiona ... Full text Cite

Distributionally Robust Policy Gradient for Offline Contextual Bandits

Conference Proceedings of Machine Learning Research · January 1, 2023 Learning an optimal policy from offline data is notoriously challenging, which requires the evaluation of the learning policy using data pre-collected from a static logging policy. We study the policy optimization problem in offline contextual bandits usin ... Cite

Thompson Sampling with Less Exploration is Fast and Optimal

Conference Proceedings of Machine Learning Research · January 1, 2023 We propose ϵ-Exploring Thompson Sampling (ϵTS), a modified version of the Thompson Sampling (TS) algorithm (Agrawal & Goyal, 2017) for multi-armed bandits. In ϵ-TS, arms are selected greedily based on empirical mean rewards with probability 1 − ϵ, and base ... Cite

The United States COVID-19 Forecast Hub dataset.

Journal Article Sci Data · August 1, 2022 Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) part ... Full text Open Access Link to item Cite

Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States.

Journal Article Proc Natl Acad Sci U S A · April 12, 2022 Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forec ... Full text Open Access Link to item Cite

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Conference Advances in Neural Information Processing Systems · January 1, 2022 We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family, which covers many common reward distributions including Bernoulli, Gaussian, Gamma, Exponen ... Cite

Active Ranking without Strong Stochastic Transitivity

Conference Advances in Neural Information Processing Systems · January 1, 2022 Ranking from noisy comparisons is of great practical interest in machine learning. In this paper, we consider the problem of recovering the exact full ranking for a list of items under ranking models that do not assume the Strong Stochastic Transitivity pr ... Cite

Langevin Monte Carlo for Contextual Bandits

Conference Proceedings of Machine Learning Research · January 1, 2022 We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high ... Cite

Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons

Conference Proceedings of Machine Learning Research · January 1, 2022 In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items. Thus, a uniform querying strategy over users may not be optimal. To address this issue, we propose an elimination-based active sampling s ... Cite

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Conference Proceedings of Machine Learning Research · January 1, 2021 In batched multi-armed bandit problems, the learner can adaptively pull arms and adjust strategy in batches. In many real applications, not only the regret but also the batch complexity need to be optimized. Existing batched bandit algorithms usually assum ... Cite

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Conference SIGMETRICS 2023 - Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems · June 19, 2023 We study a multi-Agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-Term rewards. To overcome the curse of dimensiona ... Full text Cite

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Journal Article Performance Evaluation Review · June 19, 2023 We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensiona ... Full text Cite

Queer In AI: A Case Study in Community-Led Participatory AI

Conference ACM International Conference Proceeding Series · June 12, 2023 Queerness and queer people face an uncertain future in the face of ever more widely deployed and invasive artificial intelligence (AI). These technologies have caused numerous harms to queer people, including privacy violations, censoring and downranking q ... Full text Cite

Multiple models for outbreak decision support in the face of uncertainty.

Journal Article Proc Natl Acad Sci U S A · May 2, 2023 Policymakers must make management decisions despite incomplete knowledge and conflicting model projections. Little guidance exists for the rapid, representative, and unbiased collection of policy-relevant scientific input from independent modeling teams. I ... Full text Link to item Cite

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Journal Article Proceedings of the ACM on Measurement and Analysis of Computing Systems · February 28, 2023 We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensiona ... Full text Cite

Distributionally Robust Policy Gradient for Offline Contextual Bandits

Conference Proceedings of Machine Learning Research · January 1, 2023 Learning an optimal policy from offline data is notoriously challenging, which requires the evaluation of the learning policy using data pre-collected from a static logging policy. We study the policy optimization problem in offline contextual bandits usin ... Cite

Thompson Sampling with Less Exploration is Fast and Optimal

Conference Proceedings of Machine Learning Research · January 1, 2023 We propose ϵ-Exploring Thompson Sampling (ϵTS), a modified version of the Thompson Sampling (TS) algorithm (Agrawal & Goyal, 2017) for multi-armed bandits. In ϵ-TS, arms are selected greedily based on empirical mean rewards with probability 1 − ϵ, and base ... Cite

The United States COVID-19 Forecast Hub dataset.

Journal Article Sci Data · August 1, 2022 Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) part ... Full text Open Access Link to item Cite

Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States.

Journal Article Proc Natl Acad Sci U S A · April 12, 2022 Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forec ... Full text Open Access Link to item Cite

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Conference Advances in Neural Information Processing Systems · January 1, 2022 We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family, which covers many common reward distributions including Bernoulli, Gaussian, Gamma, Exponen ... Cite

Active Ranking without Strong Stochastic Transitivity

Conference Advances in Neural Information Processing Systems · January 1, 2022 Ranking from noisy comparisons is of great practical interest in machine learning. In this paper, we consider the problem of recovering the exact full ranking for a list of items under ranking models that do not assume the Strong Stochastic Transitivity pr ... Cite

Langevin Monte Carlo for Contextual Bandits

Conference Proceedings of Machine Learning Research · January 1, 2022 We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high ... Cite

Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons

Conference Proceedings of Machine Learning Research · January 1, 2022 In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items. Thus, a uniform querying strategy over users may not be optimal. To address this issue, we propose an elimination-based active sampling s ... Cite

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Conference Proceedings of Machine Learning Research · January 1, 2021 In batched multi-armed bandit problems, the learner can adaptively pull arms and adjust strategy in batches. In many real applications, not only the regret but also the batch complexity need to be optimized. Existing batched bandit algorithms usually assum ... Cite

MOTS: Minimax Optimal Thompson Sampling

Conference Proceedings of Machine Learning Research · January 1, 2021 Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it ... Cite

Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

Conference Proceedings of Machine Learning Research · January 1, 2021 We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-revers ... Cite

Stochastic nested variance reduction for nonconvex optimization

Journal Article Journal of Machine Learning Research · May 1, 2020 We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, ... Cite

Rank aggregation via heterogeneous thurstone preference models

Conference AAAI 2020 - 34th AAAI Conference on Artificial Intelligence · January 1, 2020 We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranked data, which can take the accuracy levels of different users into account. By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's or ... Cite

A finite-time analysis of Q-Learning with neural network function approximation

Conference 37th International Conference on Machine Learning, ICML 2020 · January 1, 2020 Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtuall ... Cite

A finite-time analysis of two time-scale actor-critic methods

Conference Advances in Neural Information Processing Systems · January 1, 2020 Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the learning policy and the critic uses temporal difference learning to estimate the ... Cite

Sampling from non-log-concave distributions via stochastic variance-reduced gradient Langevin dynamics

Conference AISTATS 2019 - 22nd International Conference on Artificial Intelligence and Statistics · January 1, 2020 We study stochastic variance reduction-based Langevin dynamic algorithms, SVRG-LD and SAGA-LD (Dubey et al., 2016), for sampling from non-log-concave distributions. Under certain assumptions on the log density function, we establish the convergence guarant ... Cite

SAMPLE EFFICIENT POLICY GRADIENT METHODS WITH RECURSIVE VARIANCE REDUCTION

Conference 8th International Conference on Learning Representations, ICLR 2020 · January 1, 2020 Improving the sample efficiency in reinforcement learning has been a longstanding research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which ... Cite

Stochastic variance-reduced cubic regularization methods

Journal Article Journal of Machine Learning Research · August 1, 2019 We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of SVRC is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic regularizat ... Cite

Stochastic gradient hamiltonian monte carlo methods with recursive variance reduction

Conference Advances in Neural Information Processing Systems · January 1, 2019 Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) algorithms have received increasing attention in both theory and practice. In this paper, we propose a Stochastic Recursive Variance-Reduced gradient HMC (SRVR-HMC) algorithm. It makes use of a semi-stoch ... Cite

An improved convergence analysis of stochastic variance-reduced policy gradient

Conference 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019 · January 1, 2019 We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an ∊-approximate stationary point of the p ... Cite

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

Conference Proceedings of Machine Learning Research · January 1, 2019 We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an ε-approximate stationary point of the p ... Cite

Third-order smoothness helps: Faster stochastic optimization algorithms for finding local minima

Conference Advances in Neural Information Processing Systems · January 1, 2018 We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently. More specificall ... Cite

Stochastic nested variance reduction for nonconvex optimization

Conference Advances in Neural Information Processing Systems · January 1, 2018 We study finite-sum nonconvex optimization problems, where the objective function is an average of n nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with conventional stochastic varia ... Cite

Global convergence of Langevin dynamics based algorithms for nonconvex optimization

Conference Advances in Neural Information Processing Systems · January 1, 2018 We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with n component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical app ... Cite

Subsampled stochastic variance-reduced gradient langevin dynamics

Conference 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI 2018 · January 1, 2018 Stochastic variance-reduced gradient Langevin dynamics (SVRG-LD) was recently proposed to improve the performance of stochastic gradient Langevin dynamics (SGLD) by reducing the variance of the stochastic gradient. In this paper, we propose a variant of SV ... Cite

Accelerated stochastic mirror descent: From continuous-time dynamics to discrete-time algorithms

Conference International Conference on Artificial Intelligence and Statistics, AISTATS 2018 · January 1, 2018 We present a new framework to analyze accelerated stochastic mirror descent through the lens of continuous-time stochastic dynamic systems. It enables us to design new algorithms, and perform a unified and simple analysis of the convergence rates of these ... Cite

Continuous and discrete-time accelerated stochastic mirror descent for strongly convex functions

Conference 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018 We provide a second-order stochastic differential equation (SDE), which characterizes the continuous-time dynamics of accelerated stochastic mirror descent (ASMD) for strongly convex functions. This SDE plays a central role in designing new discrete-time A ... Cite

Stochastic variance-reduced cubic regularized Newton method

Conference 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018 We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic re ... Cite

Covariate adjusted precision matrix estimation via nonconvex optimization

Conference 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018 We propose a nonconvex estimator for the covariate adjusted precision matrix estimation problem in the high dimensional regime, under sparsity constraints. To solve this estimator, we propose an alternating gradient descent algorithm with hard thresholding ... Cite

Stochastic variance-reduced Hamilton Monte Carlo methods

Conference 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018 We propose a fast stochastic Hamilton Monte Carlo (HMC) method, for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by the recent advance in stochastic optimization ... Cite

Efficient algorithm for sparse tensor-variate gaussian graphical models via gradient descent

Conference Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017 · January 1, 2017 We study the sparse tensor-variate Gaussian graphical model (STGGM), where each way of the tensor follows a multivariate normal distribution whose precision matrix has sparse structures. In order to estimate the precision matrices, we propose a sparsity co ... Cite

Uncertainty assessment and false discovery rate control in high-dimensional Granger causal inference

Conference 34th International Conference on Machine Learning, ICML 2017 · January 1, 2017 Causal inference among high-dimensional time series data proves an important research problem in many fields. While in the classical regime one often establishes causality among time series via a concept known as "Granger causality, " existing approaches f ... Cite

Speeding up latent variable Gaussian graphical model estimation via nonconvex optimization

Conference Advances in Neural Information Processing Systems · January 1, 2017 We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propos ... Cite

Semiparametric differential graph models

Conference Advances in Neural Information Processing Systems · January 1, 2016 In many cases of network analysis, it is more attractive to study how a network varies under different conditions than an individual static network. We propose a novel graphical model, namely Latent Differential Graph Model, where the networks under two di ... Cite

Forward backward greedy algorithms for multi-task learning with faster rates

Conference 32nd Conference on Uncertainty in Artificial Intelligence 2016, UAI 2016 · January 1, 2016 A large body of algorithms have been proposed for multi-task learning. However, the effectiveness of many multi-task learning algorithms highly depends on the structural regularization, which incurs bias in the resulting estimators and leads to slower conv ... Cite

Neural Contextual Bandits with Deep Representation and Shallow Exploration

Conference We study neural contextual bandits, a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the specific reward generating function is unknown. We propose a novel learning algorithm that transforms ... Link to item Cite

Double Explore-then-Commit: Asymptotic Optimality and Beyond

Conference Proceedings of Thirty Fourth Conference on Learning Theory We study the multi-armed bandit problem with subGaussian rewards. The explore-then-commit (ETC) strategy, which consists of an exploration phase followed by an exploitation phase, is one of the most widely used algorithms in a variety of online decision ap ... Link to item Cite