Pan Xu

Journal Article AI Magazine · September 1, 2024 Sequential decision-making involves making informed decisions based on continuous interactions with a complex environment. This process is ubiquitous in various applications, including recommendation systems and clinical treatment design. My research has c ... Full text Cite

Pre-trained Language Models Improve the Few-shot Prompt Ability of Decision Transformer

Preprint · August 2, 2024 Link to item Cite

Optimal Batched Linear Bandits

Preprint · June 6, 2024 Link to item Cite

Challenges of COVID-19 Case Forecasting in the US, 2020-2021.

Journal Article PLoS Comput Biol · May 2024 During the COVID-19 pandemic, forecasting COVID-19 trends to support planning and response was a priority for scientists and decision makers alike. In the United States, COVID-19 forecasting was coordinated by a large group of universities, companies, and ... Full text Open Access Link to item Cite

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

Preprint · April 16, 2024 Link to item Cite

Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

Conference Proceedings of the Aaai Conference on Artificial Intelligence · March 25, 2024 We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored into overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed o ... Full text Cite

Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning

Preprint · March 14, 2024 Link to item Cite

Distributionally Robust Off-Dynamics Reinforcement Learning: Provable Efficiency with Linear Function Approximation

Preprint · February 23, 2024 Link to item Cite

Distributionally Robust Off-Dynamics Reinforcement Learning: Provable Efficiency with Linear Function Approximation

Conference Proceedings of Machine Learning Research · January 1, 2024 We study off-dynamics Reinforcement Learning (RL), where the policy is trained on a source domain and deployed to a distinct target domain. We aim to solve this problem via online distributionally robust Markov decision processes (DRMDPs), where the learni ... Cite

PROVABLE AND PRACTICAL: EFFICIENT EXPLORATION IN REINFORCEMENT LEARNING VIA LANGEVIN MONTE CARLO

Conference 12th International Conference on Learning Representations Iclr 2024 · January 1, 2024 We present a scalable and effective exploration strategy based on Thompson sampling for reinforcement learning (RL). One of the key shortcomings of existing Thompson sampling algorithms is the need to perform a Gaussian approximation of the posterior distr ... Cite

Optimal Batched Linear Bandits

Conference Proceedings of Machine Learning Research · January 1, 2024 We introduce the E4 algorithm for the batched linear bandit problem, incorporating an Explore-Estimate-Eliminate-Exploit framework. With a proper choice of exploration rate, we prove E4 achieves the finite-time minimax optimal regret ... Cite

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

Journal Article Transactions on Machine Learning Research · January 1, 2024 Off-policy evaluation and learning are concerned with assessing a given policy and learning an optimal policy from offline data without direct interaction with the environment. Often, the environment in which the data are collected differs from the environ ... Open Access Cite

Pre-trained Hypergraph Convolutional Neural Networks with Self-supervised Learning

Journal Article Transactions on Machine Learning Research · January 1, 2024 Hypergraphs are powerful tools for modeling complex interactions across various domains, including biomedicine. However, learning meaningful node representations from hypergraphs remains a challenge. Existing supervised methods often lack generalizability, ... Open Access Cite

Off-Dynamics Reinforcement Learning via Domain Adaptation and Reward Augmented Imitation

Conference Advances in Neural Information Processing Systems · January 1, 2024 Training a policy in a source domain for deployment in the target domain under a dynamics shift can be challenging, often resulting in performance degradation. Previous work tackles this challenge by training on the source domain with modified rewards deri ... Cite

Minimax Optimal and Computationally Efficient Algorithms for Distributionally Robust Offline Reinforcement Learning

Conference Advances in Neural Information Processing Systems · January 1, 2024 Distributionally robust offline reinforcement learning (RL), which seeks robust policy training against environment perturbation by modeling dynamics uncertainty, calls for function approximations when facing large state-action spaces. However, the conside ... Cite

Randomized Exploration in Cooperative Multi-Agent Reinforcement Learning

Conference Advances in Neural Information Processing Systems · January 1, 2024 We present the first study on provably efficient randomized exploration in cooperative multi-agent reinforcement learning (MARL). We propose a unified algorithm framework for randomized exploration in parallel Markov Decision Processes (MDPs), and two Thom ... Cite

Optimal Batched Best Arm Identification

Conference Advances in Neural Information Processing Systems · January 1, 2024 We study the batched best arm identification (BBAI) problem, where the learner's goal is to identify the best arm while switching the policy as less as possible. In particular, we aim to find the best arm with probability 1−δ for some small constant δ > 0 ... Cite

Finite-Time Frequentist Regret Bounds of Multi-Agent Thompson Sampling on Sparse Hypergraphs

Preprint · December 24, 2023 Open Access Link to item Cite

Convergence of Sign-based Random Reshuffling Algorithms for Nonconvex Optimization

Preprint · October 24, 2023 Link to item Cite

Optimal Batched Best Arm Identification

Preprint · October 21, 2023 Link to item Cite

Wasserstein Distributionally Robust Policy Evaluation and Learning for Contextual Bandits

Preprint · September 15, 2023 Link to item Cite

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Conference Sigmetrics 2023 Abstract Proceedings of the 2023 ACM Sigmetrics International Conference on Measurement and Modeling of Computer Systems · June 19, 2023 We study a multi-Agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-Term rewards. To overcome the curse of dimensiona ... Full text Cite

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Journal Article Performance Evaluation Review · June 19, 2023 We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensiona ... Full text Open Access Cite

Queer In AI: A Case Study in Community-Led Participatory AI

Conference ACM International Conference Proceeding Series · June 12, 2023 Queerness and queer people face an uncertain future in the face of ever more widely deployed and invasive artificial intelligence (AI). These technologies have caused numerous harms to queer people, including privacy violations, censoring and downranking q ... Full text Cite

Multiple models for outbreak decision support in the face of uncertainty.

Journal Article Proc Natl Acad Sci U S A · May 2, 2023 Policymakers must make management decisions despite incomplete knowledge and conflicting model projections. Little guidance exists for the rapid, representative, and unbiased collection of policy-relevant scientific input from independent modeling teams. I ... Full text Open Access Link to item Cite

Global Convergence of Localized Policy Iteration in Networked Multi-Agent Reinforcement Learning

Journal Article Proceedings of the ACM on Measurement and Analysis of Computing Systems · February 28, 2023 We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensiona ... Full text Open Access Cite

Distributionally Robust Policy Gradient for Offline Contextual Bandits

Conference Proceedings of Machine Learning Research · January 1, 2023 Learning an optimal policy from offline data is notoriously challenging, which requires the evaluation of the learning policy using data pre-collected from a static logging policy. We study the policy optimization problem in offline contextual bandits usin ... Cite

Thompson Sampling with Less Exploration is Fast and Optimal

Conference Proceedings of Machine Learning Research · January 1, 2023 We propose ϵ-Exploring Thompson Sampling (ϵTS), a modified version of the Thompson Sampling (TS) algorithm (Agrawal & Goyal, 2017) for multi-armed bandits. In ϵ-TS, arms are selected greedily based on empirical mean rewards with probability 1 − ϵ, and base ... Cite

The United States COVID-19 Forecast Hub dataset.

Journal Article Sci Data · August 1, 2022 Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) part ... Full text Open Access Link to item Cite

Langevin Monte Carlo for Contextual Bandits

Conference · June 22, 2022 We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high ... Link to item Cite

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Preprint · June 7, 2022 Link to item Cite

Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States.

Journal Article Proc Natl Acad Sci U S A · April 12, 2022 Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forec ... Full text Open Access Link to item Cite

Finite-Time Regret of Thompson Sampling Algorithms for Exponential Family Multi-Armed Bandits

Conference Advances in Neural Information Processing Systems · January 1, 2022 We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family, which covers many common reward distributions including Bernoulli, Gaussian, Gamma, Exponen ... Cite

Active Ranking without Strong Stochastic Transitivity

Conference Advances in Neural Information Processing Systems · January 1, 2022 Ranking from noisy comparisons is of great practical interest in machine learning. In this paper, we consider the problem of recovering the exact full ranking for a list of items under ranking models that do not assume the Strong Stochastic Transitivity pr ... Cite

Langevin Monte Carlo for Contextual Bandits

Conference Proceedings of Machine Learning Research · January 1, 2022 We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high ... Cite

Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons

Conference Proceedings of Machine Learning Research · January 1, 2022 In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items. Thus, a uniform querying strategy over users may not be optimal. To address this issue, we propose an elimination-based active sampling s ... Cite

Adaptive Sampling for Heterogeneous Rank Aggregation from Noisy Pairwise Comparisons

Preprint · October 8, 2021 Link to item Cite

Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

Conference 37th Conference on Uncertainty in Artificial Intelligence Uai 2021 · January 1, 2021 We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-revers ... Cite

The United States COVID-19 Forecast Hub dataset

Preprint · 2021 Full text Cite

Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the US

Preprint · 2021 Full text Cite

Almost Optimal Anytime Algorithm for Batched Multi-Armed Bandits

Conference Proceedings of Machine Learning Research · January 1, 2021 In batched multi-armed bandit problems, the learner can adaptively pull arms and adjust strategy in batches. In many real applications, not only the regret but also the batch complexity need to be optimized. Existing batched bandit algorithms usually assum ... Cite

MOTS: Minimax Optimal Thompson Sampling

Conference Proceedings of Machine Learning Research · January 1, 2021 Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it ... Cite

Faster Convergence of Stochastic Gradient Langevin Dynamics for Non-Log-Concave Sampling

Conference Proceedings of Machine Learning Research · January 1, 2021 We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-revers ... Cite

Neural Contextual Bandits with Deep Representation and Shallow Exploration

Preprint · December 3, 2020 Link to item Cite

COVID-19 reopening strategies at the county level in the face of uncertainty: Multiple Models for Outbreak Decision Support

Preprint · November 5, 2020 Full text Cite

A Finite Time Analysis of Two Time-Scale Actor Critic Methods

Preprint · May 4, 2020 Link to item Cite

Stochastic nested variance reduction for nonconvex optimization

Journal Article Journal of Machine Learning Research · May 1, 2020 We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, ... Open Access Cite

Rank aggregation via heterogeneous thurstone preference models

Conference Aaai 2020 34th Aaai Conference on Artificial Intelligence · January 1, 2020 We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranked data, which can take the accuracy levels of different users into account. By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's or ... Cite

A finite-time analysis of Q-Learning with neural network function approximation

Conference 37th International Conference on Machine Learning Icml 2020 · January 1, 2020 Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtuall ... Cite

A finite-time analysis of two time-scale actor-critic methods

Conference Advances in Neural Information Processing Systems · January 1, 2020 Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the learning policy and the critic uses temporal difference learning to estimate the ... Cite

Ensemble Forecasts of Coronavirus Disease 2019 (COVID-19) in the U.S.

Preprint · 2020 Full text Cite

Epidemic Model Guided Machine Learning for COVID-19 Forecasts in the United States

Preprint · 2020 Full text Cite

SAMPLE EFFICIENT POLICY GRADIENT METHODS WITH RECURSIVE VARIANCE REDUCTION

Conference 8th International Conference on Learning Representations Iclr 2020 · January 1, 2020 Improving the sample efficiency in reinforcement learning has been a longstanding research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which ... Cite

Rank Aggregation via Heterogeneous Thurstone Preference Models

Preprint · December 3, 2019 Link to item Cite

Sample Efficient Policy Gradient Methods with Recursive Variance Reduction

Preprint · September 18, 2019 Link to item Cite

Stochastic variance-reduced cubic regularization methods

Journal Article Journal of Machine Learning Research · August 1, 2019 We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of SVRC is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic regularizat ... Open Access Cite

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

Preprint · May 29, 2019 Link to item Cite

Stochastic gradient hamiltonian monte carlo methods with recursive variance reduction

Conference Advances in Neural Information Processing Systems · January 1, 2019 Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) algorithms have received increasing attention in both theory and practice. In this paper, we propose a Stochastic Recursive Variance-Reduced gradient HMC (SRVR-HMC) algorithm. It makes use of a semi-stoch ... Cite

An improved convergence analysis of stochastic variance-reduced policy gradient

Conference 35th Conference on Uncertainty in Artificial Intelligence Uai 2019 · January 1, 2019 We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an ∊-approximate stationary point of the p ... Cite

Sampling from non-log-concave distributions via stochastic variance-reduced gradient Langevin dynamics

Conference Aistats 2019 22nd International Conference on Artificial Intelligence and Statistics · January 1, 2019 We study stochastic variance reduction-based Langevin dynamic algorithms, SVRG-LD and SAGA-LD (Dubey et al., 2016), for sampling from non-log-concave distributions. Under certain assumptions on the log density function, we establish the convergence guarant ... Cite

An Improved Convergence Analysis of Stochastic Variance-Reduced Policy Gradient

Conference Proceedings of Machine Learning Research · January 1, 2019 We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an ε-approximate stationary point of the p ... Cite

Finding Local Minima via Stochastic Nested Variance Reduction

Preprint · June 22, 2018 Link to item Cite

Stochastic Nested Variance Reduction for Nonconvex Optimization

Preprint · June 20, 2018 Link to item Cite

Stochastic Variance-Reduced Hamilton Monte Carlo Methods

Preprint · February 13, 2018 Link to item Cite

Stochastic Variance-Reduced Cubic Regularized Newton Method

Preprint · February 13, 2018 Link to item Cite

Third-order smoothness helps: Faster stochastic optimization algorithms for finding local minima

Conference Advances in Neural Information Processing Systems · January 1, 2018 We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently. More specificall ... Cite

Stochastic nested variance reduction for nonconvex optimization

Conference Advances in Neural Information Processing Systems · January 1, 2018 We study finite-sum nonconvex optimization problems, where the objective function is an average of n nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with conventional stochastic varia ... Cite

Global convergence of Langevin dynamics based algorithms for nonconvex optimization

Conference Advances in Neural Information Processing Systems · January 1, 2018 We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with n component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical app ... Cite

Subsampled stochastic variance-reduced gradient langevin dynamics

Conference 34th Conference on Uncertainty in Artificial Intelligence 2018 Uai 2018 · January 1, 2018 Stochastic variance-reduced gradient Langevin dynamics (SVRG-LD) was recently proposed to improve the performance of stochastic gradient Langevin dynamics (SGLD) by reducing the variance of the stochastic gradient. In this paper, we propose a variant of SV ... Cite

Accelerated stochastic mirror descent: From continuous-time dynamics to discrete-time algorithms

Conference International Conference on Artificial Intelligence and Statistics Aistats 2018 · January 1, 2018 We present a new framework to analyze accelerated stochastic mirror descent through the lens of continuous-time stochastic dynamic systems. It enables us to design new algorithms, and perform a unified and simple analysis of the convergence rates of these ... Cite

Continuous and discrete-time accelerated stochastic mirror descent for strongly convex functions

Conference 35th International Conference on Machine Learning Icml 2018 · January 1, 2018 We provide a second-order stochastic differential equation (SDE), which characterizes the continuous-time dynamics of accelerated stochastic mirror descent (ASMD) for strongly convex functions. This SDE plays a central role in designing new discrete-time A ... Cite

Stochastic variance-reduced cubic regularized Newton method

Conference 35th International Conference on Machine Learning Icml 2018 · January 1, 2018 We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic re ... Cite

Covariate adjusted precision matrix estimation via nonconvex optimization

Conference 35th International Conference on Machine Learning Icml 2018 · January 1, 2018 We propose a nonconvex estimator for the covariate adjusted precision matrix estimation problem in the high dimensional regime, under sparsity constraints. To solve this estimator, we propose an alternating gradient descent algorithm with hard thresholding ... Cite

Stochastic variance-reduced Hamilton Monte Carlo methods

Conference 35th International Conference on Machine Learning Icml 2018 · January 1, 2018 We propose a fast stochastic Hamilton Monte Carlo (HMC) method, for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by the recent advance in stochastic optimization ... Cite

Third-order Smoothness Helps: Even Faster Stochastic Optimization Algorithms for Finding Local Minima

Preprint · December 18, 2017 Link to item Cite

Global Convergence of Langevin Dynamics Based Algorithms for Nonconvex Optimization

Preprint · July 20, 2017 Link to item Cite

Speeding Up Latent Variable Gaussian Graphical Model Estimation via Nonconvex Optimizations

Preprint · February 28, 2017 Link to item Cite

Efficient algorithm for sparse tensor-variate gaussian graphical models via gradient descent

Conference Proceedings of the 20th International Conference on Artificial Intelligence and Statistics Aistats 2017 · January 1, 2017 We study the sparse tensor-variate Gaussian graphical model (STGGM), where each way of the tensor follows a multivariate normal distribution whose precision matrix has sparse structures. In order to estimate the precision matrices, we propose a sparsity co ... Cite

Uncertainty assessment and false discovery rate control in high-dimensional Granger causal inference

Conference 34th International Conference on Machine Learning Icml 2017 · January 1, 2017 Causal inference among high-dimensional time series data proves an important research problem in many fields. While in the classical regime one often establishes causality among time series via a concept known as "Granger causality, " existing approaches f ... Cite

Speeding up latent variable Gaussian graphical model estimation via nonconvex optimization

Conference Advances in Neural Information Processing Systems · January 1, 2017 We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propos ... Cite

Communication-efficient Distributed Estimation and Inference for Transelliptical Graphical Models

Preprint · December 29, 2016 Link to item Cite

Semiparametric differential graph models

Conference Advances in Neural Information Processing Systems · January 1, 2016 In many cases of network analysis, it is more attractive to study how a network varies under different conditions than an individual static network. We propose a novel graphical model, namely Latent Differential Graph Model, where the networks under two di ... Cite

Forward backward greedy algorithms for multi-task learning with faster rates

Conference 32nd Conference on Uncertainty in Artificial Intelligence 2016 Uai 2016 · January 1, 2016 A large body of algorithms have been proposed for multi-task learning. However, the effectiveness of many multi-task learning algorithms highly depends on the structural regularization, which incurs bias in the resulting estimators and leads to slower conv ... Cite

Neural Contextual Bandits with Deep Representation and Shallow Exploration

Conference We study neural contextual bandits, a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the specific reward generating function is unknown. We propose a novel learning algorithm that transforms ... Link to item Cite

Double Explore-then-Commit: Asymptotic Optimality and Beyond

Conference Proceedings of Thirty Fourth Conference on Learning Theory We study the multi-armed bandit problem with subGaussian rewards. The explore-then-commit (ETC) strategy, which consists of an exploration phase followed by an exploitation phase, is one of the most widely used algorithms in a variety of online decision ap ... Link to item Cite