Journal Article · AI Magazine · September 1, 2024
Sequential decision-making involves making informed decisions based on continuous interactions with a complex environment. This process is ubiquitous in various applications, including recommendation systems and clinical treatment design. My research has c ...
Journal Article · PLoS Comput Biol · May 2024
During the COVID-19 pandemic, forecasting COVID-19 trends to support planning and response was a priority for scientists and decision makers alike. In the United States, COVID-19 forecasting was coordinated by a large group of universities, companies, and ...
Conference · Proceedings of the AAAI Conference on Artificial Intelligence · March 25, 2024
We study the multi-agent multi-armed bandit (MAMAB) problem, where agents are factored into overlapping groups. Each group represents a hyperedge, forming a hypergraph over the agents. At each round of interaction, the learner pulls a joint arm (composed o ...
Conference · Proceedings of Machine Learning Research · January 1, 2024
We study off-dynamics Reinforcement Learning (RL), where the policy is trained on a source domain and deployed to a distinct target domain. We aim to solve this problem via online distributionally robust Markov decision processes (DRMDPs), where the learni ...
Conference · SIGMETRICS 2023 - Abstract Proceedings of the 2023 ACM SIGMETRICS International Conference on Measurement and Modeling of Computer Systems · June 19, 2023
We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensiona ...
Conference · ACM International Conference Proceeding Series · June 12, 2023
Queerness and queer people face an uncertain future in the face of ever more widely deployed and invasive artificial intelligence (AI). These technologies have caused numerous harms to queer people, including privacy violations, censoring and downranking q ...
Journal Article · Proc Natl Acad Sci U S A · May 2, 2023
Policymakers must make management decisions despite incomplete knowledge and conflicting model projections. Little guidance exists for the rapid, representative, and unbiased collection of policy-relevant scientific input from independent modeling teams. I ...
Journal Article · Proceedings of the ACM on Measurement and Analysis of Computing Systems · February 28, 2023
We study a multi-agent reinforcement learning (MARL) problem where the agents interact over a given network. The goal of the agents is to cooperatively maximize the average of their entropy-regularized long-term rewards. To overcome the curse of dimensiona ...
Conference · Proceedings of Machine Learning Research · January 1, 2023
Learning an optimal policy from offline data is notoriously challenging, as it requires evaluating the learning policy using data pre-collected from a static logging policy. We study the policy optimization problem in offline contextual bandits usin ...
Conference · Proceedings of Machine Learning Research · January 1, 2023
We propose ϵ-Exploring Thompson Sampling (ϵ-TS), a modified version of the Thompson Sampling (TS) algorithm (Agrawal & Goyal, 2017) for multi-armed bandits. In ϵ-TS, arms are selected greedily based on empirical mean rewards with probability 1 − ϵ, and base ...
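The abstract above states the selection rule concretely: with probability 1 − ϵ the empirically best arm is pulled, and otherwise a Thompson Sampling posterior draw decides. A minimal sketch of that rule for Bernoulli arms with Beta(1, 1) priors, assuming the illustrative names `eps_ts` and `rewards_fn` (they are not from the paper):

```python
import random

def eps_ts(rewards_fn, n_arms, horizon, eps=0.1, rng=None):
    """One run of an epsilon-Exploring Thompson Sampling sketch on
    Bernoulli arms with Beta(1, 1) priors."""
    rng = rng or random.Random(0)
    succ = [0] * n_arms   # observed successes per arm
    fail = [0] * n_arms   # observed failures per arm
    total = 0
    for _ in range(horizon):
        if rng.random() < 1 - eps:
            # greedy step: pick the highest empirical mean
            # (unpulled arms default to 1.0, so each arm gets tried)
            arm = max(range(n_arms),
                      key=lambda a: succ[a] / (succ[a] + fail[a])
                      if succ[a] + fail[a] > 0 else 1.0)
        else:
            # Thompson step: sample each Beta posterior, pick the best draw
            arm = max(range(n_arms),
                      key=lambda a: rng.betavariate(succ[a] + 1, fail[a] + 1))
        r = rewards_fn(arm, rng)
        succ[arm] += r
        fail[arm] += 1 - r
        total += r
    return total
```

On a two-armed instance with means 0.2 and 0.8, the run should concentrate on the better arm while spending only an ϵ-fraction of rounds on posterior sampling.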
Journal Article · Sci Data · August 1, 2022
Academic researchers, government agencies, industry groups, and individuals have produced forecasts at an unprecedented scale during the COVID-19 pandemic. To leverage these forecasts, the United States Centers for Disease Control and Prevention (CDC) part ...
Journal Article · Proc Natl Acad Sci U S A · April 12, 2022
Short-term probabilistic forecasts of the trajectory of the COVID-19 pandemic in the United States have served as a visible and important communication channel between the scientific modeling community and both the general public and decision-makers. Forec ...
Conference · Advances in Neural Information Processing Systems · January 1, 2022
We study the regret of Thompson sampling (TS) algorithms for exponential family bandits, where the reward distribution is from a one-dimensional exponential family, which covers many common reward distributions including Bernoulli, Gaussian, Gamma, Exponen ...
Conference · Advances in Neural Information Processing Systems · January 1, 2022
Ranking from noisy comparisons is of great practical interest in machine learning. In this paper, we consider the problem of recovering the exact full ranking for a list of items under ranking models that do not assume the Strong Stochastic Transitivity pr ...
Conference · Proceedings of Machine Learning Research · January 1, 2022
We study the efficiency of Thompson sampling for contextual bandits. Existing Thompson sampling-based algorithms need to construct a Laplace approximation (i.e., a Gaussian distribution) of the posterior distribution, which is inefficient to sample in high ...
Conference · Proceedings of Machine Learning Research · January 1, 2022
In heterogeneous rank aggregation problems, users often exhibit various accuracy levels when comparing pairs of items. Thus, a uniform querying strategy over users may not be optimal. To address this issue, we propose an elimination-based active sampling s ...
Conference · Proceedings of Machine Learning Research · January 1, 2021
In batched multi-armed bandit problems, the learner can adaptively pull arms and adjust strategy in batches. In many real applications, not only the regret but also the batch complexity need to be optimized. Existing batched bandit algorithms usually assum ...
Conference · Proceedings of Machine Learning Research · January 1, 2021
Thompson sampling is one of the most widely used algorithms for many online decision problems, due to its simplicity in implementation and superior empirical performance over other state-of-the-art methods. Despite its popularity and empirical success, it ...
Conference · Proceedings of Machine Learning Research · January 1, 2021
We provide a new convergence analysis of stochastic gradient Langevin dynamics (SGLD) for sampling from a class of distributions that can be non-log-concave. At the core of our approach is a novel conductance analysis of SGLD using an auxiliary time-revers ...
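The SGLD iteration underlying the entry above is short even though its convergence analysis is not: a step along a (possibly stochastic) gradient of the log-density, plus injected Gaussian noise of scale √(2η), so the chain approximately samples from the target rather than converging to its mode. A minimal sketch of that standard update (function and parameter names are illustrative, not from the paper):

```python
import math
import random

def sgld_sample(grad_logp, theta0, step, n_steps, rng=None):
    """Minimal stochastic gradient Langevin dynamics sketch (1-D).
    grad_logp(theta, rng) may return a noisy gradient estimate of the
    log-density; the sqrt(2 * step) Gaussian term turns gradient ascent
    into an approximate sampler."""
    rng = rng or random.Random(0)
    theta = theta0
    trace = []
    for _ in range(n_steps):
        g = grad_logp(theta, rng)  # stochastic gradient of log p(theta)
        theta = theta + step * g + math.sqrt(2 * step) * rng.gauss(0.0, 1.0)
        trace.append(theta)
    return trace
```

For a standard Gaussian target (log-density gradient −θ, here perturbed to mimic minibatch noise), the empirical mean and variance of the trace should sit near 0 and 1 after burn-in.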
Journal Article · Journal of Machine Learning Research · May 1, 2020
We study nonconvex optimization problems, where the objective function is either an average of n nonconvex functions or the expectation of some stochastic function. We propose a new stochastic gradient descent algorithm based on nested variance reduction, ...
Conference · AAAI 2020 - 34th AAAI Conference on Artificial Intelligence · January 1, 2020
We propose the Heterogeneous Thurstone Model (HTM) for aggregating ranked data, which can take the accuracy levels of different users into account. By allowing different noise distributions, the proposed HTM model maintains the generality of Thurstone's or ...
Conference · 37th International Conference on Machine Learning, ICML 2020 · January 1, 2020
Q-learning with neural network function approximation (neural Q-learning for short) is among the most prevalent deep reinforcement learning algorithms. Despite its empirical success, the non-asymptotic convergence rate of neural Q-learning remains virtuall ...
Conference · Advances in Neural Information Processing Systems · January 1, 2020
Actor-critic (AC) methods have exhibited great empirical success compared with other reinforcement learning algorithms, where the actor uses the policy gradient to improve the learning policy and the critic uses temporal difference learning to estimate the ...
Conference · AISTATS 2019 - 22nd International Conference on Artificial Intelligence and Statistics · January 1, 2020
We study stochastic variance reduction-based Langevin dynamic algorithms, SVRG-LD and SAGA-LD (Dubey et al., 2016), for sampling from non-log-concave distributions. Under certain assumptions on the log density function, we establish the convergence guarant ...
Conference · 8th International Conference on Learning Representations, ICLR 2020 · January 1, 2020
Improving the sample efficiency in reinforcement learning has been a longstanding research problem. In this work, we aim to reduce the sample complexity of existing policy gradient methods. We propose a novel policy gradient algorithm called SRVR-PG, which ...
Journal Article · Journal of Machine Learning Research · August 1, 2019
We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of SVRC is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic regularizat ...
Conference · Advances in Neural Information Processing Systems · January 1, 2019
Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) algorithms have received increasing attention in both theory and practice. In this paper, we propose a Stochastic Recursive Variance-Reduced gradient HMC (SRVR-HMC) algorithm. It makes use of a semi-stoch ...
Conference · 35th Conference on Uncertainty in Artificial Intelligence, UAI 2019 · January 1, 2019
We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an ∊-approximate stationary point of the p ...
Conference · Proceedings of Machine Learning Research · January 1, 2019
We revisit the stochastic variance-reduced policy gradient (SVRPG) method proposed by Papini et al. (2018) for reinforcement learning. We provide an improved convergence analysis of SVRPG and show that it can find an ε-approximate stationary point of the p ...
Conference · Advances in Neural Information Processing Systems · January 1, 2018
We propose stochastic optimization algorithms that can find local minima faster than existing algorithms for nonconvex optimization problems, by exploiting the third-order smoothness to escape non-degenerate saddle points more efficiently. More specificall ...
Conference · Advances in Neural Information Processing Systems · January 1, 2018
We study finite-sum nonconvex optimization problems, where the objective function is an average of n nonconvex functions. We propose a new stochastic gradient descent algorithm based on nested variance reduction. Compared with conventional stochastic varia ...
Conference · Advances in Neural Information Processing Systems · January 1, 2018
We present a unified framework to analyze the global convergence of Langevin dynamics based algorithms for nonconvex finite-sum optimization with n component functions. At the core of our analysis is a direct analysis of the ergodicity of the numerical app ...
Conference · 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI 2018 · January 1, 2018
Stochastic variance-reduced gradient Langevin dynamics (SVRG-LD) was recently proposed to improve the performance of stochastic gradient Langevin dynamics (SGLD) by reducing the variance of the stochastic gradient. In this paper, we propose a variant of SV ...
Conference · International Conference on Artificial Intelligence and Statistics, AISTATS 2018 · January 1, 2018
We present a new framework to analyze accelerated stochastic mirror descent through the lens of continuous-time stochastic dynamic systems. It enables us to design new algorithms, and perform a unified and simple analysis of the convergence rates of these ...
Conference · 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018
We provide a second-order stochastic differential equation (SDE), which characterizes the continuous-time dynamics of accelerated stochastic mirror descent (ASMD) for strongly convex functions. This SDE plays a central role in designing new discrete-time A ...
Conference · 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018
We propose a stochastic variance-reduced cubic regularized Newton method (SVRC) for non-convex optimization. At the core of our algorithm is a novel semi-stochastic gradient along with a semi-stochastic Hessian, which are specifically designed for cubic re ...
Conference · 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018
We propose a nonconvex estimator for the covariate adjusted precision matrix estimation problem in the high dimensional regime, under sparsity constraints. To solve this estimator, we propose an alternating gradient descent algorithm with hard thresholding ...
Conference · 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018
We propose a fast stochastic Hamilton Monte Carlo (HMC) method, for sampling from a smooth and strongly log-concave distribution. At the core of our proposed method is a variance reduction technique inspired by the recent advance in stochastic optimization ...
Conference · Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, AISTATS 2017 · January 1, 2017
We study the sparse tensor-variate Gaussian graphical model (STGGM), where each way of the tensor follows a multivariate normal distribution whose precision matrix has sparse structures. In order to estimate the precision matrices, we propose a sparsity co ...
Conference · 34th International Conference on Machine Learning, ICML 2017 · January 1, 2017
Causal inference among high-dimensional time series data is an important research problem in many fields. While in the classical regime one often establishes causality among time series via a concept known as "Granger causality," existing approaches f ...
Conference · Advances in Neural Information Processing Systems · January 1, 2017
We study the estimation of the latent variable Gaussian graphical model (LVGGM), where the precision matrix is the superposition of a sparse matrix and a low-rank matrix. In order to speed up the estimation of the sparse plus low-rank components, we propos ...
Conference · Advances in Neural Information Processing Systems · January 1, 2016
In many cases of network analysis, it is more attractive to study how a network varies under different conditions than an individual static network. We propose a novel graphical model, namely Latent Differential Graph Model, where the networks under two di ...
Conference · 32nd Conference on Uncertainty in Artificial Intelligence 2016, UAI 2016 · January 1, 2016
A large body of algorithms have been proposed for multi-task learning. However, the effectiveness of many multi-task learning algorithms highly depends on the structural regularization, which incurs bias in the resulting estimators and leads to slower conv ...
Conference
We study neural contextual bandits, a general class of contextual bandits, where each context-action pair is associated with a raw feature vector, but the specific reward generating function is unknown. We propose a novel learning algorithm that transforms ...
Conference · Proceedings of Thirty Fourth Conference on Learning Theory
We study the multi-armed bandit problem with subGaussian rewards. The explore-then-commit (ETC) strategy, which consists of an exploration phase followed by an exploitation phase, is one of the most widely used algorithms in a variety of online decision ap ...
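The explore-then-commit strategy named in the last entry is simple to state: pull every arm a fixed number of times, then commit to the arm with the best empirical mean for the rest of the horizon. A minimal sketch (the function name and the exploration budget `m` are illustrative, not taken from the paper):

```python
import random

def explore_then_commit(rewards_fn, n_arms, m, horizon, rng=None):
    """Explore-then-commit sketch: m round-robin pulls per arm, then
    commit to the empirical winner for the remaining rounds."""
    rng = rng or random.Random(0)
    sums = [0.0] * n_arms
    total = 0.0
    t = 0
    # exploration phase: m rounds of each arm, round-robin
    for _ in range(m):
        for arm in range(n_arms):
            r = rewards_fn(arm, rng)
            sums[arm] += r
            total += r
            t += 1
    best = max(range(n_arms), key=lambda a: sums[a] / m)
    # exploitation phase: commit to the empirically best arm
    while t < horizon:
        total += rewards_fn(best, rng)
        t += 1
    return best, total
```

The trade-off the analysis must balance is visible in the code: a larger `m` makes the committed arm more reliable but spends more rounds on suboptimal pulls.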