Conference · 12th International Conference on Learning Representations, ICLR 2024 · January 1, 2024
Despite the impressive generalization capabilities of deep neural networks, they have been repeatedly shown to be overconfident when they are wrong. Fixing this issue is known as model calibration, and has consequently received much attention in the form o ...
Conference · Proceedings of Machine Learning Research · January 1, 2023
Mixup is a data augmentation technique that relies on training using random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive used in the training of state-of-the-art image classification models due ...
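
The Mixup primitive described above is easy to state concretely. Below is a minimal NumPy sketch of producing one mixed minibatch; the function name, the Beta(α, α) mixing weight, and the toy shapes are our assumptions for illustration, not code from the paper.

```python
import numpy as np

def mixup_batch(x1, y1, x2, y2, alpha=0.2):
    """Return a random convex combination of two batches of inputs and one-hot labels."""
    lam = np.random.beta(alpha, alpha)              # mixing weight lambda ~ Beta(alpha, alpha)
    return lam * x1 + (1 - lam) * x2, lam * y1 + (1 - lam) * y2

# toy usage: two minibatches of 4 examples with 3 classes
rng = np.random.default_rng(0)
x1, x2 = rng.standard_normal((4, 8)), rng.standard_normal((4, 8))
y1, y2 = np.eye(3)[[0, 1, 2, 0]], np.eye(3)[[2, 2, 1, 0]]
x_mix, y_mix = mixup_batch(x1, y1, x2, y2)          # train on (x_mix, y_mix) instead of raw pairs
```
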
Conference · Proceedings of Machine Learning Research · January 1, 2023
Sparse coding, which refers to modeling a signal as sparse linear combinations of the elements of a learned dictionary, has proven to be a successful (and interpretable) approach in applications such as signal processing, computer vision, and medical imagi ...
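
As a concrete instance of the coding step (writing a signal as a sparse combination of dictionary atoms), here is a hedged NumPy sketch using ISTA with a fixed dictionary; the dictionary-learning step is omitted and all parameter choices are ours, not the paper's.

```python
import numpy as np

def ista_sparse_code(D, y, lam=0.05, n_iter=200):
    """Minimize 0.5*||y - D z||^2 + lam*||z||_1 over the code z (ISTA)."""
    L = np.linalg.norm(D, 2) ** 2                   # Lipschitz constant of the smooth part
    z = np.zeros(D.shape[1])
    for _ in range(n_iter):
        w = z - D.T @ (D @ z - y) / L               # gradient step on the quadratic term
        z = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)   # soft-thresholding
    return z

# toy usage: a 20-atom dictionary in R^10 and a 3-sparse ground-truth code
rng = np.random.default_rng(0)
D = rng.standard_normal((10, 20)); D /= np.linalg.norm(D, axis=0)
z_true = np.zeros(20); z_true[[1, 5, 9]] = [1.0, -2.0, 1.5]
z_hat = ista_sparse_code(D, D @ z_true)             # recovers a sparse code close to z_true
```
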
Conference · Proceedings of Machine Learning Research · January 1, 2023
In deep learning, often the training process finds an interpolator (a solution with 0 training loss), but the test loss is still low. This phenomenon, known as benign overfitting, is a major mystery that received a lot of recent attention. One common mecha ...
Conference · EMNLP 2023 - 2023 Conference on Empirical Methods in Natural Language Processing, Proceedings · January 1, 2023
Pre-trained language models have been shown to encode linguistic structures like parse trees in their embeddings while being trained unsupervised. Some doubts have been raised about whether the models are doing parsing or only some computation weakly correlated ...
Conference · Advances in Neural Information Processing Systems · January 1, 2023
We focus on the task of learning a single index model σ(w* · x) with respect to the isotropic Gaussian distribution in d dimensions. Prior work has shown that the sample complexity of learning w* is governed by the information exponent k* of the link funct ...
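
For intuition about the model y = σ(w*·x) with isotropic Gaussian x, here is a hedged NumPy sketch of the easiest case (ReLU link, information exponent 1), where even the plain first moment E[y·x] aligns with w* by Stein's lemma. This is only a toy sanity check, not the estimator or sample-complexity analysis from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
d, n = 50, 20000
w_star = rng.standard_normal(d); w_star /= np.linalg.norm(w_star)

X = rng.standard_normal((n, d))              # isotropic Gaussian inputs in d dimensions
y = np.maximum(X @ w_star, 0.0)              # single index model with ReLU link sigma

w_hat = (y[:, None] * X).mean(axis=0)        # first-moment estimator, proportional to w*
w_hat /= np.linalg.norm(w_hat)
print("alignment <w_hat, w*> =", round(float(w_hat @ w_star), 3))   # close to 1
```
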
Conference · Proceedings of Machine Learning Research · January 1, 2023
Compositionality is a common property in many modalities including text and images, but the compositional generalization of multi-modal models is not well-understood. In this paper, we identify two sources of visual-linguistic compositionality: linguistic ...
Conference · 11th International Conference on Learning Representations, ICLR 2023 · January 1, 2023
Monotonic linear interpolation (MLI) - on the line connecting a random initialization with the minimizer it converges to, the loss and accuracy are monotonic - is a phenomenon that is commonly observed in the training of neural networks. Such a phenomenon m ...
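
The MLI measurement itself is simple to reproduce on a toy problem. The sketch below (our own logistic-regression setup, not the paper's experiments) trains with gradient descent and then evaluates the loss along the straight line from the random initialization to the solution.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 5))
y = (X @ rng.standard_normal(5) > 0).astype(float)

def loss(w):                                     # logistic loss on the toy data
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

def grad(w):
    p = 1.0 / (1.0 + np.exp(-(X @ w)))
    return X.T @ (p - y) / len(y)

w0 = rng.standard_normal(5)                      # random initialization
w = w0.copy()
for _ in range(500):                             # plain gradient descent to a minimizer
    w -= 0.5 * grad(w)

alphas = np.linspace(0.0, 1.0, 11)               # loss along the line from w0 to w
curve = [round(loss((1 - a) * w0 + a * w), 3) for a in alphas]
print(curve)                                     # typically decreases monotonically
```
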
Conference · 11th International Conference on Learning Representations, ICLR 2023 · January 1, 2023
Self-supervised learning has significantly improved the performance of many NLP tasks. However, how self-supervised learning discovers useful representations, and why it is better than traditional approaches such as probabilistic models, are still largel ...
Conference · 11th International Conference on Learning Representations, ICLR 2023 · January 1, 2023
Depth separation-why a deeper network is more powerful than a shallower one-has been a major problem in deep learning theory. Previous results often focus on representation power. For example, Safran et al. (2019) constructed a function that is easy to app ...
Conference · 11th International Conference on Learning Representations, ICLR 2023 · January 1, 2023
Recently, researchers observed that gradient descent for deep neural networks operates in an “edge-of-stability” (EoS) regime: the sharpness (maximum eigenvalue of the Hessian) is often larger than the stability threshold 2/η (where η is the step size). Despit ...
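
The quantity at the center of this entry, sharpness versus 2/η, can be estimated numerically. Below is a hedged sketch (toy objective, finite-difference Hessian-vector products, all constants ours) of measuring the top Hessian eigenvalue by power iteration and comparing it with the stability threshold.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 10))
b = rng.standard_normal(30)

def grad(w):                                     # gradient of a toy loss 0.5*||Aw-b||^2 + 0.025*sum(w^4)
    return A.T @ (A @ w - b) + 0.1 * w ** 3

def hvp(w, v, eps=1e-4):                         # finite-difference Hessian-vector product
    return (grad(w + eps * v) - grad(w - eps * v)) / (2 * eps)

def sharpness(w, iters=100):                     # top Hessian eigenvalue via power iteration
    v = rng.standard_normal(w.shape)
    for _ in range(iters):
        v = hvp(w, v)
        v /= np.linalg.norm(v)
    return float(v @ hvp(w, v))                  # Rayleigh quotient

eta = 0.01
w = rng.standard_normal(10)
for _ in range(200):                             # plain gradient descent with step size eta
    w -= eta * grad(w)
print("sharpness:", round(sharpness(w), 2), " threshold 2/eta:", 2 / eta)
```
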
Conference · Advances in Neural Information Processing Systems · January 1, 2023
Finding an approximate second-order stationary point (SOSP) is a well-studied and fundamental problem in stochastic nonconvex optimization with many applications in machine learning. However, this problem is poorly understood in the presence of outliers, l ...
Journal Article · Mathematical Programming · June 1, 2022
Tucker decomposition is a popular technique for many data analysis and machine learning applications. Finding a Tucker decomposition is a nonconvex optimization problem. As the scale of the problems increases, local search algorithms such as stochastic gra ...
Journal Article · Mathematical Programming · June 1, 2022
Non-convex optimization with local search heuristics has been widely used in machine learning, achieving many state-of-art results. It becomes increasingly important to understand why they can work for these NP-hard problems on typical data. The landscape ...
Conference · Proceedings of Machine Learning Research · January 1, 2022
This paper studies online algorithms augmented with multiple machine-learned predictions. We give a generic algorithmic framework for online covering problems with multiple predictions that obtains an online solution that is competitive against the perform ...
Conference · Proceedings of Machine Learning Research · January 1, 2022
Recently, many reinforcement learning techniques have been shown to have provable guarantees in the simple case of linear dynamics, especially in problems like linear quadratic regulators. However, in practice many tasks require learning a policy from rich ...
Conference · ICLR 2022 - 10th International Conference on Learning Representations · January 1, 2022
In the Mixup training paradigm, a model is trained using convex combinations of data points and their associated labels. Despite seeing very few true data points during training, models trained using Mixup seem to still minimize the original empirical risk ...
Conference · Advances in Neural Information Processing Systems · January 1, 2022
We explore the connection between outlier-robust high-dimensional statistics and non-convex optimization in the presence of sparsity constraints, with a focus on the fundamental tasks of robust sparse mean estimation and robust sparse PCA. We develop novel ...
Other · ACM Transactions on Algorithms · August 1, 2021
In this article, we introduce the online service with delay problem. In this problem, there are n points in a metric space that issue service requests over time, and there is a server that serves these requests. The goal is to minimize the sum of distance ...
Journal Article · Journal of the ACM · March 1, 2021
Gradient descent (GD) and stochastic gradient descent (SGD) are the workhorses of large-scale machine learning. While classical theory focused on analyzing the performance of these methods in convex optimization problems, the most notable successes in mach ...
Conference · Proceedings of Machine Learning Research · January 1, 2021
Choosing the right parameters for optimization algorithms is often the key to their success in practice. Solving this problem using a learning-to-learn approach, using meta-gradient descent on a meta-objective based on the trajectory that the optimizer gene ...
Conference · Advances in Neural Information Processing Systems · January 1, 2021
In this paper we study the training dynamics for gradient flow on over-parametrized tensor decomposition problems. Empirically, such training process often first fits larger components and then discovers smaller components, which is similar to a tensor def ...
Conference · Advances in Neural Information Processing Systems · January 1, 2021
The emerging field of learning-augmented online algorithms uses ML techniques to predict future input parameters and thereby improve the performance of online algorithms. Since these parameters are, in general, real-valued functions, a natural approach is ...
Conference · Proceedings of Machine Learning Research · January 1, 2021
We give an algorithm for exact sampling from the Bingham distribution p(x) ∝ exp(x⊺Ax) on the sphere S^{d-1} with expected runtime of poly(d, λmax(A) - λmin(A)). The algorithm is based on rejection sampling, where the proposal distribution is a polynomial appr ...
Conference · Proceedings of Machine Learning Research · January 1, 2021
While over-parameterization is widely believed to be crucial for the success of optimization for the neural networks, most existing theories on over-parameterization do not fully explain the reason—they either work in the Neural Tangent Kernel regime where ...
Chapter · January 1, 2021
In this chapter, we introduce nonnegative matrix factorization and topic modeling. We will see that there is a natural structural assumption called separability that allows us to circumvent the worst-case NP-hardness results for nonnegative matrix factoriza ...
Journal Article · Proceedings of the Annual ACM Symposium on Theory of Computing · June 8, 2020
Estimating the normalizing constant of an unnormalized probability distribution has important applications in computer science, statistical physics, machine learning, and statistics. In this work, we consider the problem of estimating the normalizing const ...
Journal Article · Advances in Neural Information Processing Systems · January 1, 2020
Over-parametrization is an important technique in training neural networks. In both theory and practice, training a larger network allows the optimization algorithm to avoid bad local optimal solutions. In this paper we study a closely related tensor decom ...
Conference · 37th International Conference on Machine Learning, ICML 2020 · January 1, 2020
We study the problem of high-dimensional robust mean estimation in the presence of a constant fraction of adversarial outliers. A recent line of work has provided sophisticated polynomial-time algorithms for this problem with dimension-independent error gu ...
Conference · 37th International Conference on Machine Learning, ICML 2020 · January 1, 2020
A popular line of recent research incorporates ML advice in the design of online algorithms to improve their performance in typical instances. These papers treat the ML algorithm as a blackbox, and redesign online algorithms to take advantage of ML predict ...
Conference · Proceedings of the Annual ACM-SIAM Symposium on Discrete Algorithms · January 1, 2019
We study the fundamental problem of high-dimensional mean estimation in a robust model where a constant fraction of the samples are adversarially corrupted. Recent work gave the first polynomial time algorithms for this problem with dimension-independent e ...
Journal Article · Advances in Neural Information Processing Systems · January 1, 2019
Mode connectivity (Garipov et al., 2018; Draxler et al., 2018) is a surprising phenomenon in the loss landscape of deep nets. Optima, at least those discovered by gradient-based optimization, turn out to be connected by simple paths on which the loss functio ...
Conference · 7th International Conference on Learning Representations, ICLR 2019 · January 1, 2019
We give a new algorithm for learning a two-layer neural network under a general class of input distributions. Assuming there is a ground-truth two-layer network y = Aσ(Wx) + ξ, where A, W are weight matrices, ξ represents noise, and the number of neurons i ...
Conference · 7th International Conference on Learning Representations, ICLR 2019 · January 1, 2019
Word embedding is a powerful tool in natural language processing. In this paper we consider the problem of word embedding composition - given vector representations of two words, compute a vector for the entire phrase. We give a generative model that can c ...
Journal Article · Foundations and Trends in Machine Learning · January 1, 2019
Spectral methods have been the mainstay in several domains such as machine learning, applied mathematics and scientific computing. They involve finding a certain kind of spectral decomposition to obtain basis functions that can capture important structures ...
Conference · Advances in Neural Information Processing Systems · January 1, 2019
Minimax optimal convergence rates for numerous classes of stochastic convex optimization problems are well characterized, where the majority of results utilize iterate averaged stochastic gradient descent (SGD) with polynomially decaying step sizes. In con ...
Conference · Proceedings of Machine Learning Research · January 1, 2019
We study the problem of estimating the covariance matrix of a high-dimensional distribution when a small constant fraction of the samples can be arbitrarily corrupted. Recent work gave the first polynomial time algorithms for this problem with near-optimal ...
Conference · Proceedings of Machine Learning Research · January 1, 2019
Variance reduction techniques like SVRG (Johnson and Zhang, 2013) provide simple and fast algorithms for optimizing a convex finite-sum objective. For nonconvex objectives, these techniques can also find a first-order stationary point (with small gradient) ...
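
For reference, here is a minimal SVRG loop on a toy least-squares finite sum (our own instance and step size; the paper's contribution concerns nonconvex objectives and second-order stationary points, which this sketch does not reproduce).

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 100, 10
A = rng.standard_normal((n, d))
b = rng.standard_normal(n)

def grad_i(w, i):                                # gradient of the i-th summand 0.5*(a_i.w - b_i)^2
    return A[i] * (A[i] @ w - b[i])

def full_grad(w):
    return A.T @ (A @ w - b) / n

w_snap, eta = np.zeros(d), 0.01
for _ in range(20):                              # outer loop: recompute the snapshot gradient
    mu = full_grad(w_snap)
    w = w_snap.copy()
    for _ in range(n):                           # inner loop: variance-reduced stochastic steps
        i = rng.integers(n)
        w -= eta * (grad_i(w, i) - grad_i(w_snap, i) + mu)
    w_snap = w
print("objective:", round(0.5 * np.mean((A @ w_snap - b) ** 2), 4))
```
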
Conference · Proceedings of Machine Learning Research · January 1, 2019
Folklore results in the theory of Stochastic Approximation indicate the (minimax) optimality of Stochastic Gradient Descent (SGD) (Robbins and Monro, 1951) with polynomially decaying step sizes and iterate averaging (Ruppert, 1988; Polyak and Juditsky, 19 ...
Conference · 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018
Deep nets generalize well despite having more parameters than the number of training samples. Recent works try to give an explanation using PAC-Bayes and Margin-based analyses, but do not as yet result in sample complexity bounds better than naive paramete ...
Conference · 35th International Conference on Machine Learning, ICML 2018 · January 1, 2018
Direct policy gradient methods for reinforcement learning and continuous control problems are a popular approach for a variety of reasons: 1) they are easy to implement without explicit knowledge of the underlying model, 2) they are an "end-to-end" approa ...
Conference · Advances in Neural Information Processing Systems · January 1, 2018
A key task in Bayesian machine learning is sampling from distributions that are only specified up to a partition function (i.e., constant of proportionality). One prevalent example of this is sampling posteriors in parametric distributions, such as latent- ...
Conference · Advances in Neural Information Processing Systems · January 1, 2018
Population risk is always of primary interest in machine learning; however, learning algorithms only have access to the empirical risk. Even for applications with nonconvex nonsmooth losses (such as modern deep networks), the population risk is generally s ...
Conference · 6th International Conference on Learning Representations, ICLR 2018 - Conference Track Proceedings · January 1, 2018
We consider the problem of learning a one-hidden-layer neural network: we assume the input x ∈ R^d is from a Gaussian distribution and the label y = a⊤σ(Bx) + ξ, where a is a nonnegative vector in R^m with m ≤ d, B ∈ R^{m×d} is a full-rank weight matrix, and ξ is ...
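
The data-generating model in this entry is concrete enough to simulate. A hedged sketch follows (σ chosen as ReLU and all constants ours, purely for illustration).

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, n = 20, 10, 1000                           # m <= d as in the abstract
B = rng.standard_normal((m, d))                  # full-rank weight matrix (with high probability)
a = np.abs(rng.standard_normal(m))               # nonnegative second-layer weights

X = rng.standard_normal((n, d))                  # x ~ N(0, I_d)
xi = 0.01 * rng.standard_normal(n)               # noise
y = np.maximum(X @ B.T, 0.0) @ a + xi            # y = a^T sigma(B x) + xi with sigma = ReLU
```
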
Conference · Proceedings of Machine Learning Research · January 1, 2018
Matrix completion is a well-studied problem with many machine learning applications. In practice, the problem is often solved by non-convex optimization algorithms. However, the current theoretical analysis for non-convex algorithms relies crucially on the ...
Conference · Proceedings of the Annual ACM Symposium on Theory of Computing · June 19, 2017
In this paper, we introduce the online service with delay problem. In this problem, there are n points in a metric space that issue service requests over time, and a server that serves these requests. The goal is to minimize the sum of distance traveled by ...
Conference · Proceedings of the Annual ACM Symposium on Theory of Computing · June 19, 2017
Many machine learning applications use latent variable models to explain structure in data, whereby visible variables (= coordinates of the given datapoint) are explained as a probabilistic function of some hidden variables. Finding parameters with the max ...
Journal Article · Journal of Machine Learning Research · April 1, 2017
We present a novel analysis of the dynamics of tensor power iterations in the overcomplete regime where the tensor CP rank is larger than the input dimension. Finding the CP decomposition of an overcomplete tensor is NP-hard in general. We consider the cas ...
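
The tensor power update itself is a one-liner. The sketch below runs it on a symmetric third-order tensor with orthonormal components, an easier setting than the overcomplete regime analyzed in the paper; it is only meant to show the iteration u ← T(I, u, u)/‖T(I, u, u)‖.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k = 10, 5
A, _ = np.linalg.qr(rng.standard_normal((d, k)))        # orthonormal components a_1, ..., a_k
T = np.einsum('di,ei,fi->def', A, A, A)                  # T = sum_i a_i (x) a_i (x) a_i

u = rng.standard_normal(d)
u /= np.linalg.norm(u)
for _ in range(50):
    u = np.einsum('def,e,f->d', T, u, u)                 # u <- T(I, u, u)
    u /= np.linalg.norm(u)
print("max_i |<u, a_i>| =", round(float(np.max(np.abs(A.T @ u))), 3))   # converges to a component
```
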
Conference · Advances in Neural Information Processing Systems · January 1, 2017
Non-convex optimization with local search heuristics has been widely used in machine learning, achieving many state-of-art results. It becomes increasingly important to understand why they can work for these NP-hard problems on typical data. The landscape ...
Conference · 34th International Conference on Machine Learning, ICML 2017 · January 1, 2017
Generalization is defined for the training of generative adversarial networks (GANs), and it is shown that generalization is not guaranteed for the popular distances between distributions such as Jensen-Shannon or Wasserstein. In particular, training may appear to be ...
Conference · 34th International Conference on Machine Learning, ICML 2017 · January 1, 2017
In this paper we develop a new framework that captures the common landscape underlying the common non-convex low-rank matrix problems including matrix sensing, matrix completion and robust PCA. In particular, we show for all above problems (including asymm ...
Conference · 34th International Conference on Machine Learning, ICML 2017 · January 1, 2017
This paper shows that a perturbed form of gradient descent converges to a second-order stationary point in a number of iterations which depends only poly-logarithmically on dimension (i.e., it is almost "dimension-free"). The convergence rate of this procedur ...
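
The idea of perturbed gradient descent can be illustrated on a tiny strict-saddle objective: run gradient descent and, whenever the gradient is small, add a small random perturbation. The thresholds, perturbation radius and toy function below are ours; see the paper for the actual algorithm and its almost dimension-free guarantee.

```python
import numpy as np

rng = np.random.default_rng(0)

def grad(w):                                     # f(x, y) = 0.5*x^2 - 0.5*y^2 + 0.25*y^4
    x, y = w                                     # strict saddle at (0, 0); minima at (0, +/-1)
    return np.array([x, -y + y ** 3])

def perturbed_gd(w, eta=0.1, g_thresh=1e-3, radius=1e-2, iters=500):
    for _ in range(iters):
        g = grad(w)
        if np.linalg.norm(g) <= g_thresh:        # near-stationary point: perturb
            w = w + radius * rng.standard_normal(w.shape)
        else:                                    # otherwise take an ordinary gradient step
            w = w - eta * g
    return w

w = perturbed_gd(np.zeros(2))                    # start exactly at the saddle point
print("final iterate:", np.round(w, 3))          # ends near one of the minima (0, +/-1)
```
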
Conference · Journal of Machine Learning Research · June 6, 2016
Local search heuristics for non-convex optimization are popular in applied machine learning. However, in general it is hard to guarantee that such algorithms even converge to a local minimum, due to the existence of complicated saddle point structures in ...
Journal Article · IEEE Transactions on Signal Processing · April 1, 2016
This paper addresses two fundamental problems in the context of hidden Markov models (HMMs). The first problem is concerned with the characterization and computation of a minimal order HMM that realizes the exact joint densities of an output process based ...
Conference · 33rd International Conference on Machine Learning, ICML 2016 · January 1, 2016
This paper considers the problem of canonical correlation analysis and, more broadly, the generalized eigenvector problem for a pair of symmetric matrices. We consider the setting of finding the top-k canonical/eigen subspace, and solve these problems throug ...
Conference · 33rd International Conference on Machine Learning, ICML 2016 · January 1, 2016
In many settings, we have multiple data sets (also called views) that capture different and overlapping aspects of the same phenomenon. We are often interested in finding patterns that are unique to one or to a subset of the views. For example, we might ha ...
Conference · 33rd International Conference on Machine Learning, ICML 2016 · January 1, 2016
Recently, there has been considerable progress on designing algorithms with provable guarantees - typically using linear algebraic methods - for parameter learning in latent variable models. But designing provable algorithms for inference has proven to be ...
Conference · SIAM Journal on Computing · January 1, 2016
In the nonnegative matrix factorization (NMF) problem we are given an n × m nonnegative matrix M and an integer r > 0. Our goal is to express M as AW, where A and W are nonnegative matrices of size n×r and r×m, respectively. In some applications, it makes ...
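
For concreteness, the sketch below fits the factorization M ≈ AW with the classic Lee-Seung multiplicative updates on a synthetic instance. This is a standard heuristic shown only to make the objective tangible; it is not the provable algorithm developed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, r = 30, 40, 5
M = np.abs(rng.standard_normal((n, r))) @ np.abs(rng.standard_normal((r, m)))   # nonnegative, rank r

A = np.abs(rng.standard_normal((n, r)))          # nonnegative initialization
W = np.abs(rng.standard_normal((r, m)))
eps = 1e-12
for _ in range(500):                             # multiplicative updates keep both factors nonnegative
    W *= (A.T @ M) / (A.T @ A @ W + eps)
    A *= (M @ W.T) / (A @ W @ W.T + eps)
print("relative error:", round(np.linalg.norm(M - A @ W) / np.linalg.norm(M), 4))
```
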
Conference · Advances in Neural Information Processing Systems · January 1, 2016
Matrix completion is a basic machine learning problem that has wide applications, especially in collaborative filtering and recommender systems. Simple non-convex optimization algorithms are popular and effective in practice. Despite recent progress in pro ...
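
One simple non-convex algorithm of the kind this entry refers to is gradient descent on a factorized objective over the observed entries. The sketch below is a generic illustration with our own rank, step size and sampling rate, not the specific algorithm analyzed in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, p = 50, 3, 0.3
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))   # ground-truth rank-r matrix
mask = rng.random((n, n)) < p                                    # set of observed entries

U = 0.1 * rng.standard_normal((n, r))            # small random initialization
V = 0.1 * rng.standard_normal((n, r))
eta = 0.02
for _ in range(2000):                            # gradient descent on 0.5*||P_Omega(U V^T - M)||_F^2
    R = mask * (U @ V.T - M)                     # residual on observed entries only
    U, V = U - eta * (R @ V), V - eta * (R.T @ U)
print("relative error on the full matrix:",
      round(np.linalg.norm(U @ V.T - M) / np.linalg.norm(M), 4))
```
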
Conference · Leibniz International Proceedings in Informatics, LIPIcs · August 1, 2015
Tensor rank and low-rank tensor decompositions have many applications in learning and complexity theory. Most known algorithms use unfoldings of tensors and can only handle rank up to n^⌊p/2⌋ for a p-th order tensor in R^{n^p}. Previously no efficient algorithm ...
Conference · Proceedings of the Annual ACM Symposium on Theory of Computing · June 14, 2015
Efficiently learning mixtures of Gaussians is a fundamental problem in statistics and learning theory. Given samples coming from a random one out of k Gaussian distributions in R^n, the learning problem asks to estimate the means and the covariance matrices ...
Journal Article · Algorithmica · May 1, 2015
We present a new algorithm for independent component analysis which has provable performance guarantees. In particular, suppose we are given samples of the form y=Ax+η where A is an unknown but non-singular n×n matrix, x is a random variable whose coordina ...
Conference · Journal of Machine Learning Research · January 1, 2015
Sparse coding is a basic task in many fields including signal processing, neuroscience and machine learning where the goal is to learn a basis that enables a sparse representation of a given set of data, if one exists. Its standard formulation is as a non- ...
Conference · Journal of Machine Learning Research · January 1, 2015
In many estimation problems, e.g. linear and logistic regression, we wish to minimize an unknown objective given only unbiased samples of the objective function. Furthermore, we aim to achieve this using as few samples as possible. In the absence of comput ...
Conference · Journal of Machine Learning Research · January 1, 2015
We provide guarantees for learning latent variable models emphasizing on the overcomplete regime, where the dimensionality of the latent space exceeds the observed dimensionality. In particular, we consider multiview mixtures, ICA, and sparse coding models ...
Conference · Journal of Machine Learning Research · January 1, 2015
We analyze stochastic gradient descent for optimizing non-convex functions. In many cases for non-convex functions the goal is to find a reasonable local minimum, and the main concern is that gradient updates are trapped in saddle points. In this paper we ...
Conference · 32nd International Conference on Machine Learning, ICML 2015 · January 1, 2015
We develop a family of accelerated stochastic algorithms that optimize sums of convex functions. Our algorithms improve upon the fastest running time for empirical risk minimization (ERM), and in particular linear least-squares regression, across a wide ra ...
Conference · 32nd International Conference on Machine Learning, ICML 2015 · January 1, 2015
Non-negative matrix factorization (NMF) is a natural model of admixture and is widely used in science and engineering. A plethora of algorithms have been developed to tackle NMF, but due to the non-convex nature of the problem, there is little guarantee on ...
Conference · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · January 1, 2015
This note is a short version of that in [1]. It is intended as a survey for the 2015 Algorithmic Learning Theory (ALT) conference. This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent vari ...
Journal Article · Journal of Machine Learning Research · August 1, 2014
This work considers a computationally and statistically efficient parameter estimation method for a wide class of latent variable models-including Gaussian mixture models, hidden Markov models, and latent Dirichlet allocation-which exploits a certain tenso ...
Journal Article · 2014 52nd Annual Allerton Conference on Communication, Control, and Computing, Allerton 2014 · January 30, 2014
In this paper, we study the problem of finding a minimal order (quasi-) Hidden Markov Model for a random process, which is the output process of an unknown stationary HMM of finite order. In the main theorem, we show that excluding a measure zero set in th ...
Journal Article · Journal of Machine Learning Research · January 1, 2014
Community detection is the task of detecting hidden communities from observed interactions. Guaranteed community detection has so far been mostly limited to models with non-overlapping communities such as the stochastic block model. In this paper, we remov ...
Journal Article · Journal of Machine Learning Research · January 1, 2014
In sparse recovery we are given a matrix A ∈ R^{n×m} ("the dictionary") and a vector of the form AX where X is sparse, and the goal is to recover X. This is a central notion in signal processing, statistics and machine learning. But in applications ...
Journal Article · 31st International Conference on Machine Learning, ICML 2014 · January 1, 2014
We give algorithms with provable guarantees that learn a class of deep nets in the generative model view popularized by Hinton and others. Our generative model is an n node multilayer network that has degree at most n^γ for some γ < 1 and each edge has ...
Journal Article · Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS · December 1, 2013
We give a new (1 + ε)-approximation for the SPARSEST CUT problem on graphs where small sets expand significantly more than the sparsest cut (expansion of sets of size n/r exceeds that of the sparsest cut by a factor of √(log n log r), for some small r; this condit ...
Journal Article · 30th International Conference on Machine Learning, ICML 2013 · January 1, 2013
Topic models provide a useful method for dimensionality reduction and exploratory data analysis in large text corpora. Most approaches to topic model learning have been based on a maximum likelihood objective. Efficient algorithms exist that attempt to app ...
Journal Article · Journal of Machine Learning Research · January 1, 2013
Detecting hidden communities from observed interactions is a classical problem. Theoretical analysis of community detection has so far been mostly limited to models with non-overlapping communities such as the stochastic block model. In this paper, we prov ...
Journal Article · Proceedings - Annual IEEE Symposium on Foundations of Computer Science, FOCS · December 1, 2012
Topic Modeling is an approach used for automatic comprehension and classification of data in a variety of settings, and perhaps the canonical application is in uncovering thematic structure in a corpus of documents. A number of foundational works both in m ...
Journal Article · Advances in Neural Information Processing Systems · December 1, 2012
We present a new algorithm for Independent Component Analysis (ICA) which has provable performance guarantees. In particular, suppose we are given samples of the form y = Ax +η where A is an unknown n × n matrix and x is a random variable whose components ...
Journal Article · Proceedings of the ACM Conference on Electronic Commerce · July 10, 2012
A community in a social network is usually understood to be a group of nodes more densely connected with each other than with the rest of the network. This is an important concept in most domains where networks arise: social, technological, biological, etc ...
Journal Article · Proceedings of the Annual ACM Symposium on Theory of Computing · June 26, 2012
The Nonnegative Matrix Factorization (NMF) problem has a rich history spanning quantum mechanics, probability theory, data analysis, polyhedral combinatorics, communication complexity, demography, chemometrics, etc. In the past decade NMF has become enormo ...
Journal Article · Algorithmica (New York) · December 1, 2011
We study the problem of solving simple stochastic games, and give both an interesting new algorithm and a hardness result. We show a reduction from fine approximation of simple stochastic games to coarse approximation of a polynomial sized game, which can ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · September 8, 2011
How to color 3-colorable graphs with few colors is a problem of longstanding interest. The best polynomial-time algorithm uses n^0.2072 colors. There are no indications that coloring using say O(log n) colors is hard. It has been suggested that SDP hierarc ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · July 11, 2011
We give new algorithms for a variety of randomly-generated instances of computational problems using a linearization technique that reduces to solving a system of linear equations. These algorithms are derived in the context of learning with structured noi ...
Journal Article · Communications of the ACM · May 1, 2011
Computational complexity studies intractable problems like NP-complete problems, which are conjectured to require more computational resources than can be provided by the fastest computers. The intractability of this problem leads to a concrete realization ...
Journal Article · Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) · December 1, 2009
We study the problem of solving simple stochastic games, and give both an interesting new algorithm and a hardness result. We show a reduction from fine approximation of simple stochastic games to coarse approximation of a polynomial sized game, which can ...