By rewriting the activation function as an equivalent proximal operator, we approximate a feed-forward neural network by adding the proximal operators to the objective function as penalties; hence we call the resulting model the lifted proximal operator machine (LPOM).
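As a generic illustration of the first step (the specific penalty used in the paper may differ), a common activation such as ReLU is itself a proximal operator, namely that of the indicator of the nonnegative orthant:
\[
\operatorname{prox}_g(x) = \arg\min_z \Big\{ g(z) + \tfrac{1}{2}\|z - x\|_2^2 \Big\},
\qquad
g = \iota_{\{z \ge 0\}}
\;\Longrightarrow\;
\operatorname{prox}_g(x) = \max(x, 0) = \operatorname{ReLU}(x).
\]
Adding such proximal terms for every layer turns the forward pass into (approximate) minimization of a single penalized objective.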
We propose D-LADMM, a K-layer LADMM-inspired deep neural network, and rigorously prove that there exists a set of learnable parameters for D-LADMM to generate globally convergent solutions.
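A minimal sketch of this unrolling idea, assuming PyTorch; the operator `W`, the per-layer step sizes, and the soft-threshold nonlinearity below are illustrative placeholders rather than the exact D-LADMM layer update:

    # Sketch: unroll K linearized-ADMM-style iterations into a K-layer network
    # whose per-layer operators and step sizes are learnable (illustrative only).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class UnrolledLADMM(nn.Module):
        def __init__(self, dim, K):
            super().__init__()
            self.K = K
            self.W = nn.ParameterList(
                [nn.Parameter(0.01 * torch.randn(dim, dim)) for _ in range(K)])
            self.step = nn.ParameterList(
                [nn.Parameter(torch.tensor(1.0)) for _ in range(K)])

        def forward(self, b):
            x = torch.zeros_like(b)    # primal variable
            lam = torch.zeros_like(b)  # dual (multiplier) variable
            for k in range(self.K):
                r = x @ self.W[k].T - b                       # constraint residual
                # linearized primal step followed by a soft-threshold (l1 proximal map)
                x = F.softshrink(x - self.step[k] * ((r + lam) @ self.W[k]), lambd=0.1)
                # dual ascent on the multiplier
                lam = lam + self.step[k] * (x @ self.W[k].T - b)
            return x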
In this paper, we prove that the simplest Stochastic Gradient Descent (SGD) algorithm is able to efficiently escape from saddle points and find an (eps, O(eps^0.5))-approximate second-order stationary point in Õ(eps^-3.5) stochastic gradient …
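For reference, a standard definition consistent with the (eps, O(eps^0.5)) notation above: a point x is an approximate second-order stationary point when
\[
\|\nabla f(x)\| \le \epsilon
\quad\text{and}\quad
\lambda_{\min}\!\big(\nabla^2 f(x)\big) \ge -O(\sqrt{\epsilon}),
\]
i.e., the gradient is small and the Hessian has no strongly negative eigenvalue, which rules out strict saddle points.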
We address the latency problem of RNNs by quantizing the network, both weights and activations, into multiple binary codes {−1, +1}. We formulate the quantization as an optimization problem.
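A minimal greedy sketch of the multi-bit {−1, +1} coding, assuming NumPy; the paper poses the quantization as an optimization problem, whereas the residual-based scheme below is only an illustrative baseline:

    # Approximate a real-valued vector w by a sum of scaled binary codes:
    #     w  ~=  sum_i alpha_i * b_i,   with  b_i in {-1, +1}^n.
    import numpy as np

    def greedy_multibit_quantize(w, num_bits=2):
        """Return scales alpha (num_bits,) and codes B (num_bits, n)."""
        residual = np.asarray(w, dtype=np.float64).copy()
        alphas, codes = [], []
        for _ in range(num_bits):
            b = np.where(residual >= 0, 1.0, -1.0)  # binary code in {-1, +1}
            alpha = np.abs(residual).mean()         # least-squares scale for this code
            alphas.append(alpha)
            codes.append(b)
            residual -= alpha * b                   # quantize the remaining residual
        return np.array(alphas), np.stack(codes)

    w = np.random.randn(8)
    alphas, B = greedy_multibit_quantize(w, num_bits=2)
    w_hat = alphas @ B  # reconstructed low-precision approximation of w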
We propose an algorithm based on the augmented Lagrange multiplier method to solve this nonconvex and nonsmooth problem, with the convergence guarantee that every accumulation point is a KKT point.
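For background, the generic augmented Lagrangian for an equality-constrained problem \(\min_x f(x)\ \text{s.t.}\ h(x)=0\) is
\[
L_\rho(x, \lambda) = f(x) + \langle \lambda, h(x) \rangle + \tfrac{\rho}{2}\,\|h(x)\|_2^2,
\qquad
\lambda \leftarrow \lambda + \rho\, h(x),
\]
where \(\lambda\) is the multiplier and \(\rho > 0\) the penalty parameter; the paper's specific nonconvex, nonsmooth splitting may differ from this template.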
Optimization over low-rank matrices has broad applications in machine learning. For large-scale problems, an attractive heuristic is to factorize the low-rank matrix into a product of two much smaller matrices. In this paper, we study the nonconvex …
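In generic form, the factorization heuristic replaces a rank constraint by an explicit product of two thin factors:
\[
\min_{\operatorname{rank}(X) \le r} f(X)
\quad\longrightarrow\quad
\min_{U \in \mathbb{R}^{m \times r},\; V \in \mathbb{R}^{n \times r}} f(UV^{\top}),
\]
which trades the rank constraint for nonconvexity in \((U, V)\) while reducing the number of variables from \(mn\) to \((m+n)r\).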
We propose the Fast Proximal Augmented Lagrangian Method (Fast PALM), which achieves the convergence rate O(1/K^2), compared with the O(1/K) rate of the traditional PALM.
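As an illustrative template (the paper's exact problem class may be broader), proximal augmented Lagrangian methods of this type target linearly constrained composite problems
\[
\min_{x,\,y}\; f(x) + g(y) \quad \text{s.t.}\quad Ax + By = b,
\]
where rates such as O(1/K) or O(1/K^2) typically bound how fast the objective gap and the constraint violation vanish with the iteration count K.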
We propose a new majorization-minimization (MM) method for non-smooth and non-convex programs, which is general enough to include the existing MM methods.
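For reference, the standard MM template: at iterate \(x^k\) one minimizes a surrogate that majorizes the objective and is tight at \(x^k\),
\[
x^{k+1} \in \arg\min_x\; g(x \mid x^k),
\qquad
g(x \mid x^k) \ge f(x)\ \ \forall x,
\qquad
g(x^k \mid x^k) = f(x^k),
\]
which immediately gives monotone descent: \(f(x^{k+1}) \le g(x^{k+1} \mid x^k) \le g(x^k \mid x^k) = f(x^k)\).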