
Symmetry Discovery for Different Data Types

Equivariant neural networks incorporate symmetries into their architecture, achieving higher generalization performance. However, constructing equivariant neural networks typically requires prior knowledge of data types and symmetries, which is …
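
To make the equivariance property concrete, here is a minimal numerical check: a map f is rotation-equivariant if f(Rx) = Rf(x) for every rotation R. The map below (scaling a vector by its norm) is a toy illustration of the property, not a construction from the paper.

```python
# Numerical check of rotation equivariance: f(R x) == R f(x).
# The map f is a toy example; it commutes with rotations since ||R x|| = ||x||.
import numpy as np

def f(x):
    # Pointwise norm scaling, a simple rotation-equivariant map.
    return x * np.linalg.norm(x)

rng = np.random.default_rng(0)
x = rng.standard_normal(3)
# Random rotation from the QR decomposition of a Gaussian matrix.
Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))

lhs = f(Q @ x)   # transform the input, then apply f
rhs = Q @ f(x)   # apply f, then transform the output
print(np.allclose(lhs, rhs))  # True: f is rotation-equivariant
```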

High-Rank Irreducible Cartesian Tensor Decomposition and Bases of Equivariant Spaces

Irreducible Cartesian tensors (ICTs) play a crucial role in the design of equivariant graph neural networks, as well as in theoretical chemistry and chemical physics. Meanwhile, the design space of available linear operations on tensors that preserve …
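
As background, here is a minimal sketch of the classical rank-2 case: any 3x3 Cartesian tensor splits into irreducible parts under SO(3), namely an isotropic piece (l = 0), an antisymmetric piece (l = 1), and a symmetric traceless piece (l = 2). High-rank ICT decompositions generalize this; the code only illustrates the rank-2 prototype, not the paper's construction.

```python
# Rank-2 irreducible Cartesian tensor decomposition under SO(3).
import numpy as np

T = np.random.default_rng(0).standard_normal((3, 3))

iso = np.trace(T) / 3.0 * np.eye(3)          # l = 0 component (1 dof)
antisym = 0.5 * (T - T.T)                    # l = 1 component (3 dof)
sym_traceless = 0.5 * (T + T.T) - iso        # l = 2 component (5 dof)

# The three parts sum back to T and are mutually orthogonal
# under the Frobenius inner product.
print(np.allclose(iso + antisym + sym_traceless, T))
print(np.isclose(np.sum(iso * antisym), 0.0),
      np.isclose(np.sum(antisym * sym_traceless), 0.0))
```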

Restarted Nonconvex Accelerated Gradient Descent: No More Polylogarithmic Factor in the $O(\epsilon^{-7/4})$ Complexity

This paper studies accelerated gradient methods for nonconvex optimization with Lipschitz continuous gradient and Hessian. We propose two simple accelerated gradient methods, restarted accelerated gradient descent (AGD) and restarted heavy ball (HB) …
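
Here is a minimal sketch of a restarted accelerated gradient method: run plain Nesterov AGD and reset the momentum whenever progress stalls. The restart test below (a function-value increase) is the common heuristic; the paper's actual restart condition differs in its details.

```python
# Restarted accelerated gradient descent (sketch).
import numpy as np

def restarted_agd(grad_f, f, x0, lr=0.1, iters=200):
    x = y = x0.copy()
    t = 1.0  # momentum parameter
    for _ in range(iters):
        x_new = y - lr * grad_f(y)
        if f(x_new) > f(x):
            # Restart: discard momentum and take a plain gradient step.
            t = 1.0
            x_new = x - lr * grad_f(x)
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + (t - 1.0) / t_new * (x_new - x)  # extrapolation
        x, t = x_new, t_new
    return x

# Toy usage on the quadratic f(x) = 0.5 ||x||^2.
x_star = restarted_agd(lambda x: x, lambda x: 0.5 * x @ x, np.ones(5))
print(np.linalg.norm(x_star))  # close to 0
```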

Training Much Deeper Spiking Neural Networks with a Small Number of Time-Steps

Spiking Neural Network (SNN) is a promising energy-efficient neural architecture when implemented on neuromorphic hardware. The Artificial Neural Network (ANN) to SNN conversion method, which is the most effective SNN training method, has …
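
A minimal sketch of the idea behind ANN-to-SNN conversion: over T time-steps, the firing rate of an integrate-and-fire (IF) neuron with reset-by-subtraction approximates a clipped ReLU of its input current. This is illustration only; actual conversion pipelines add weight/threshold normalization and other corrections.

```python
# IF neuron firing rate approximates a bounded ReLU of the input current.
import numpy as np

def if_firing_rate(current, T=32, v_th=1.0):
    v = 0.0
    spikes = 0
    for _ in range(T):
        v += current            # integrate the (constant) input current
        if v >= v_th:
            v -= v_th           # reset by subtraction
            spikes += 1
    return spikes / T           # firing rate in [0, 1]

for a in [-0.3, 0.0, 0.25, 0.6, 1.5]:
    # Rate ~= clip(a, 0, 1): the SNN neuron emulates a clipped ReLU.
    print(a, if_firing_rate(a), np.clip(a, 0.0, 1.0))
```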

Variance Reduced EXTRA and DIGing and Their Optimal Acceleration for Strongly Convex Decentralized Optimization

We study stochastic decentralized optimization for the problem of training machine learning models with large-scale distributed data. We extend the widely used EXTRA and DIGing methods with variance reduction (VR), and propose two methods: VR-EXTRA …
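
For context, here is a minimal sketch of the DIGing gradient-tracking iteration that the paper builds on, in its basic deterministic form and without the variance-reduction component: each node mixes with its neighbors and tracks the average gradient. The toy problem uses f_i(x) = 0.5(x - b_i)^2, so the consensus optimum is mean(b).

```python
# DIGing (gradient tracking) on a toy decentralized least-squares problem.
import numpy as np

n = 5
b = np.arange(n, dtype=float)          # node-local data
grad = lambda x: x - b                 # stacked local gradients
# Symmetric doubly stochastic mixing matrix over a ring graph.
W = (np.eye(n) * 0.5 + np.roll(np.eye(n), 1, 1) * 0.25
     + np.roll(np.eye(n), -1, 1) * 0.25)

x = np.zeros(n)
y = grad(x)                            # gradient tracker, y_0 = grad f(x_0)
alpha = 0.1
for _ in range(300):
    x_new = W @ x - alpha * y
    y = W @ y + grad(x_new) - grad(x)  # track the average gradient
    x = x_new

print(x)   # all entries close to the consensus optimum mean(b) = 2.0
```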

Training Neural Networks by Lifted Proximal Operator Machines

We present the lifted proximal operator machine (LPOM) to train fully-connected feed-forward neural networks. LPOM represents the activation function as an equivalent proximal operator and adds the proximal operators to the objective function of a …
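
A minimal sketch of the reformulation at the heart of LPOM: an activation such as ReLU equals a proximal operator, e.g. ReLU(x) = argmin_{u >= 0} 0.5(u - x)^2, the prox of the indicator of the nonnegative orthant. LPOM lifts each layer's activations into the training objective via such prox terms; this snippet only verifies the identity, not the full algorithm.

```python
# Verify ReLU(x) = argmin_{u >= 0} 0.5 (u - x)^2 by grid search.
import numpy as np

u = np.linspace(-5.0, 5.0, 100001)     # candidate u values
feasible = u[u >= 0]                   # nonnegativity constraint

for x in [-2.0, -0.5, 0.0, 0.7, 3.0]:
    u_star = feasible[np.argmin(0.5 * (feasible - x) ** 2)]
    # The minimizer matches ReLU(x) = max(x, 0), up to grid resolution.
    print(x, u_star, max(x, 0.0))
```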

Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters

In this paper, we study the communication and (sub)gradient computation costs in distributed optimization and give a sharp complexity analysis for the proposed distributed accelerated gradient methods. We present two algorithms based on the framework …
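
A minimal sketch of the penalty idea suggested by the title: replace the consensus constraint x_1 = ... = x_n with a quadratic penalty (beta/2) x^T (I - W) x and let beta grow across stages. The toy below uses f_i(x) = 0.5(x - b_i)^2 and plain gradient steps within each stage; the paper's algorithms apply accelerated steps inside this kind of framework.

```python
# Consensus via an increasing quadratic penalty (sketch).
import numpy as np

n = 5
b = np.arange(n, dtype=float)
W = (np.eye(n) * 0.5 + np.roll(np.eye(n), 1, 1) * 0.25
     + np.roll(np.eye(n), -1, 1) * 0.25)
L = np.eye(n) - W                      # penalty matrix, zero iff consensus

x = np.zeros(n)
beta = 1.0
for stage in range(8):
    # Gradient descent on the penalized objective for this stage.
    step = 1.0 / (1.0 + beta * np.linalg.eigvalsh(L).max())
    for _ in range(500):
        x -= step * ((x - b) + beta * (L @ x))
    beta *= 4.0                        # increase the penalty parameter

print(x)   # entries approach the consensus optimum mean(b) = 2.0
```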

Accelerated First-Order Optimization Algorithms for Machine Learning

Numerical optimization serves as one of the pillars of machine learning. To meet the demands of big data applications, much effort has been devoted to designing theoretically and practically fast algorithms. This paper provides a comprehensive …
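
As a representative of the methods such a survey covers, here is a minimal sketch of FISTA, the accelerated proximal gradient method, applied to the lasso problem min_x 0.5 ||Ax - b||^2 + lam ||x||_1. The problem instance is a toy example chosen for illustration.

```python
# FISTA: accelerated proximal gradient for the lasso (sketch).
import numpy as np

def soft_threshold(v, tau):
    return np.sign(v) * np.maximum(np.abs(v) - tau, 0.0)

def fista(A, b, lam, iters=300):
    L = np.linalg.norm(A, 2) ** 2      # Lipschitz constant of the smooth part
    x = y = np.zeros(A.shape[1])
    t = 1.0
    for _ in range(iters):
        g = A.T @ (A @ y - b)          # gradient of the smooth part at y
        x_new = soft_threshold(y - g / L, lam / L)   # proximal step
        t_new = (1.0 + np.sqrt(1.0 + 4.0 * t * t)) / 2.0
        y = x_new + (t - 1.0) / t_new * (x_new - x)  # momentum extrapolation
        x, t = x_new, t_new
    return x

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 100))
x_true = np.zeros(100); x_true[:5] = 1.0
x_hat = fista(A, A @ x_true, lam=0.1)
print(np.linalg.norm(x_hat - x_true))  # small: sparse signal roughly recovered
```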

On the Complexity Analysis of the Primal Solutions for the Accelerated Randomized Dual Coordinate Ascent

We give a convergence rate analysis of the primal solutions obtained from the accelerated randomized dual coordinate ascent.
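
A minimal sketch of the setting this analysis concerns: randomized dual coordinate ascent for ridge regression, where the primal solution is recovered from the dual variables as w = A.T @ alpha / (lam * n). Plain (non-accelerated) SDCA-style updates are shown; the accelerated variant studied in the paper adds extrapolation on top.

```python
# Randomized dual coordinate ascent for ridge regression (sketch).
import numpy as np

rng = np.random.default_rng(0)
n, d, lam = 100, 10, 0.1
A = rng.standard_normal((n, d))        # rows a_i
b = rng.standard_normal(n)

alpha = np.zeros(n)                    # dual variables
w = np.zeros(d)                        # primal iterate, w = A.T @ alpha / (lam * n)
for _ in range(20 * n):
    i = rng.integers(n)
    # Closed-form coordinate maximization of the dual for squared loss.
    delta = (b[i] - alpha[i] - A[i] @ w) / (1.0 + A[i] @ A[i] / (lam * n))
    alpha[i] += delta
    w += delta * A[i] / (lam * n)      # keep the primal iterate in sync

# Compare with the exact ridge regression solution.
w_star = np.linalg.solve(A.T @ A / n + lam * np.eye(d), A.T @ b / n)
print(np.linalg.norm(w - w_star))      # small
```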

Revisiting EXTRA for Smooth Distributed Optimization

EXTRA is a popular method for decentralized distributed optimization and has broad applications. This paper revisits EXTRA. First, we give a sharp complexity analysis for EXTRA with the improved $O\left(\left(\frac{L}{\mu}+\frac{1}{1-\sigma_2(W)}\right)\log\frac{1}{\epsilon(1-\sigma_2(W))}\right)$ communication and …
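
For reference, here is a minimal sketch of the EXTRA iteration the paper analyzes, on a toy decentralized problem with f_i(x) = 0.5(x - b_i)^2 (consensus optimum mean(b)). With W_tilde = (I + W)/2, EXTRA iterates x^{k+2} = (I + W) x^{k+1} - W_tilde x^k - alpha (grad(x^{k+1}) - grad(x^k)).

```python
# EXTRA on a toy decentralized least-squares problem (sketch).
import numpy as np

n = 5
b = np.arange(n, dtype=float)
grad = lambda x: x - b                 # stacked local gradients
W = (np.eye(n) * 0.5 + np.roll(np.eye(n), 1, 1) * 0.25
     + np.roll(np.eye(n), -1, 1) * 0.25)
W_tilde = (np.eye(n) + W) / 2.0
alpha = 0.5

x_prev = np.zeros(n)
x = W @ x_prev - alpha * grad(x_prev)  # first EXTRA step
for _ in range(300):
    x_next = ((np.eye(n) + W) @ x - W_tilde @ x_prev
              - alpha * (grad(x) - grad(x_prev)))
    x_prev, x = x, x_next

print(x)   # all entries close to the consensus optimum mean(b) = 2.0
```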