Cong Fang

fangcong AT pku.edu.cn


Princeton University


I am currently a postdoc at Princeton University working with Jason Lee. Before that, I received my Ph.D. from Peking University, advised by Zhouchen Lin and Tong Zhang. My research interests lie in the theoretical analysis of machine learning and optimization.


Interests
  • Machine Learning
  • Optimization


Education
  • Ph.D. in Computer Engineering, 2014-2019

    Peking University

Publications @ZERO Lab

Training Neural Networks by Lifted Proximal Operator Machines. TPAMI, 2020.

We present the lifted proximal operator machine (LPOM) to train fully-connected feed-forward neural networks. LPOM represents the …

Decentralized Accelerated Gradient Methods With Increasing Penalty Parameters. IEEE Transactions on Signal Processing, 2020.

In this paper, we study the communication and (sub)gradient computation costs in distributed optimization and give a sharp complexity …

Accelerated First-Order Optimization Algorithms for Machine Learning. Proceedings of the IEEE, 2020.

Numerical optimization serves as one of the pillars of machine learning. To meet the demands of big data applications, lots of efforts …

Lifted Proximal Operator Machines. AAAI, 2019.

By rewriting the activation function as an equivalent proximal operator, we approximate a feed-forward neural network by adding the …

Sharp Analysis for Nonconvex SGD Escaping from Saddle Points. COLT, 2019.

In this paper, we prove that the simplest Stochastic Gradient Descent (SGD) algorithm is able to efficiently escape from saddle points …

SPIDER: Near-Optimal Non-Convex Optimization via Stochastic Path-Integrated Differential Estimator. NIPS, 2018.

We propose a new technique named Stochastic Path-Integrated Differential EstimatoR (Spider), which can be used to track many …

Faster and Non-ergodic O(1/K) Stochastic Alternating Direction Method of Multipliers. NIPS, 2017.

We propose a new stochastic ADMM that elaborately integrates Nesterov's extrapolation and variance reduction (VR) techniques.

Feature Learning via Partial Differential Equation with Applications to Face Recognition. PR, 2017.

We propose a novel Partial Differential Equation (PDE) based method for feature learning. The feature learned by our PDE is …

Parallel Asynchronous Stochastic Variance Reduction for Nonconvex Optimization. AAAI, 2017.

We propose the Asynchronous Stochastic Variance Reduced Gradient (ASVRG) algorithm for nonconvex finite-sum problems.

A Robust Hybrid Method for Text Detection in Natural Scenes by Learning-based Partial Differential Equations. Neurocomputing, 2015.

We present a robust hybrid method that uses learning-based PDEs for detecting texts from natural scene images.