BSGD

Batch Stochastic Gradient Descent

[Paper] | [Code]

Together with Prof. M. Vidyasagar, FRS, we introduce BSGD, an optimizer that is robust to noisy function evaluations and remains effective even when only approximate gradients are available.

We prove convergence using the theory of stochastic approximation. The word “Batch” here means that one is not forced to update every component of the variable at each step (as in DNN training), nor restricted to updating a single component (as in TD learning); instead, a random subset of coordinates can be updated at each iteration. By choosing the batch of coordinates, one can trade off the time and memory required per update.
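As an illustration only (not the implementation from the paper), the sketch below shows the idea in NumPy: at each iteration a random batch of coordinates is selected, a noisy gradient is evaluated, and only the chosen coordinates are updated with a decaying step size. The function names (`bsgd_sketch`, `grad_fn`), the additive noise model, and the step-size schedule are assumptions made for this example.

```python
import numpy as np

def bsgd_sketch(grad_fn, x0, step_size=0.01, batch_size=5, n_iters=1000,
                noise_std=0.1, rng=None):
    """Minimal sketch of batch coordinate updates with noisy gradients.

    grad_fn(x) returns a (possibly approximate) gradient of the objective;
    only a randomly chosen subset of coordinates is updated per iteration.
    """
    rng = np.random.default_rng(rng)
    x = np.asarray(x0, dtype=float).copy()
    for t in range(1, n_iters + 1):
        # Pick a random batch of coordinates to update at this iteration.
        coords = rng.choice(x.size, size=min(batch_size, x.size), replace=False)
        # Noisy gradient evaluation (noise simulated here as additive Gaussian).
        g = grad_fn(x) + noise_std * rng.standard_normal(x.size)
        # Decaying step size, as typically required by stochastic approximation.
        alpha = step_size / np.sqrt(t)
        # Update only the selected coordinates.
        x[coords] -= alpha * g[coords]
    return x

# Example usage: minimize the quadratic f(x) = ||x||^2 under gradient noise.
if __name__ == "__main__":
    x_final = bsgd_sketch(lambda x: 2 * x, x0=np.ones(20), batch_size=4)
    print(np.linalg.norm(x_final))
```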

Demonstration of BSGD in a noisy setting.