Decrease p1 and increase p2 in order to reduce the value of f: since df/dp1 is positive and df/dp2 is negative, each parameter moves in the direction opposite its partial derivative.
It's also useful to know gradient descent with momentum: it takes past gradients into account to smooth out the updates. Gradient descent with momentum computes an exponentially weighted average of the gradients, and then uses that average to update the weights instead of the raw gradient. It typically converges faster than standard gradient descent.