10e6b No.1202
so i was playing around to see if anyone else has tried equilibration for their training runs. turns out it's a neat trick! basically instead of using one fixed step size, you scale each parameter's update by an estimate of the local curvature: flat, slowly-changing directions get bigger effective steps while steep, fast-changing ones get damped down. i've noticed that in non-convex optimization landscapes (think those flat stretches and weird saddle shapes), a single global learning rate can really struggle. but with equilibrated adaptive methods like ESGD (same family of diagonal preconditioners as RMSProp and Adam), it's much smoother sailing! have any of you tried this out? i'd love some thoughts on whether the improvement is worth the added complexity or if there are better tricks in your toolbelt for speeding up training…
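for anyone curious, here's a rough numpy sketch of the idea (my own toy version, not the paper's code): estimate the equilibration preconditioner sqrt(E[(Hv)^2]) with random probe vectors v and Hessian-vector products, then divide the gradient by it. the ill-conditioned quadratic, the 20-probe warm-up, and all the constants are just things i picked for the demo.

```python
import numpy as np

# toy ill-conditioned quadratic: f(x) = 0.5 * sum(scales * x**2)
scales = np.array([100.0, 1.0])

def grad(x):
    return scales * x

def hvp(x, v, eps=1e-4):
    # Hessian-vector product via a finite difference of the gradient
    # (exact here, since the gradient of a quadratic is linear)
    return (grad(x + eps * v) - grad(x - eps * v)) / (2 * eps)

rng = np.random.default_rng(0)
x = np.array([1.0, 1.0])
D = np.zeros_like(x)   # running sum of (Hv)**2
n = 0                  # number of probes accumulated
lr, damp = 0.5, 1e-8

# warm up the curvature estimate before taking any steps,
# so a single unlucky probe can't blow up the first updates
for _ in range(20):
    v = rng.standard_normal(x.shape)
    D += hvp(x, v) ** 2
    n += 1

for _ in range(200):
    v = rng.standard_normal(x.shape)  # fresh random probe each step
    D += hvp(x, v) ** 2
    n += 1
    precond = np.sqrt(D / n) + damp   # equilibration preconditioner
    x -= lr * grad(x) / precond       # per-parameter rescaled step

loss = 0.5 * np.sum(scales * x ** 2)
print(loss)
```

the nice part: the same lr works for both coordinates even though their curvatures differ by 100x, because the preconditioner rescales each one to roughly unit curvature. plain SGD with lr=0.5 would diverge on the steep coordinate here.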
Source:
https://dev.to/paperium/equilibrated-adaptive-learning-rates-for-non-convex-optimization-14dm