Sheng Su 2015-10-12
From cslt Wiki
Four-GPU training:
- Tried changing the learning rate, the mini-batch size, and the gap; training still diverges.
- Tried asynchronous updates; training still diverges.
- Will keep looking for the cause of the divergence and try other methods.
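
A toy sketch of where these knobs enter a multi-worker run and how a blow-up can be detected early. It is not the actual training setup: the quadratic objective, the function name `train`, the noise level, and the threshold `1e6` are all hypothetical, and "gap" is assumed here to mean the number of local steps between parameter averaging across workers.

```python
import random

def train(num_workers=4, lr=0.1, gap=5, steps=200, seed=0):
    """Toy local SGD on f(w) = 0.5 * w**2 with noisy gradients.

    Each worker updates its own copy of w; every `gap` steps the copies
    are averaged (assumed meaning of "gap"). Returns the first step at
    which any weight exceeds the blow-up threshold, or None if none does.
    """
    rng = random.Random(seed)
    weights = [1.0] * num_workers
    for step in range(1, steps + 1):
        for i in range(num_workers):
            # Gradient of 0.5 * w**2 is w; add noise to mimic mini-batch sampling.
            grad = weights[i] + rng.gauss(0.0, 0.1)
            weights[i] -= lr * grad
        if step % gap == 0:
            # Synchronize: replace every copy with the mean of all copies.
            mean = sum(weights) / num_workers
            weights = [mean] * num_workers
        if any(abs(w) > 1e6 for w in weights):
            return step  # report where the run blew up
    return None

if __name__ == "__main__":
    print("lr=0.1:", train(lr=0.1))
    print("lr=2.5:", train(lr=2.5))
```

On this toy objective a small learning rate stays bounded while a learning rate above 2 blows up regardless of the gap, so the sketch only shows how to log the first diverging step; it says nothing about the cause in the real four-GPU run.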