Sheng Su 2015-10-19
Last week:
- I have found out why training with two GPUs works well while training with four GPUs does not. This is based on two facts:
- - 1. Mini-batch training: the gradient is summed over all frames in the batch.
- - 2. Mini-batch size: the baseline does not converge if the mini-batch size is set above 1024.
- Reason:
- - Let the mini-batch size be M. After N mini-batches we sum all the gradients from the 4 GPUs and update the net once (during those N mini-batches the net is not updated). This is equivalent to the baseline with a mini-batch size of M*N*4, much larger than the baseline's. However, if we update the net during the N mini-batches, the effective mini-batch size is, to some extent, reduced. That is why two GPUs work well and four GPUs do not (see the sketch below).
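The following is a minimal sketch of the two update schemes, not the actual training code: a toy least-squares model in NumPy, with hypothetical function names and sizes, showing why accumulating summed gradients over N mini-batches from 4 GPUs before a single update behaves like one baseline update with an effective mini-batch of M*N*4 frames, whereas updating during the N mini-batches keeps each step close to size M.

```python
import numpy as np

def summed_gradient(w, x, y):
    """Gradient summed (not averaged) over all frames in the mini-batch
    (fact 1).  Toy least-squares model, used only for illustration."""
    return x.T @ (x @ w - y)

def delayed_update(w, batches, lr):
    """4-GPU scheme: accumulate gradients over all given mini-batches,
    then apply a single update.  With 4 GPUs and N mini-batches of M frames
    each, this one step sees M * N * 4 frames, like a baseline step with
    that (much larger) mini-batch size."""
    g = np.zeros_like(w)
    for x, y in batches:
        g += summed_gradient(w, x, y)   # the net is NOT updated here
    return w - lr * g

def interleaved_update(w, batches, lr):
    """Updating during the N mini-batches: each step only sees one
    mini-batch, so the effective mini-batch size stays close to M."""
    for x, y in batches:
        w = w - lr * summed_gradient(w, x, y)
    return w

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = np.zeros(5)
    # 4 GPUs * N=2 mini-batches of M=256 frames -> effective size 2048
    batches = [(rng.normal(size=(256, 5)), rng.normal(size=256))
               for _ in range(8)]
    w_delayed = delayed_update(w, batches, lr=1e-4)
    w_interleaved = interleaved_update(w, batches, lr=1e-4)
```

Both functions consume the same frames; the difference is only how often the net is updated, which is what changes the effective mini-batch size.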
This week:
- 1. Try net averaging (a minimal sketch is given after this list).
- 2. Learn NG-SGD (natural gradient SGD).
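For plan item 1, net averaging is taken here to mean averaging the per-GPU copies of the net parameters and broadcasting the result back; a minimal sketch under that assumption (function and variable names are hypothetical, parameters are treated as flat vectors):

```python
import numpy as np

def average_nets(worker_params):
    """Average the parameter vectors held by the individual GPUs and
    give every GPU the same averaged copy."""
    avg = np.mean(np.stack(worker_params, axis=0), axis=0)
    return [avg.copy() for _ in worker_params]

# Example: four GPUs, each with its own (diverged) parameter copy.
params = [np.random.randn(10) for _ in range(4)]
params = average_nets(params)
```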