## CNN training

(1) model

(2) loss function

(Of course, as researchers we should still work through the fundamentals ourselves ^.<)

## RNN/LSTM training

Github: https://github.com/Element-Research/rnn

Element-Research's answer is that they created a new set of modes:

training()

In training mode, the network remembers all previous rho (number of time-steps) states. This is necessary for BPTT.

evaluate()

During evaluation, since there is no need to perform BPTT at a later time, only the previous step is remembered. This is very memory-efficient, so evaluation can be performed on sequences of potentially infinite length.
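To make the memory difference between the two modes concrete, here is a minimal toy sketch in Python. It is not the rnn library's code or API: `ToyRecurrent`, its update rule `h = 0.5 * prev + x`, and the `states` buffer are all hypothetical names made up for illustration. It only assumes what the descriptions above say, namely that training keeps up to rho past states and evaluation keeps only the previous one.

```python
from collections import deque


class ToyRecurrent:
    """Hypothetical stand-in for a recurrent module (not the rnn library)."""

    def __init__(self, rho):
        self.rho = rho                    # max number of time-steps kept for BPTT
        self.states = deque(maxlen=rho)   # training mode is the starting mode here

    def training(self):
        # Keep up to rho previous states so BPTT could walk back through them.
        self.states = deque(self.states, maxlen=self.rho)

    def evaluate(self):
        # No BPTT later, so only the most recent state needs to be kept.
        self.states = deque(self.states, maxlen=1)

    def forward(self, x):
        prev = self.states[-1] if self.states else 0.0
        h = 0.5 * prev + x                # made-up recurrence, for illustration only
        self.states.append(h)
        return h


module = ToyRecurrent(rho=3)
for x in [1.0, 1.0, 1.0, 1.0, 1.0]:
    module.forward(x)
train_buffer_len = len(module.states)     # bounded by rho

module.evaluate()
for x in [1.0] * 1000:                    # arbitrarily long sequence, constant memory
    module.forward(x)
eval_buffer_len = len(module.states)
```

In this toy, the buffer never grows past rho in training mode and never past one entry in evaluation mode, which is the property that lets evaluation run over very long sequences.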

remember([mode])

When mode='neither' (the default behavior of the class), the Sequencer will additionally call forget before each call to forward. When mode='both' (the default when calling this function), the Sequencer will never call forget. In which case, it is up to the user to call forget between independent sequences. This behavior is only applicable to decorated AbstractRecurrent modules. Accepted values for argument mode are as follows:

- 'eval' only affects evaluation (recommended for RNNs)
- 'train' only affects training
- 'neither' affects neither training nor evaluation (default behavior of the class)
- 'both' affects both training and evaluation (recommended for LSTMs)
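The four modes can be read as a small decision rule: before each forward, forget is called unless the remember mode covers the current phase. The Python sketch below encodes that reading. It is my own interpretation of the description above, not the library's implementation, and `ToySequencer`, `_remembers`, and the scalar update rule are hypothetical.

```python
class ToySequencer:
    """Hypothetical model of the remember() semantics (not the rnn library)."""

    MODES = ("eval", "train", "neither", "both")

    def __init__(self):
        self.mode = "neither"   # default behavior of the class
        self.train = True
        self.h = 0.0            # toy hidden state

    def remember(self, mode="both"):   # 'both' is the default when called
        if mode not in self.MODES:
            raise ValueError("unknown mode: %r" % (mode,))
        self.mode = mode

    def training(self):
        self.train = True

    def evaluate(self):
        self.train = False

    def forget(self):
        self.h = 0.0

    def _remembers(self):
        # True when state should carry over between forward calls.
        if self.mode == "both":
            return True
        if self.mode == "train":
            return self.train
        if self.mode == "eval":
            return not self.train
        return False            # 'neither'

    def forward(self, sequence):
        if not self._remembers():
            self.forget()       # start each call from a clean state
        for x in sequence:
            self.h = 0.5 * self.h + x   # made-up recurrence
        return self.h


seq = ToySequencer()
first = seq.forward([1.0, 1.0])    # 1.5
second = seq.forward([1.0, 1.0])   # forgets first, so 1.5 again

seq.remember("both")
third = seq.forward([1.0, 1.0])    # continues from 1.5, giving 1.875
```

With 'both', consecutive calls behave like one long sequence, so the caller has to invoke forget between independent sequences.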

forget(offset)

This method brings back all states to the start of the sequence buffers, i.e. it forgets the current sequence. It also resets the step attribute to 1. It is highly recommended to call forget after each parameter update. Otherwise, the previous state will be used to activate the next, which will often lead to instability. This is caused by the previous state being the result of now changed parameters. It is also good practice to call forget at the start of each new sequence.

Note: make sure you call brnn:forget() after each call to updateParameters().
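The ordering recommended above (forward, update parameters, then forget) can be sketched as a toy training loop in Python. Everything here is hypothetical: `ToyRNN`, `update_parameters`, and `train` are illustrative names, the gradient is a fixed placeholder rather than a real BPTT gradient, and the `offset` argument of forget is left out.

```python
class ToyRNN:
    """Hypothetical scalar RNN used only to show call ordering."""

    def __init__(self, w=0.5):
        self.w = w        # single recurrent weight
        self.h = 0.0      # hidden state
        self.step = 1     # position in the current sequence

    def forward(self, x):
        self.h = self.w * self.h + x
        self.step += 1
        return self.h

    def update_parameters(self, lr, grad):
        self.w -= lr * grad

    def forget(self):
        # Drop the current sequence: reset the state and the step counter.
        self.h = 0.0
        self.step = 1


def train(rnn, sequences, lr):
    for sequence in sequences:
        for x in sequence:
            rnn.forward(x)
        placeholder_grad = 1.0   # stand-in value, NOT a real gradient
        rnn.update_parameters(lr, placeholder_grad)
        # The stored state was computed with the old weight, so discard it
        # before the next sequence is fed through the updated parameters.
        rnn.forget()


rnn = ToyRNN()
train(rnn, [[1.0, 2.0], [3.0]], lr=0.1)
```

Without the forget call, the next sequence would start from a state produced by the pre-update weight, which is the source of instability that the note warns about.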

(That said, the documentation does not seem to spell out the difference between remember() and training(),