Date: 2017/5/31

Jiyuan Zhang

Aodong LI

Last Week:
- coded a double-attention model with final_attn = alpha * attn_ch + beta * attn_en (see the sketch after this entry)
- baseline BLEU = 43.87
- experiments with randomly initialized embeddings:
{|
! alpha !! beta !! result (BLEU)
|-
| 1 || 1 || 43.50
|-
| 4/3 || 2/3 || 43.58 (w/o retraining)
|-
| 2/3 || 4/3 || 42.22 (w/o retraining)
|-
| 2/3 || 4/3 || 42.36 (w/ retraining)
|}
- experiments with constant-initialized embeddings:
{|
! alpha !! beta !! result (BLEU)
|-
| 1 || 1 || 45.41
|-
| 4/3 || 2/3 || 45.79
|-
| 2/3 || 4/3 || 45.32
|}
- 1.4~1.9 BLEU improvement over the baseline (43.87)
- this model is similar to multi-source neural translation but requires fewer resources

This Week:
- Test the model on big data
- Explore different attention-merge strategies
- Explore hierarchical models
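The double-attention merge above is a weighted combination of two per-source attentions. Below is a minimal PyTorch sketch of one plausible reading; the report does not say whether the weighted sum is taken over the attention weights or over the resulting context vectors, so this sketch merges the context vectors. All tensor and function names are hypothetical; alpha and beta correspond to the weights swept in the tables.

```python
import torch
import torch.nn.functional as F

def merged_attention(dec_state, enc_ch, enc_en, score_ch, score_en,
                     alpha=1.0, beta=1.0):
    """final_attn = alpha * attn_ch + beta * attn_en (hypothetical sketch).

    dec_state: (batch, hidden)          current decoder state
    enc_ch:    (batch, len_ch, hidden)  encoder states, Chinese source
    enc_en:    (batch, len_en, hidden)  encoder states, English source
    score_ch, score_en: callables (dec_state, enc) -> (batch, len) scores
    """
    # Per-source attention distributions.
    attn_ch = F.softmax(score_ch(dec_state, enc_ch), dim=-1)
    attn_en = F.softmax(score_en(dec_state, enc_en), dim=-1)

    # Per-source context vectors.
    ctx_ch = torch.bmm(attn_ch.unsqueeze(1), enc_ch).squeeze(1)
    ctx_en = torch.bmm(attn_en.unsqueeze(1), enc_en).squeeze(1)

    # Weighted merge; alpha = beta = 1 recovers a plain sum.
    return alpha * ctx_ch + beta * ctx_en
```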
Shiyue Zhang

Last Week:
- found a dropout bug, fixed it, and reran the baselines: baseline 35.21, baseline (outproj=emb) 35.24 (a tying sketch appears after the table below)
- tried several embed-set models; none worked
- embedded additional words into the model's embedding space (trained on the training data, not big data), then used them directly in baseline (outproj=emb):
{|
! Vocab size !! 30000 !! 50000 !! 70000 !! 90000
|-
| BLEU || 35.24 || 34.52 || 33.73 || 33.16
|-
| || 4564 (6666) || 4535 || 4469 || 4426
|}
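In the outproj=emb baseline, the decoder's output projection apparently shares its weights with the target embedding matrix, which would be why externally embedded words can be dropped straight into the softmax without retraining that layer. A minimal PyTorch sketch under that assumption; the class and argument names are hypothetical:

```python
import torch.nn as nn

class TiedOutput(nn.Module):
    """Output layer whose projection reuses the embedding weights (outproj=emb)."""

    def __init__(self, vocab_size, emb_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, emb_dim)
        self.out = nn.Linear(emb_dim, vocab_size, bias=False)
        # Tie the matrices: any vector written into self.embedding.weight
        # (e.g. an externally trained word vector) is immediately scorable.
        self.out.weight = self.embedding.weight

    def forward(self, hidden):
        # hidden: (batch, emb_dim) decoder state; returns (batch, vocab_size) logits
        return self.out(hidden)
```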
This Week:
- Get word2vec embeddings from big data and compare them with word2vec trained on the training data (see the sketch below)
- Test the m-nmt model; increase the vocab size and test again
- Review zh-uy/uy-zh related work and start writing the paper
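One way to run that word2vec comparison is to train one model per corpus with gensim and measure nearest-neighbor overlap between the two spaces. A hedged sketch (gensim >= 4.0 argument names); the file names and the overlap metric are assumptions, not the group's actual evaluation:

```python
from gensim.models import Word2Vec
from gensim.models.word2vec import LineSentence

# One model per corpus (paths are placeholders).
big = Word2Vec(LineSentence("big_corpus.txt"), vector_size=256, min_count=5, workers=4)
small = Word2Vec(LineSentence("train_corpus.txt"), vector_size=256, min_count=5, workers=4)

def neighbor_overlap(word, k=10):
    """Fraction of word's k nearest neighbors shared by both embedding spaces."""
    nb = {w for w, _ in big.wv.most_similar(word, topn=k)}
    ns = {w for w, _ in small.wv.most_similar(word, topn=k)}
    return len(nb & ns) / k

# Average overlap over frequent words present in both vocabularies.
shared = [w for w in small.wv.index_to_key[:1000] if w in big.wv]
print(sum(neighbor_overlap(w) for w in shared) / len(shared))
```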

Shipan Ren