Difference between revisions of "Chapter 12: The Basic Machine Learning Workflow"
From cslt Wiki
(→Advanced readers)
Line 48:
  * Wolpert, David (1996), "The Lack of A Priori Distinctions between Learning Algorithms", Neural Computation, pp. 1341–1390 [*][https://web.archive.org/web/20161220125415/http://www.zabaras.com/Courses/BayesianComputing/Papers/lack_of_a_priori_distinctions_wolpert.pdf]
  * Sebastian Ruder, An overview of gradient descent optimization algorithms, 2017 [https://arxiv.org/pdf/1609.04747.pdf]
− * Kirkpatrick, S.; Gelatt Jr, C. D.; Vecchi, M. P. (1983). "Optimization by Simulated Annealing". Science. 220 (4598): 671–680. [https://
+ * Kirkpatrick, S.; Gelatt Jr, C. D.; Vecchi, M. P. (1983). "Optimization by Simulated Annealing". Science. 220 (4598): 671–680. [https://www.science.org/doi/10.1126/science.220.4598.671]
  * Brown et al., Language Models are Few-Shot Learners [https://arxiv.org/pdf/2005.14165.pdf]
Latest revision as of 09:48, 8 August 2023 (Tuesday)
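Teaching materials
Further reading
- Wikipedia: No free lunch theorem [5]
- Wikipedia: Gradient descent [6][7]
- Baidu Baike: Gradient descent [8][9]
- Zhihu: Gradient descent [10]
- Zhihu: Mini-batch gradient descent [11]
- Zhihu: Gradient descent with momentum [12]
- Wikipedia: Simulated annealing [13][14]
- Baidu Baike: Simulated annealing [15][16]
- Zhihu: Simulated annealing explained in detail [17]
- Wikipedia: Newton's method [18][19]
- Wikipedia: Occam's razor [20][21]
- Baidu Baike: Occam's razor [22][23]
- Wikipedia: Overfitting [24][25]
- Wikipedia: GPT-3 [26][27]
- Jiqizhixin (Synced): What should we talk about when we talk about fairness in machine learning? [28]
- Jiqizhixin (Synced): Data augmentation [29]
- Zhihu: Data augmentation [30][31]
- What is model pre-training? [32]
- Transfer learning [33]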
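Video demonstrations
Demo links
Developer resources
Advanced readers
- Wang Dong, Introduction to Machine Learning (机器学习导论), Chapter 1 "Introduction" and Chapter 11 "Optimization Methods" [38]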
- Wolpert, David (1996), "The Lack of A Priori Distinctions between Learning Algorithms", Neural Computation, pp. 1341–1390 [*][39]
- Sebastian Ruder, An overview of gradient descent optimization algorithms, 2017 [40]
- Kirkpatrick, S.; Gelatt Jr, C. D.; Vecchi, M. P. (1983). "Optimization by Simulated Annealing". Science. 220 (4598): 671–680. [41]
- Brown et al., Language Models are Few-Shot Learners [42]