Changes

Natural language processing (自然语言处理)

Revision as of 15:42, 26 September 2020 (Saturday)

66 bytes removed, 15:42, 26 September 2020 (Saturday)

Line 37: Line 37:

==Rule-based NLP vs. statistical NLP (SNLP)==

−

In the early days, many language-processing systems were designed by hand-coding a set of rules<ref name=winograd:shrdlu71>{{cite thesis |last=Winograd |first=Terry |year=1971 |title=Procedures as a Representation for Data in a Computer Program for Understanding Natural Language |url=http://hci.stanford.edu/winograd/shrdlu/ }}</ref><ref name=schank77>{{cite book |first=Roger C. |last=Schank |first2=Robert P. |last2=Abelson |year=1977 |title=Scripts, Plans, Goals, and Understanding: An Inquiry Into Human Knowledge Structures |location=Hillsdale |publisher=Erlbaum |isbn=0-470-99033-3 }}</ref>, for example by writing grammars or devising '''[[启发式 Heuristic|heuristic]]''' rules for stemming.

+

In the early days, many language-processing systems were designed by hand-coding a set of rules<ref name=winograd:shrdlu71>{{cite thesis |last=Winograd |first=Terry |year=1971 |title=Procedures as a Representation for Data in a Computer Program for Understanding Natural Language |url=http://hci.stanford.edu/winograd/shrdlu/ }}</ref><ref name=schank77>{{cite book |first=Roger C. |last=Schank |first2=Robert P. |last2=Abelson |year=1977 |title=Scripts, Plans, Goals, and Understanding: An Inquiry Into Human Knowledge Structures |location=Hillsdale |publisher=Erlbaum |isbn=0-470-99033-3 }}</ref>, for example by writing grammars or devising '''heuristic''' rules for stemming.
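
To make the idea of hand-coded stemming heuristics concrete, here is a minimal Python sketch. The suffix list, the minimum stem length, and the names `SUFFIX_RULES` and `stem` are assumptions chosen for illustration; they are not taken from the cited systems or from any published stemmer.

```python
# Hand-written suffix rules, tried in order. The list is illustrative only
# and is NOT a real, published stemming algorithm.
SUFFIX_RULES = [
    ("sses", "ss"),
    ("ies", "i"),
    ("ing", ""),
    ("ed", ""),
    ("s", ""),
]

MIN_STEM_LENGTH = 3  # assumed threshold so that short words are left alone


def stem(word: str) -> str:
    """Apply the first matching suffix rule; return the word unchanged otherwise."""
    for suffix, replacement in SUFFIX_RULES:
        stem_length = len(word) - len(suffix)
        if word.endswith(suffix) and stem_length >= MIN_STEM_LENGTH:
            return word[:stem_length] + replacement
    return word
```

Every behaviour of such a system has to be anticipated by the rule author: `stem("rules")` gives `"rule"`, while `stem("sing")` is left as `"sing"` only because of the hand-picked length threshold.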

Since the "statistical revolution" of the late 1980s and mid-1990s<ref name=johnson:eacl:ilcl09>[http://www.aclweb.org/anthology/W09-0103 Mark Johnson. How the statistical revolution changes (computational) linguistics.] Proceedings of the EACL 2009 Workshop on the Interaction between Linguistics and Computational Linguistics.</ref><ref name=resnik:langlog11>[http://languagelog.ldc.upenn.edu/nll/?p=2946 Philip Resnik. Four revolutions.] Language Log, February 5, 2011.</ref>, much natural language processing research has relied heavily on machine learning. The machine-learning paradigm calls for learning such rules automatically, by applying statistical inference to large corpora (''corpora'' is the plural of ''corpus'', a set of documents, possibly with human or computer annotations).
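
As a rough illustration of learning rules from an annotated corpus rather than writing them by hand, the Python sketch below derives a word-to-tag table by counting. The toy corpus, the tag names, and the function `learn_tag_rules` are invented for this example; a most-frequent-tag table is only one very simple form of statistical inference.

```python
from collections import Counter, defaultdict

# Toy annotated corpus, invented for illustration: (word, part-of-speech tag) pairs.
CORPUS = [
    ("the", "DET"), ("book", "NOUN"), ("is", "VERB"), ("here", "ADV"),
    ("book", "VERB"), ("a", "DET"), ("flight", "NOUN"),
    ("the", "DET"), ("book", "NOUN"),
]


def learn_tag_rules(corpus):
    """Map each word to the tag it carries most often in the corpus."""
    counts = defaultdict(Counter)
    for word, tag in corpus:
        counts[word][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


RULES = learn_tag_rules(CORPUS)
```

Here the "rule" for `book` is not written by a person: it comes out as `NOUN` because that tag occurs twice in the toy corpus and `VERB` only once.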

−

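Many different classes of machine-learning algorithms have been applied to natural language processing tasks. These algorithms take as input a large set of "features" derived from the input data. Some of the earliest algorithms used, such as '''[[决策树Decision Tree|decision trees]]''', produce systems of hard if-then decisions similar to the hand-written rules that came before them. Later, however, research focused increasingly on statistical models, which attach a real-valued weight to each feature of the input and so make '''[[软判决 Soft Decision|soft decisions]]''' and '''[[概率决策 Probabilistic Decision|probabilistic decisions]]'''. The advantage of such models is that they can express the relative certainty of many different possible answers rather than only one, which gives more reliable results when the model is one component of a larger system.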

+

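Many different classes of machine-learning algorithms have been applied to natural language processing tasks. These algorithms take as input a large set of "features" derived from the input data. Some of the earliest algorithms used, such as '''decision trees''', produce systems of hard if-then decisions similar to the hand-written rules that came before them. Later, however, research focused increasingly on statistical models, which attach a real-valued weight to each feature of the input and so make '''soft decisions''' and '''probabilistic decisions'''. The advantage of such models is that they can express the relative certainty of many different possible answers rather than only one, which gives more reliable results when the model is one component of a larger system.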
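
The contrast between a hard if-then decision and a soft, probabilistic one can be sketched in Python as follows. The feature names, the weights in `WEIGHTS`, and both functions are hypothetical values picked for illustration; the softmax normalisation is one common way to turn weighted scores into probabilities, not the only one.

```python
import math


def hard_decision(features: dict) -> str:
    """If-then rule: returns exactly one label and no measure of certainty."""
    if features.get("ends_with_ing"):
        return "VERB"
    return "NOUN"


# Hypothetical real-valued weights, one per (label, feature) pair.
WEIGHTS = {
    "VERB": {"ends_with_ing": 1.5, "after_determiner": -1.0},
    "NOUN": {"ends_with_ing": 0.5, "after_determiner": 1.0},
}


def soft_decision(features: dict) -> dict:
    """Weighted sum per label, normalised with a softmax into probabilities."""
    scores = {
        label: sum(weights.get(name, 0.0) * value for name, value in features.items())
        for label, weights in WEIGHTS.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(score - top) for label, score in scores.items()}
    total = sum(exps.values())
    return {label: value / total for label, value in exps.items()}


# Example input: a word ending in "-ing" that follows a determiner ("the building").
EXAMPLE = {"ends_with_ing": 1.0, "after_determiner": 1.0}
```

For `EXAMPLE`, `hard_decision` commits to `"VERB"`, whereas `soft_decision` returns a distribution over both labels that favours `NOUN` under these assumed weights. A downstream component can use that distribution instead of a single, possibly wrong, answer.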

Systems based on machine-learning algorithms have many advantages over hand-written rules:

打豆豆 (421 edits)
