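Changes

Natural language processing (view source)

Revision as of 15:39, 26 September 2020 (Sat)

40 bytes removed, 15:39, 26 September 2020 (Sat)

→Statistical natural language processing (1990s-2010s)

Line 23: Line 23:

=== Statistical natural language processing (1990s-2010s) ===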

−

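Until the 1980s, most natural language processing systems still relied on complex, hand-written rules. Starting in the late 1980s, however, the introduction of '''[[机器学习 Machine Learning|machine learning]]''' algorithms for language processing set off a revolution in the field. This was due to both the steady growth in computing power (see '''[[摩尔定律|Moore's law]]''') and the waning dominance of '''[[乔姆斯基语言学理论|Chomskyan theories of linguistics]]''' (such as '''[[转换语法|transformational grammar]]'''). Chomskyan linguistics did not endorse corpus linguistics, yet '''[[语料库语言学|corpus linguistics]]''' is the foundation of machine-learning approaches to language processing.<ref>Chomskyan linguistics encourages the investigation of "[[corner case]]s" that stress the limits of its theoretical models (comparable to [[pathological (mathematics)|pathological]] phenomena in mathematics), typically created using [[thought experiment]]s, rather than the systematic investigation of typical phenomena that occur in real-world data, as is the case in [[corpus linguistics]]. The creation and use of such [[text corpus|corpora]] of real-world data is a fundamental part of machine-learning algorithms for natural language processing. In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called "[[poverty of the stimulus]]" argument entail that general learning algorithms, as are typically used in machine learning, cannot be successful in language processing. As a result, the Chomskyan paradigm discouraged the application of such models to language processing.</ref> Some of the earliest machine learning algorithms used, such as '''[[决策树Decision Tree|decision trees]]''', produced systems of hard "if-then" decisions similar to the existing hand-written rules. However, '''[[词性标注 Part-of-speech Tagging|part-of-speech tagging]]''' brought '''[[隐马尔可夫模型 |hidden Markov models]]''' into natural language processing, and research increasingly focused on statistical models. Statistical models attach real-valued weights to the individual features of the input data and so make '''soft decisions''' and '''probabilistic decisions'''. The cache language models on which many speech recognition systems now rely are examples of such statistical models. Such models generally give more reliable results when given unexpected input, especially input that contains errors (which is very common in real-world data), and when several subtasks are integrated into a larger system.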

+

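Until the 1980s, most natural language processing systems still relied on complex, hand-written rules. Starting in the late 1980s, however, the introduction of '''[[机器学习 Machine Learning|machine learning]]''' algorithms for language processing set off a revolution in the field. This was due to both the steady growth in computing power (see '''[[摩尔定律|Moore's law]]''') and the waning dominance of '''[[乔姆斯基语言学理论|Chomskyan theories of linguistics]]''' (such as '''[[转换语法|transformational grammar]]'''). Chomskyan linguistics did not endorse corpus linguistics, yet '''[[语料库语言学|corpus linguistics]]''' is the foundation of machine-learning approaches to language processing.<ref>Chomskyan linguistics encourages the investigation of "[[corner case]]s" that stress the limits of its theoretical models (comparable to [[pathological (mathematics)|pathological]] phenomena in mathematics), typically created using [[thought experiment]]s, rather than the systematic investigation of typical phenomena that occur in real-world data, as is the case in [[corpus linguistics]]. The creation and use of such [[text corpus|corpora]] of real-world data is a fundamental part of machine-learning algorithms for natural language processing. In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called "[[poverty of the stimulus]]" argument entail that general learning algorithms, as are typically used in machine learning, cannot be successful in language processing. As a result, the Chomskyan paradigm discouraged the application of such models to language processing.</ref> Some of the earliest machine learning algorithms used, such as '''[[决策树|decision trees]]''', produced systems of hard "if-then" decisions similar to the existing hand-written rules. However, '''part-of-speech tagging''' brought '''[[隐马尔可夫模型 |hidden Markov models]]''' into natural language processing, and research increasingly focused on statistical models. Statistical models attach real-valued weights to the individual features of the input data and so make '''soft decisions''' and '''probabilistic decisions'''. The cache language models on which many speech recognition systems now rely are examples of such statistical models. Such models generally give more reliable results when given unexpected input, especially input that contains errors (which is very common in real-world data), and when several subtasks are integrated into a larger system.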

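--[[用户:Thingamabob|Thingamabob]] ([[用户讨论:Thingamabob|talk]]) The sentence "the need for part-of-speech tagging led to hidden Markov models being introduced into natural language processing" is a free translation.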
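
The contrast between a hard if-then rule and a soft, probabilistic decision built from real-valued feature weights can be sketched as follows. This is only an illustration: the feature names, the weights, and the bias are hypothetical values chosen for the example, and a logistic function is assumed as one common way to turn a weighted score into a probability. The revision itself does not name a specific model.

```python
import math

# Hypothetical real-valued weights for the question "is this token a noun?".
# In a real statistical model these would be learned from a corpus.
WEIGHTS = {
    "follows_determiner": 2.0,
    "ends_in_ing": -1.5,
    "capitalized": 0.5,
}
BIAS = -0.5  # hypothetical baseline score when no feature is active


def hard_rule(features):
    """Hard if-then decision: one rule fires or it does not."""
    if features.get("follows_determiner"):
        return True
    return False


def soft_decision(features):
    """Soft decision: sum the weights of the active features and
    squash the score into a probability with a logistic function."""
    score = BIAS + sum(
        WEIGHTS[name]
        for name, active in features.items()
        if active and name in WEIGHTS  # unknown features are ignored
    )
    return 1.0 / (1.0 + math.exp(-score))


if __name__ == "__main__":
    token = {"follows_determiner": True, "ends_in_ing": True}
    print(hard_rule(token))      # the rule fires regardless of other evidence
    print(soft_decision(token))  # conflicting evidence yields an uncertain probability
```

With conflicting evidence the hard rule still answers yes outright, while the weighted model returns an intermediate probability that a larger system can combine with the outputs of other subtasks.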

打豆豆

421 edits
