Although work can be traced back to earlier periods, natural language processing (NLP) is generally considered to have begun in the 1950s.

=== Symbolic NLP (1950s - early 1990s) ===

* '''1950s''': In 1950, Alan Turing published his paper "Computing Machinery and Intelligence", which proposed the '''[[图灵测试 Turing Test]]''' as a criterion of machine intelligence.
In 1954, Georgetown University successfully translated more than sixty Russian sentences into English automatically. The authors claimed that within three to five years machine translation would be a solved problem<ref>{{cite web|author=Hutchins, J.|year=2005|url=http://www.hutchinsweb.me.uk/Nutshell-2005.pdf|title=The history of machine translation in a nutshell}}{{self-published source|date=December 2013}}</ref>. Real progress, however, was much slower: the 1966 ALPAC report found that ten years of research had failed to meet expectations, and funding for machine translation fell dramatically thereafter. Little further research in machine translation was conducted until the late 1980s, when the first '''[[统计机器翻译 Statistical Machine Translation]]''' systems were developed.
* '''1960s''': SHRDLU and ELIZA were two notably successful natural language processing systems developed in the 1960s. SHRDLU was a natural language system working in a "blocks world" with a restricted vocabulary, while ELIZA, written by Joseph Weizenbaum between 1964 and 1966, was a simulation of a Rogerian psychotherapist. Although it used almost no information about human thought or emotion, ELIZA sometimes produced startlingly human-like interaction. When the "patient" exceeded its very limited knowledge base, ELIZA was likely to give a generic response; for example, it might answer "My head hurts" with "Why do you say your head hurts?" (a minimal sketch of this style of keyword matching appears below).
* '''1970s''': During the 1970s, programmers began to write '''[[概念本体论 Conceptual Ontology]]''' programs, which structured real-world information into computer-understandable data, e.g. MARGIE (Schank, 1975), SAM (Cullingford, 1978), PAM (Wilensky, 1978), TaleSpin (Meehan, 1976), QUALM (Lehnert, 1977), Politics (Carbonell, 1979), and Plot Units (Lehnert, 1981). During this time, a number of chatterbots were also written, including PARRY, Racter, and Jabberwacky.
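The following minimal sketch (in Python) illustrates the keyword-and-template style of matching on which rule-based programs such as ELIZA relied; the patterns are invented for this example and are not Weizenbaum's actual script.

<syntaxhighlight lang="python">
import re

# Illustrative keyword -> response-template rules in the spirit of ELIZA.
# These patterns are invented for this sketch, not Weizenbaum's original script.
RULES = [
    (re.compile(r"my (.+) hurts", re.IGNORECASE), "Why do you say your {0} hurts?"),
    (re.compile(r"i am (.+)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"i feel (.+)", re.IGNORECASE), "Why do you feel {0}?"),
]
FALLBACK = "Please tell me more."  # generic reply when no rule matches

def respond(utterance: str) -> str:
    """Return the first matching template's response, else a generic reply."""
    for pattern, template in RULES:
        match = pattern.search(utterance)
        if match:
            return template.format(*match.groups())
    return FALLBACK

if __name__ == "__main__":
    print(respond("My head hurts"))    # -> Why do you say your head hurts?
    print(respond("The sky is blue"))  # -> Please tell me more.
</syntaxhighlight>

The fallback reply plays the role of ELIZA's generic responses when the input falls outside its small knowledge base.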
=== Statistical NLP (1990s - 2010s) ===
Up to the 1980s, most natural language processing systems were based on complex sets of hand-written rules. Starting in the late 1980s, however, there was a revolution in natural language processing with the introduction of '''[[机器学习 Machine Learning]]''' algorithms for language processing. This was due to both the steady increase in computational power (see '''[[摩尔定律 Moore's Law]]''') and the gradual lessening of the dominance of '''[[乔姆斯基语言学理论 Chomskyan Theories of Linguistics]]''' (e.g. '''[[转换语法 Transformational Grammar]]'''), whose theoretical underpinnings discouraged the sort of '''[[语料库语言学 Corpus Linguistic]]''' that underlies the machine-learning approach to language processing.<ref>Chomskyan linguistics encourages the investigation of "[[corner case]]s" that stress the limits of its theoretical models (comparable to [[pathological (mathematics)|pathological]] phenomena in mathematics), typically created using [[thought experiment]]s, rather than the systematic investigation of typical phenomena that occur in real-world data, as is the case in [[corpus linguistics]]. The creation and use of such [[text corpus|corpora]] of real-world data is a fundamental part of machine-learning algorithms for natural language processing. In addition, theoretical underpinnings of Chomskyan linguistics such as the so-called "[[poverty of the stimulus]]" argument entail that general learning algorithms, as are typically used in machine learning, cannot be successful in language processing. As a result, the Chomskyan paradigm discouraged the application of such models to language processing.</ref> Some of the earliest-used machine learning algorithms, such as '''[[决策树Decision Tree]]'''s, produced systems of hard if-then rules similar to existing hand-written rules. However, '''[[词性标注 Part-of-speech Tagging]]''' introduced the use of '''[[隐马尔可夫模型 Hidden Markov Models]]''' to natural language processing, and increasingly, research has focused on statistical models, which make soft, probabilistic decisions based on attaching real-valued weights to the features making up the input data. The cache language models upon which many speech recognition systems now rely are examples of such statistical models. Such models are generally more robust when given unfamiliar input, especially input that contains errors (as is very common for real-world data), and produce more reliable results when integrated into a larger system comprising multiple subtasks.
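To make the statistical approach concrete, the sketch below (in Python) runs the Viterbi algorithm over a toy hidden Markov model for part-of-speech tagging; the two-tag tagset and all probabilities are invented for the example and are not taken from any real tagger.

<syntaxhighlight lang="python">
# Toy Viterbi decoding over a hand-specified HMM for POS tagging.
# All tags, words, and probabilities here are invented for illustration.
TAGS = ["NOUN", "VERB"]
START = {"NOUN": 0.6, "VERB": 0.4}                       # P(tag | start)
TRANS = {"NOUN": {"NOUN": 0.3, "VERB": 0.7},             # P(next tag | tag)
         "VERB": {"NOUN": 0.8, "VERB": 0.2}}
EMIT = {"NOUN": {"dogs": 0.5, "bark": 0.1, "fish": 0.4}, # P(word | tag)
        "VERB": {"dogs": 0.1, "bark": 0.6, "fish": 0.3}}

def viterbi(words):
    """Return the most probable tag sequence for `words` under the toy HMM."""
    # best[tag] = probability of the best path ending in `tag`
    best = {t: START[t] * EMIT[t][words[0]] for t in TAGS}
    back = []  # backpointers, one dict per position after the first
    for word in words[1:]:
        ptr, nxt = {}, {}
        for t in TAGS:
            prev = max(TAGS, key=lambda p: best[p] * TRANS[p][t])
            ptr[t] = prev
            nxt[t] = best[prev] * TRANS[prev][t] * EMIT[t][word]
        back.append(ptr)
        best = nxt
    # Trace the best path backwards from the highest-scoring final tag.
    tag = max(TAGS, key=lambda t: best[t])
    path = [tag]
    for ptr in reversed(back):
        tag = ptr[tag]
        path.append(tag)
    return list(reversed(path))

print(viterbi(["dogs", "bark"]))  # -> ['NOUN', 'VERB']
</syntaxhighlight>

Unlike a hard if-then rule, every tag sequence receives some probability here; the tagger simply returns the highest-scoring one, which is what makes the decision "soft".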
* '''1990s''': Many of the notable early successes occurred in the field of '''[[机器翻译 Machine Translation]]''', due especially to work at IBM Research, where successively more complicated statistical models were developed. These systems were able to take advantage of existing multilingual textual corpora that had been produced by the Parliament of Canada and the European Union as a result of laws calling for the translation of all governmental proceedings into all official languages of the corresponding systems of government. However, most other systems depended on corpora specifically developed for the tasks implemented by these systems, which was (and often continues to be) a major limitation in the success of these systems. As a result, a great deal of research has gone into methods of more effectively learning from limited amounts of data.
* '''2000s''': Recent research has increasingly focused on '''[[无监督学习 Unsupervised Learning]]''' and '''[[半监督学习 Semi-supervised Learning]]''' algorithms. Such algorithms can learn from data that has not been hand-annotated with the desired answers, or using a combination of annotated and non-annotated data. Generally, this task is much more difficult than '''[[监督学习 Supervised Learning]]''' and typically produces less accurate results for a given amount of input data. However, there is an enormous amount of non-annotated data available (including, among other things, the entire content of the World Wide Web), which can often make up for the inferior results if the algorithm used has a low enough '''[[时间复杂度 Time Complexity]]''' to be practical (a sketch of one simple semi-supervised scheme, self-training, appears after this list).
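The following sketch illustrates self-training, one simple semi-supervised scheme, on synthetic data: a classifier fit on a small labeled set repeatedly adopts its own most confident predictions on unlabeled examples and retrains. It assumes NumPy and scikit-learn are installed, and the 0.95 confidence threshold is an arbitrary choice for the example.

<syntaxhighlight lang="python">
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data: pretend only the first 20 examples have gold labels.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_lab, y_lab = X[:20], y[:20]
X_unlab = X[20:]

clf = LogisticRegression(max_iter=1000)
for _ in range(5):  # a few rounds of self-training
    clf.fit(X_lab, y_lab)
    if len(X_unlab) == 0:
        break
    proba = clf.predict_proba(X_unlab)
    confident = proba.max(axis=1) > 0.95  # arbitrary threshold for the sketch
    if not confident.any():
        break
    # Adopt the model's own predictions as labels for confident examples.
    X_lab = np.vstack([X_lab, X_unlab[confident]])
    y_lab = np.concatenate([y_lab, proba[confident].argmax(axis=1)])
    X_unlab = X_unlab[~confident]

print(f"final training-set size: {len(X_lab)}")
</syntaxhighlight>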
=== Neural NLP (2010s - present) ===
In the 2010s, '''[[表示学习 Representation Learning]]''' and '''[[深度神经网络Deep Neural Network]]'''-style machine learning methods became widespread in natural language processing, due in part to a flurry of results showing that such techniques<ref name=goldberg:nnlp17>{{cite journal |last=Goldberg |first=Yoav |year=2016 |arxiv=1807.10854 |title=A Primer on Neural Network Models for Natural Language Processing |journal=Journal of Artificial Intelligence Research |volume=57 |pages=345–420 |doi=10.1613/jair.4992 }}</ref><ref name=goodfellow:book16>{{cite book |first=Ian |last=Goodfellow |first2=Yoshua |last2=Bengio |first3=Aaron |last3=Courville |url=http://www.deeplearningbook.org/ |title=Deep Learning |location= |publisher=MIT Press |year=2016 |isbn= }}</ref> can achieve state-of-the-art results in many natural language tasks, for example in language modeling,<ref name=jozefowicz:lm16>{{cite book |first=Rafal |last=Jozefowicz |first2=Oriol |last2=Vinyals |first3=Mike |last3=Schuster |first4=Noam |last4=Shazeer |first5=Yonghui |last5=Wu |year=2016 |arxiv=1602.02410 |title=Exploring the Limits of Language Modeling |bibcode=2016arXiv160202410J }}</ref> parsing,<ref name=choe:emnlp16>{{cite journal |first=Do Kook |last=Choe |first2=Eugene |last2=Charniak |journal=Emnlp 2016 |url=https://aclanthology.coli.uni-saarland.de/papers/D16-1257/d16-1257 |title=Parsing as Language Modeling }}</ref><ref name="vinyals:nips15">{{cite journal |last=Vinyals |first=Oriol |last2=Kaiser |first2=Lukasz |displayauthors=1 |journal=Nips2015 |title=Grammar as a Foreign Language |year=2014 |arxiv=1412.7449 |bibcode=2014arXiv1412.7449V |url=https://papers.nips.cc/paper/5635-grammar-as-a-foreign-language.pdf }}</ref> and many others. Popular techniques include the use of '''[[词嵌入Word Embedding]]'''s to capture semantic properties of words, and an increase in end-to-end learning of higher-level tasks (e.g., question answering) instead of relying on a pipeline of separate intermediate tasks (e.g., part-of-speech tagging and dependency parsing). In some areas, this shift has entailed substantial changes in how NLP systems are designed, such that deep neural network-based approaches may be viewed as a new paradigm distinct from statistical natural language processing. For instance, the term neural machine translation (NMT) emphasizes the fact that deep learning-based approaches to machine translation directly learn sequence-to-sequence transformations, obviating the need for intermediate steps such as word alignment and language modeling that were used in statistical machine translation (SMT).
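As a small illustration of how word embeddings encode semantic properties as geometry, the sketch below compares toy word vectors by cosine similarity; the three-dimensional vectors are invented for the example, whereas real embeddings are learned from large corpora and typically have hundreds of dimensions.

<syntaxhighlight lang="python">
import numpy as np

# Toy 3-dimensional "embeddings" invented for illustration; real embeddings
# are learned from corpora and have far more dimensions.
emb = {
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.8, 0.9, 0.2]),
    "apple": np.array([-0.5, 0.1, 0.9]),
}

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity: near 1.0 for similar directions, near 0 or below for dissimilar ones."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(emb["king"], emb["queen"]))  # related words: high similarity (~0.99)
print(cosine(emb["king"], emb["apple"]))  # unrelated words: low similarity (negative here)
</syntaxhighlight>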
== Rule-based vs. statistical NLP{{anchor|Statistical natural language processing (SNLP)}} ==