Sentiment analysis

Revision as of 15:50, 9 August 2021 (Mon)

Minor edit, 54 bytes removed. Edit summary: 正极性 (positive polarity) --> 正面 (positive), etc.

for Vietnamese Social Media Text". In Proceedings of the 2019 International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Hanoi, Vietnam (2019).</ref>

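The most basic task in sentiment analysis is to identify whether the polarity expressed in a given opinionated text is positive, negative, or neutral. Depending on the granularity of the text being processed, sentiment analysis can be divided into three levels of study: document level, sentence level, and word level. More advanced, "beyond polarity" sentiment classification looks at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise.<ref name=":2" />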
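
The three levels of granularity can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon (the words and weights in the hypothetical `LEXICON` are illustrative only, not taken from any published resource) and a naive punctuation-based sentence splitter: word-level polarity is a lexicon lookup, sentence-level polarity sums word weights, and document-level polarity sums sentence scores.

```python
import re

# Hypothetical toy lexicon: word -> polarity weight (illustrative only).
LEXICON = {"good": 1, "great": 2, "love": 2, "bad": -1, "awful": -2, "boring": -1}


def label(score):
    """Map a numeric score to a polarity label."""
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def word_polarity(word):
    # Word level: look the word up; unknown words count as neutral.
    return label(LEXICON.get(word.lower(), 0))


def sentence_score(sentence):
    # Sentence level: sum the weights of the words in the sentence.
    return sum(LEXICON.get(w, 0) for w in re.findall(r"[a-z']+", sentence.lower()))


def document_polarity(text):
    # Document level: naive split on . ! ?, then aggregate sentence scores.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    scores = [sentence_score(s) for s in sentences]
    return label(sum(scores)), [label(s) for s in scores]


review = "The plot was boring. But the acting was great and I love the music!"
print(word_polarity("awful"))
print(document_polarity(review))
```

Note that the same review contains a negative sentence yet is positive at the document level, which is why the choice of granularity matters.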

Precursors to sentiment analysis include the General Inquirer,<ref name=":3">Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. "The general inquirer: A computer approach to content analysis." MIT Press, Cambridge, MA (1966).</ref> which provided hints toward quantifying patterns in text, and, separately, psychological research that examined a person's [[psychological state]] based on analysis of their verbal behavior.<ref name=":4">Gottschalk, Louis August, and Goldine C. Gleser. The measurement of psychological states through the content analysis of verbal behavior. Univ of California Press, 1969.</ref>

</ref> among others: Pang and Lee<ref name = "PangLee05" /> expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder<ref name = "SnyderBarzilay07" /> performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).

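Many later studies were less sophisticated and used only a positive/negative view of sentiment polarity; for example, Turney<ref name="Turney02" /> and Pang<ref name="PangAl02" /> applied different methods to detect the polarity of product reviews and movie reviews, respectively. This work was carried out at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang<ref name="PangLee05" /> and Snyder<ref name="SnyderBarzilay07" /> among others: Pang and Lee<ref name="PangLee05" /> expanded the basic task of classifying a movie review as either positive or negative to predicting star ratings on either a 3- or a 4-star scale, while Snyder<ref name="SnyderBarzilay07" /> performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of a given restaurant, such as the food and atmosphere (on a five-star scale).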
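
The difference between a binary label and a multi-way rating scale can be sketched, under strong simplifying assumptions, by cutting a continuous polarity score into equal-width bins. This is not the method used by Pang and Lee or by Snyder and Barzilay, who trained learned models; the score range [−1, 1] and the function name `to_stars` are hypothetical choices for illustration.

```python
def to_stars(score, n_stars):
    """Map a polarity score in [-1, 1] to a rating from 1 to n_stars.

    Hypothetical equal-width binning: [-1, 1] is split into n_stars
    intervals of equal length, and the interval index gives the rating.
    """
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if n_stars < 2:
        raise ValueError("need at least two rating levels")
    width = 2.0 / n_stars
    # Shift the score to [0, 2], find its bin, and clamp 1.0 into the top bin.
    return min(int((score + 1.0) / width), n_stars - 1) + 1


# The same score can land on different ratings depending on the scale.
for s in (-0.9, 0.0, 0.4, 1.0):
    print(s, to_stars(s, 3), to_stars(s, 4))
```

A neutral score of 0.0 maps to the middle of a 3-star scale but to the upper half of a 4-star scale, one reason why finer scales are harder to predict reliably.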

First steps toward bringing together various approaches (learning, lexical, knowledge-based, etc.) were taken at the 2004 [[AAAI]] Spring Symposium, where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for systematic computational research on affect, appeal, subjectivity, and sentiment in text.<ref name=":6">Qu, Yan, James Shanahan, and Janyce Wiebe. "Exploring attitude and affect in text: Theories and applications." In AAAI Spring Symposium Technical Report SS-04-07. AAAI Press, Menlo Park, CA. 2004.</ref>

</ref> can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are, in principle, two ways of operating with a neutral class. Either the algorithm first identifies neutral language, filters it out, and then assesses the rest in terms of positive and negative sentiment, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. [[Naive Bayes classifier|naive Bayes]] classifiers as implemented by [[Nltk|NLTK]]). Whether and how to use a neutral class depends on the nature of the data: if the data are clearly clustered into neutral, negative, and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiment. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.

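Although most statistical classification methods ignore the neutral class, on the assumption that neutral texts lie near the boundary of a binary classifier, some researchers suggest that three categories must be identified in every polarity problem. Moreover, research has shown that introducing a neutral class can improve the overall accuracy of specific classifiers such as Max Entropy<ref name="Vryniotis13" /> and support vector machines (SVMs).<ref name="KoppelSchler06" /> In principle, there are two ways of operating with a neutral class. Either the algorithm first identifies neutral language, filters it out, and then assesses the rest as a binary positive/negative classification, or it builds a three-way classification (neutral, positive, negative) in one step.<ref>{{Cite journal|last1=Ribeiro|first1=Filipe Nunes|last2=Araujo|first2=Matheus|date=2010|title=A Benchmark Comparison of State-of-the-Practice Sentiment Analysis Methods|url=https://www.researchgate.net/publication/286302059|journal=Transactions on Embedded Computing Systems |volume=9 |issue=4}}</ref> The second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by [[Nltk|NLTK]]). Whether and how to use a neutral class depends on the nature of the data: if the data are clearly clustered into neutral, positive, and negative language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiment. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.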
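
Both strategies can be sketched with a small multinomial naive Bayes classifier written from scratch; it does not use NLTK's API. The toy training sentences, whitespace tokenization, and add-one smoothing are illustrative assumptions. `one_step` returns a probability distribution over all three classes, while the hypothetical `two_step` function first filters neutral text and then separates positive from negative.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.priors = Counter(labels)
        self.total = len(labels)
        self.word_counts = defaultdict(Counter)
        for doc, y in zip(docs, labels):
            self.word_counts[y].update(doc.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, doc):
        words = doc.lower().split()
        log_scores = {}
        for y in self.priors:
            counts = self.word_counts[y]
            denom = sum(counts.values()) + len(self.vocab)
            log_p = math.log(self.priors[y] / self.total)
            for w in words:
                if w in self.vocab:  # ignore words never seen in training
                    log_p += math.log((counts[w] + 1) / denom)
            log_scores[y] = log_p
        # Subtract the max log score before exponentiating (log-sum-exp trick).
        m = max(log_scores.values())
        z = sum(math.exp(s - m) for s in log_scores.values())
        return {y: math.exp(s - m) / z for y, s in log_scores.items()}

    def predict(self, doc):
        probs = self.predict_proba(doc)
        return max(probs, key=probs.get)


# Hypothetical toy training data.
docs = ["great movie loved it", "wonderful acting great fun",
        "terrible plot hated it", "awful boring waste",
        "the movie is two hours long", "it was released in may"]
labels = ["pos", "pos", "neg", "neg", "neu", "neu"]

# One step: a single three-way classifier with a distribution over all classes.
one_step = NaiveBayes().fit(docs, labels)

# Two steps: neutral-vs-polar filter, then a positive-vs-negative classifier.
subj = NaiveBayes().fit(docs, ["neu" if y == "neu" else "polar" for y in labels])
polar = [(d, y) for d, y in zip(docs, labels) if y != "neu"]
polarity = NaiveBayes().fit([d for d, _ in polar], [y for _, y in polar])


def two_step(doc):
    if subj.predict(doc) == "neu":
        return "neu"
    return polarity.predict(doc)


print(one_step.predict_proba("great fun"))
print(one_step.predict("great fun"), two_step("great fun"))
```

With realistic data the two strategies can disagree; which one works better depends on how cleanly the neutral texts separate from the polar ones, as the paragraph above notes.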

A different method for determining sentiment is the use of a scaling system in which words commonly associated with negative, neutral, or positive sentiment are assigned a number on a −10 to +10 scale (from most negative to most positive), or simply on a scale from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually at the level of the sentence). When a piece of unstructured text is analyzed using [[natural language processing]], each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score.<ref name=":7">{{Cite journal|last1=Taboada|first1=Maite|last2=Brooke|first2=Julian|date=2011|title=Lexicon-based methods for sentiment analysis|url=http://dl.acm.org/citation.cfm?id=2000518|journal=Computational Linguistics |volume=37 |issue=2 |pages=272–274|doi=10.1162/coli_a_00049|citeseerx=10.1.1.188.5517|s2cid=3181362}}</ref><ref name=":8">{{Cite journal|last1=Augustyniak|first1=Łukasz|last2=Szymański|first2=Piotr|last3=Kajdanowicz|first3=Tomasz|last4=Tuligłowicz|first4=Włodzimierz|date=2015-12-25|title=Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis|journal=Entropy|language=en|volume=18|issue=1|pages=4|doi=10.3390/e18010004|bibcode=2015Entrp..18....4A|doi-access=free}}</ref><ref name=":9">{{Cite journal|last1=Mehmood|first1=Yasir|last2=Balakrishnan|first2=Vimala|date=2020-01-01|title=An enhanced lexicon-based approach for sentiment analysis: a case study on illegal immigration|url=https://doi.org/10.1108/OIR-10-2018-0295|journal=Online Information Review|volume=44|issue=5|pages=1097–1117|doi=10.1108/OIR-10-2018-0295|issn=1468-4527}}</ref> This allows a more sophisticated understanding of sentiment, because the sentiment value of a concept can now be adjusted relative to modifications that may surround it.
For example, words that intensify, relax, or negate the sentiment expressed by the concept can affect its score. Alternatively, if the goal is to determine the sentiment in a text rather than its overall polarity and strength, the text can be given separate positive and negative sentiment strength scores.<ref name ="SentiStrength2010">

</ref>

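Another method for identifying sentiment uses a scaling system in which words associated with negative, neutral, and positive sentiment are assigned values from −10 to +10, representing most negative to most positive, or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually at the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score.<ref name=":7" /><ref name=":8" /><ref name=":9" /> This allows a deeper understanding of sentiment, because the sentiment value of a concept can now be adjusted according to the modifications that may surround it; for example, words that intensify, relax, or negate the sentiment expressed by the concept can affect its score. Alternatively, if the goal is to determine the sentiment in a text rather than its overall polarity and strength, the text can be given separate positive and negative sentiment strength scores.<ref name="SentiStrength2010" />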
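
A minimal sketch of this scaling idea follows. It assumes a hypothetical lexicon on the −10 to +10 scale, a hypothetical table of intensifier and relaxer multipliers, and a simple rule that a negator flips the sign of the next sentiment word. The `dual_strength` function loosely imitates the idea of separate positive and negative strength scores; it is not SentiStrength's actual algorithm.

```python
import re

# Hypothetical word scores on a -10 (most negative) to +10 (most positive) scale.
LEXICON = {"excellent": 8, "good": 5, "okay": 1, "bad": -5, "terrible": -8}
# Hypothetical modifiers: intensifiers (>1) and relaxers (<1) scale the next score.
MODIFIERS = {"very": 1.5, "extremely": 2.0, "slightly": 0.5, "somewhat": 0.7}
NEGATORS = {"not", "never", "no"}


def scored_terms(sentence):
    """Return the adjusted score of each sentiment word in the sentence."""
    scores = []
    factor, negate = 1.0, False
    for tok in re.findall(r"[a-z']+", sentence.lower()):
        if tok in NEGATORS:
            negate = True
        elif tok in MODIFIERS:
            factor *= MODIFIERS[tok]
        elif tok in LEXICON:
            s = LEXICON[tok] * factor
            if negate:
                s = -s
            # Keep the adjusted value on the -10..+10 scale.
            scores.append(max(-10.0, min(10.0, s)))
            # Modifiers and negation apply only to the next sentiment word.
            factor, negate = 1.0, False
    return scores


def sentence_score(sentence):
    return sum(scored_terms(sentence))


def dual_strength(sentence):
    """Report the strongest positive and strongest negative term separately."""
    scores = scored_terms(sentence)
    pos = max([s for s in scores if s > 0], default=0.0)
    neg = min([s for s in scores if s < 0], default=0.0)
    return pos, neg


print(sentence_score("The food was very good"))
print(sentence_score("The service was not good"))
print(dual_strength("Excellent view but extremely bad service"))
```

The last example shows why separate strengths can be more informative than a single sum: the text is both strongly positive and strongly negative, which a net score would partly cancel out.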

There are various other types of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.

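There are also various other types of sentiment analysis, such as feature/aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion recognition.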

=== Subjectivity/objectivity identification ===

The term subjective describes an incident that contains non-factual information in various forms, such as personal opinions, judgments, and predictions. Such content is also known as 'private states', as described by Quirk et al.<ref name=":10">{{Cite book|last1=Quirk|first1=Randolph|title=A Comprehensive Grammar of the English Language (General Grammar)|last2=Greenbaum|first2=Sidney|last3=Leech|first3=Geoffrey|last4=Svartvik|first4=Jan|publisher=[[Longman]]|year=1985|isbn=1933108312|pages=175–239}}</ref> In the example below, the sentence reflects the private state of 'We Americans'. Moreover, as stated in Liu (2010), the target entity commented on by an opinion can take several forms, from tangible products to intangible topics.<ref name="Liu2010" /> Furthermore, Liu (2010) observed three types of attitudes: 1) positive opinions, 2) neutral opinions, and 3) negative opinions.<ref name="Liu2010" />

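The term subjective describes an incident that contains non-factual information in various forms, such as personal opinions, judgments, and predictions; Quirk et al. also call this "private states".<ref name=":10" /> In the example below, the sentence reflects the private state of "We Americans". Moreover, the target entity being commented on can take many forms, from tangible objects to intangible topics (Liu, 2010).<ref name="Liu2010" /> In addition, Liu (2010) observed three types of attitudes: 1) positive opinions, 2) neutral opinions, and 3) negative opinions.<ref name="Liu2010" />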

* Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.'
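
As a rough illustration only, subjectivity can be flagged with cue words that signal opinions, judgments, or predictions. The cue list and the `threshold` parameter below are hypothetical assumptions; real subjectivity classifiers are usually trained on annotated data and are far less crude than this sketch.

```python
import re

# Hypothetical cue words signalling opinion, judgment, or prediction.
SUBJECTIVE_CUES = {"need", "should", "must", "wise", "mature", "best", "worst",
                   "think", "believe", "will", "probably", "good", "bad"}


def subjectivity(sentence, threshold=1):
    """Label a sentence 'subjective' if it contains at least `threshold` cues."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    hits = [t for t in tokens if t in SUBJECTIVE_CUES]
    label = "subjective" if len(hits) >= threshold else "objective"
    return label, hits


print(subjectivity("We Americans need to elect a president who is mature "
                   "and who is able to make wise decisions."))
print(subjectivity("The election was held in November."))
```

On the example sentence, the cues "need", "mature", and "wise" mark the sentence as subjective, matching the judgment it expresses.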

Kuangmy (54 edits)