Changes

Sentiment analysis (view source)

Revision as of 20:10, 6 August 2021 (Fri)

310 bytes added, 20:10, 6 August 2021 (Fri)

No edit summary

Line 139: Line 139:

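Although most statistical classification methods ignore the neutral class, on the assumption that neutral texts lie near the boundary of a binary classifier, some researchers suggest that three categories must be identified in every polarity problem. Moreover, it has been shown that introducing a neutral class can improve the overall accuracy of specific classifiers such as Max Entropy<ref name="Vryniotis13" /> and support vector machines (SVMs).<ref name="KoppelSchler06" /> In principle there are two ways to handle the neutral class. In the first, the algorithm identifies neutral language and filters it out, then assesses the remaining content in terms of positive and negative sentiment. In the second, a three-way classification into neutral, positive, and negative is built in one step.<ref>{{Cite journal|last1=Ribeiro|first1=Filipe Nunes|last2=Araujo|first2=Matheus|date=2010|title=A Benchmark Comparison of State-of-the-Practice Sentiment Analysis Methods|url=https://www.researchgate.net/publication/286302059|journal=Transactions on Embedded Computing Systems |volume=9 |issue=4}}</ref> The second approach often involves estimating a probability distribution over all categories (for example, the naive Bayes classifiers implemented by [[Nltk|NLTK]]). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, positive, and negative language, it makes sense to filter out the neutral language and focus on the polarity between positive and negative sentiment. If, by contrast, the data is mostly neutral with only small deviations towards positive and negative affect, this strategy makes it harder to distinguish clearly between the two poles.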
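
The two ways of handling the neutral class described above can be contrasted in a minimal Python sketch. It is illustrative only: the word lists, the fixed neutral score of 0.5, and the function names `two_stage` and `one_step` are assumptions made for this example and do not come from the cited studies.

```python
# Toy word lists; hypothetical values chosen only for illustration.
POSITIVE = {"good", "great", "love"}
NEGATIVE = {"bad", "awful", "hate"}


def counts(text):
    """Count positive and negative lexicon hits in a whitespace-tokenised text."""
    tokens = text.lower().split()
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return pos, neg


def two_stage(text):
    """Approach 1: filter out neutral text first, then decide binary polarity."""
    pos, neg = counts(text)
    if pos == 0 and neg == 0:  # stage 1: neutral detector
        return "neutral"
    return "positive" if pos >= neg else "negative"  # stage 2: binary decision


def one_step(text):
    """Approach 2: score all three classes at once and pick the most likely."""
    pos, neg = counts(text)
    # The constant 0.5 for the neutral class is an arbitrary assumption.
    scores = {"neutral": 0.5, "positive": float(pos), "negative": float(neg)}
    total = sum(scores.values())
    # Normalise the scores into a probability-like distribution over the classes.
    probs = {label: s / total for label, s in scores.items()}
    return max(probs, key=probs.get), probs
```

In the one-step variant the neutral class competes directly with the two polar classes, which is why that approach needs a score or probability for every class, while the two-stage variant never compares neutral against positive or negative.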

−

A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment with them are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using [[natural language processing]], each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score.<ref>{{Cite journal|last1=Taboada|first1=Maite|last2=Brooke|first2=Julian|date=2011|title=Lexicon-based methods for sentiment analysis|url=http://dl.acm.org/citation.cfm?id=2000518|journal=Computational Linguistics |volume=37 |issue=2 |pages=272–274|doi=10.1162/coli_a_00049|citeseerx=10.1.1.188.5517|s2cid=3181362}}</ref><ref>{{Cite journal|last1=Augustyniak|first1=Łukasz|last2=Szymański|first2=Piotr|last3=Kajdanowicz|first3=Tomasz|last4=Tuligłowicz|first4=Włodzimierz|date=2015-12-25|title=Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis|journal=Entropy|language=en|volume=18|issue=1|pages=4|doi=10.3390/e18010004|bibcode=2015Entrp..18....4A|doi-access=free}}</ref><ref>{{Cite journal|last1=Mehmood|first1=Yasir|last2=Balakrishnan|first2=Vimala|date=2020-01-01|title=An enhanced lexicon-based approach for sentiment analysis: a case study on illegal immigration|url=https://doi.org/10.1108/OIR-10-2018-0295|journal=Online Information Review|volume=44|issue=5|pages=1097–1117|doi=10.1108/OIR-10-2018-0295|issn=1468-4527}}</ref> This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. 
Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text.<ref name ="SentiStrength2010">

+

A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment with them are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using [[natural language processing]], each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score.<ref name=":7">{{Cite journal|last1=Taboada|first1=Maite|last2=Brooke|first2=Julian|date=2011|title=Lexicon-based methods for sentiment analysis|url=http://dl.acm.org/citation.cfm?id=2000518|journal=Computational Linguistics |volume=37 |issue=2 |pages=272–274|doi=10.1162/coli_a_00049|citeseerx=10.1.1.188.5517|s2cid=3181362}}</ref><ref name=":8">{{Cite journal|last1=Augustyniak|first1=Łukasz|last2=Szymański|first2=Piotr|last3=Kajdanowicz|first3=Tomasz|last4=Tuligłowicz|first4=Włodzimierz|date=2015-12-25|title=Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis|journal=Entropy|language=en|volume=18|issue=1|pages=4|doi=10.3390/e18010004|bibcode=2015Entrp..18....4A|doi-access=free}}</ref><ref name=":9">{{Cite journal|last1=Mehmood|first1=Yasir|last2=Balakrishnan|first2=Vimala|date=2020-01-01|title=An enhanced lexicon-based approach for sentiment analysis: a case study on illegal immigration|url=https://doi.org/10.1108/OIR-10-2018-0295|journal=Online Information Review|volume=44|issue=5|pages=1097–1117|doi=10.1108/OIR-10-2018-0295|issn=1468-4527}}</ref>This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. 
Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text.<ref name ="SentiStrength2010">

{{cite journal

| first1 = Mike

Line 163: Line 163:

</ref>

−

+

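Another way to determine sentiment is to use a scaling system in which words associated with negative, neutral, or positive sentiment are assigned a value from −10 to +10 (most negative to most positive), or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment polarity of a given term according to its environment (usually at the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is scored according to how sentiment words relate to the concept and their associated scores.<ref name=":7" /><ref name=":8" /><ref name=":9" /> This allows a deeper understanding of sentiment, because the sentiment value of a concept can now be adjusted according to the modifiers that surround it: words that intensify, soften, or negate the sentiment expressed by the concept affect its score. Alternatively, if the goal is to determine the sentiment in a text rather than its overall polarity and strength, the text can be given a positive and a negative sentiment strength score.<ref name="SentiStrength2010" />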
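
The lexicon-based scoring described above, with modifiers adjusting a word's value, might be sketched as follows. This is a rough illustration under stated assumptions: the lexicon entries, the intensifier weights, the two-token look-back window, and the name `score_sentence` are all hypothetical and are not taken from the cited papers.

```python
# Hypothetical lexicon on a -10..+10 scale; values are illustrative only.
LEXICON = {"good": 5, "excellent": 9, "bad": -5, "terrible": -9}
# Hypothetical modifier weights: values above 1 intensify, below 1 soften.
INTENSIFIERS = {"very": 1.5, "slightly": 0.5}
NEGATORS = {"not", "never"}


def clamp(x, lo=-10.0, hi=10.0):
    """Keep a score inside the -10..+10 range."""
    return max(lo, min(hi, x))


def score_sentence(sentence):
    """Sum the modifier-adjusted scores of all lexicon words in one sentence."""
    tokens = sentence.lower().split()
    total = 0.0
    for i, tok in enumerate(tokens):
        if tok not in LEXICON:
            continue
        value = float(LEXICON[tok])
        # Look back over up to two preceding tokens for modifiers
        # (the window size is an assumption made for this sketch).
        for prev in tokens[max(0, i - 2):i]:
            if prev in INTENSIFIERS:
                value *= INTENSIFIERS[prev]
            elif prev in NEGATORS:
                value = -value
        total += clamp(value)
    return clamp(total)
```

With these toy values, "very good" scores higher than "good", and "not good" flips the sign. Real lexicon-based systems treat negation and intensification with considerably more care than a sign flip and a fixed multiplier.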

−

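Another way to determine emotion is to use a proportional system in which words usually associated with negative, neutral, or positive emotion are given an associated number in the range −10 to +10 (most negative to most positive) or from 0 to a positive upper limit (such as +4). This makes it possible to adjust the emotion of a given term according to its environment (usually at the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the given environment is scored based on the relationship between sentiment words and the concept and its associated score. This allows a deeper understanding of emotion, because the emotional value of a concept can now be adjusted relative to the changes that may occur around it. For example, words that intensify, relax, or negate the emotion expressed by the concept affect its score. Alternatively, if the goal is to determine the emotion in a text rather than its overall polarity and strength, the text can be given positive and negative emotion strength scores.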

There are various other types of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.

−

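There are also various other kinds of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.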

+

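There are also various other types of sentiment analysis, such as feature/attribute-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion recognition.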

−

=== Subjectivity/objectivity identification ===

+

=== Subjectivity/objectivity identification 主观性/客观性识别 ===

This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective.<ref name="PangLee08Subjectivity">{{cite book

| first1 = Bo

Line 217: Line 216:

</ref> showed that removing objective sentences from a document before classifying its polarity helped improve performance.

−

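This task is usually defined as sorting a given text (usually a sentence) into one of two classes: objective or subjective. This problem is sometimes harder to solve than polarity classification. The subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (for example, a news article quoting people's opinions). Moreover, as Su mentioned, results depend heavily on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying it helps improve performance.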

+

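This task is commonly defined as classifying a given text into one of two classes, subjective or objective.<ref name="PangLee08Subjectivity" /> The problem can sometimes be even more difficult than polarity classification.<ref name="MihalceaAl07" /> The subjectivity of a word or phrase depends on its context, and an objective document may contain subjective sentences (for example, a news article quoting other people's opinions). Moreover, as Su<ref name="SuMarkert08" /> notes, results depend largely on the definition of subjectivity used when annotating texts. However, Pang<ref name="PangLee04" /> showed that removing objective sentences from a document before classifying its polarity helps improve model performance.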
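
The idea of dropping objective sentences before polarity classification can be sketched as a simple pre-filter. This is a toy illustration, not the method of the cited work: the cue-word list, the splitting of sentences on full stops, and the names `is_subjective` and `keep_subjective` are assumptions made here.

```python
# Hypothetical cue words that mark a sentence as subjective (illustration only).
SUBJECTIVE_CUES = {"think", "feel", "believe", "love", "hate", "great", "awful"}


def is_subjective(sentence):
    """Treat a sentence as subjective if it contains any cue word."""
    return any(tok.strip(".,!?").lower() in SUBJECTIVE_CUES
               for tok in sentence.split())


def keep_subjective(document):
    """Split on full stops and keep only the sentences flagged as subjective."""
    sentences = [s.strip() for s in document.split(".") if s.strip()]
    return [s for s in sentences if is_subjective(s)]
```

Only the sentences returned by `keep_subjective` would then be passed on to a polarity classifier. A cue-word lookup ignores context, which is exactly the difficulty the paragraph above points out, so a real system would use a trained subjectivity classifier instead.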

+

Kuangmy

54

edits
