Changes

Sentiment analysis (view source)

Revision as of 21:04, 3 August 2021 (Tuesday)

30,741 bytes removed, 21:04, 3 August 2021 (Tuesday)

No edit summary

Line 4: Line 4:

'''Sentiment analysis''' (also known as '''opinion mining''' or '''emotion AI''') is the use of [[natural language processing]], [[Text analytics|text analysis]], [[computational linguistics]], and [[biometrics]] to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to [[voice of the customer]] materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from [[marketing]] to [[Customer relationship management|customer service]] to clinical medicine.

−

Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice of the customer materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from marketing to customer service to clinical medicine.

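Sentiment analysis (also called opinion mining or emotion AI) uses natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. It is widely applied to voice-of-the-customer materials such as reviews and survey responses, to online and social media, and to healthcare materials, in applications ranging from marketing to customer service to clinical medicine.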

== Examples ==

−

~~The objective and challenges of sentiment analysis can be shown through some simple examples.~~

−

The objective and challenges of sentiment analysis can be shown through some simple examples.

The objectives and challenges of sentiment analysis can be illustrated with a few simple examples.

−

=== Simple ~~cases ===~~

+

=== Simple cases ===

−

~~=== Simple cases ===~~

−

~~=== Simple cases ===~~

* Coronet has the best lines of all day cruisers.

Line 26: Line 18:

* Pastel-colored 1980s day cruisers from Florida are ugly.

* I dislike old [[cabin cruiser]]s.

−

* Coronet has the best lines of all day cruisers.

−

* Bertram has a deep V hull and runs easily through seas.

−

* Pastel-colored 1980s day cruisers from Florida are ugly.

−

* I dislike old cabin cruisers.

Line 38: Line 25:

* I dislike old cabin cruisers.

−

=== More challenging ~~examples ===~~

+

=== More challenging examples ===

−

~~=== More challenging examples ===~~

−

~~=== More challenging examples ===~~

* I do not dislike cabin cruisers. ([[Negation]] handling)

Line 53: Line 36:

* You should see their decadent dessert menu. (Attitudinal term has shifted polarity recently in certain domains)

* I love my mobile but would not recommend it to any of my colleagues. (Qualified positive sentiment, difficult to categorise)

−

* Next week's gig will be right koide9! ("Quoi de neuf?", French for "what's new?". Newly minted terms can be highly attitudinal but volatile in polarity and often out of known vocabulary.)

−

* I do not dislike cabin cruisers. (Negation handling)

−

* Disliking watercraft is not really my thing. (Negation, inverted word order)

−

* Sometimes I really hate RIBs. (Adverbial modifies the sentiment)

−

* I'd really truly love going out in this weather! (Possibly sarcastic)

−

* Chris Craft is better looking than Limestone. (Two brand names, identifying the target of attitude is difficult).

−

* Chris Craft is better looking than Limestone, but Limestone projects seaworthiness and reliability. (Two attitudes, two brand names).

−

* The movie is surprising with plenty of unsettling plot twists. (Negative term used in a positive sense in certain domains).

−

* You should see their decadent dessert menu. (Attitudinal term has shifted polarity recently in certain domains)

−

* I love my mobile but would not recommend it to any of my colleagues. (Qualified positive sentiment, difficult to categorise)

−

* Next week's gig will be right koide9! ("Quoi de neuf?", French for "what's new?". Newly minted terms can be highly attitudinal but volatile in polarity and often out of known vocabulary.)

Line 78: Line 49:

== Types ==

−

~~== Types ==~~

−

~~== Types ==~~

A basic task in sentiment analysis is classifying the ''polarity'' of a given text at the document, sentence, or feature/aspect level: whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. <ref> Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition for Vietnamese Social Media Text". In Proceedings of the 2019 International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Hanoi, Vietnam (2019).</ref>

−

A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition

−

~~for Vietnamese Social Media Text". In Proceedings of the 2019 International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Hanoi, Vietnam (2019).~~

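A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence, or an entity feature/aspect is positive, negative, or neutral. More advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition for Vietnamese Social Media Text". In Proceedings of the 2019 International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Hanoi, Vietnam (2019).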

Precursors to sentiment analysis include the General Inquirer,<ref>Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. "The general inquirer: A computer approach to content analysis." MIT Press, Cambridge, MA (1966).</ref> which provided hints toward quantifying patterns in text, and, separately, psychological research that examined a person's [[psychological state]] based on analysis of their verbal behavior.<ref>Gottschalk, Louis August, and Goldine C. Gleser. The measurement of psychological states through the content analysis of verbal behavior. Univ of California Press, 1969.</ref>

−

Precursors to sentiment analysis include the General Inquirer (Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. "The general inquirer: A computer approach to content analysis." MIT Press, Cambridge, MA (1966)), which provided hints toward quantifying patterns in text, and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior (Gottschalk, Louis August, and Goldine C. Gleser. The measurement of psychological states through the content analysis of verbal behavior. Univ of California Press, 1969).

Forerunners of sentiment analysis include the General Inquirer (Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. "The general inquirer: A computer approach to content analysis." MIT Press, Cambridge, MA (1966)), which offered clues for quantifying patterns in text, and, separately, psychological research that examined a person's psychological state by analyzing their verbal behavior (Gottschalk, Louis August, and Goldine C. Gleser. The measurement of psychological states through the content analysis of verbal behavior. Univ of California Press, 1969).

Line 114: Line 76:

| url = http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=2&p=1&f=G&l=50&d=PTXT&S1=(fogel.INNM.+AND+volcani.INNM.)&OS=in/fogel+and+in/volcani&RS=(IN/fogel+AND+IN/volcani)

}}</ref> looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale.

−

Subsequently, the method described in a patent by Volcani and Fogel looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale.

Later, the method described in a patent by Volcani and Fogel focused specifically on sentiment and identified individual words and phrases in text according to different emotional scales. A current system based on their work, called EffectCheck, suggests synonyms that can raise or lower the level of evoked emotion on each scale.

Line 151: Line 111:

}}

</ref> among others: Pang and Lee<ref name = "PangLee05" /> expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder<ref name = "SnyderBarzilay07" /> performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).

−

~~Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney, and Pang~~

−

who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang

−

~~and Snyder~~

−

among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).

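Many subsequent efforts were less sophisticated, taking a merely polar view of sentiment from positive to negative, such as the work of Turney and of Pang, who applied different methods to detect the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, as attempted by Pang and Snyder among others: Pang and Lee expanded the basic task of classifying a movie review as positive or negative to predicting star ratings on a 3- or 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of a given restaurant, such as the food and atmosphere (on a five-star scale).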

First steps to bringing together various approaches—learning, lexical, knowledge-based, etc.—were taken in the 2004 [[AAAI]] Spring Symposium where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text.<ref>Qu, Yan, James Shanahan, and Janyce Wiebe. "Exploring attitude and affect in text: Theories and applications." In AAAI Spring Symposium) Technical report SS-04-07. AAAI Press, Menlo Park, CA. 2004.</ref>

−

First steps to bringing together various approaches—learning, lexical, knowledge-based, etc.—were taken in the 2004 AAAI Spring Symposium where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text.Qu, Yan, James Shanahan, and Janyce Wiebe. "Exploring attitude and affect in text: Theories and applications." In AAAI Spring Symposium) Technical report SS-04-07. AAAI Press, Menlo Park, CA. 2004.

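At the 2004 AAAI Spring Symposium, linguists, computer scientists, and other interested researchers took first steps toward bringing together various approaches (learning, lexical, knowledge-based, etc.), aligning interests and proposing shared tasks and benchmark data sets for the systematic computational study of affect, appeal, subjectivity, and sentiment in text. Qu, Yan, James Shanahan, and Janyce Wiebe. "Exploring attitude and affect in text: Theories and applications." In AAAI Spring Symposium) Technical report SS-04-07. AAAI Press, Menlo Park, CA. 2004.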

Line 186: Line 136:

}}

</ref> can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step.<ref>{{Cite journal|last1=Ribeiro|first1=Filipe Nunes|last2=Araujo|first2=Matheus|date=2010|title=A Benchmark Comparison of State-of-the-Practice Sentiment Analysis Methods|url=https://www.researchgate.net/publication/286302059|journal=Transactions on Embedded Computing Systems |volume=9 |issue=4}}</ref> This second approach often involves estimating a probability distribution over all categories (e.g. [[Naive Bayes classifier|naive Bayes]] classifiers as implemented by the [[Nltk|NLTK]]). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.

−

Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the Max Entropy

−

~~and SVMs~~

−

can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.

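Although most statistical classification methods ignore the neutral class, on the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be shown that specific classifiers such as maximum entropy and SVMs benefit from the introduction of a neutral class, which improves overall classification accuracy. In principle there are two ways of operating with a neutral class. Either the algorithm first identifies the neutral language, filters it out, and then assesses the rest in terms of positive and negative sentiment, or it builds a three-way classification in one step. The second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by NLTK). Whether and how to use a neutral class depends on the nature of the data: if the data are clearly clustered into neutral, negative, and positive language, it makes sense to filter out the neutral language and focus on the polarity between positive and negative sentiment. If, in contrast, the data are mostly neutral with small deviations toward positive and negative affect, this strategy would make it harder to distinguish clearly between the two poles.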
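The two ways of operating with a neutral class can be sketched in a few lines of Python. This is a minimal illustration, not the method of any cited paper: the `NaiveBayes` class is a hand-written multinomial naive Bayes with add-one smoothing (it is not NLTK's implementation), and the six training sentences, the `two_stage` helper, and the label name `opinionated` are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip basic punctuation."""
    tokens = (t.strip(".,!?").lower() for t in text.split())
    return [t for t in tokens if t]


class NaiveBayes:
    """Tiny multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def __init__(self, examples):
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for text, label in examples:
            self.class_counts[label] += 1
            for token in tokenize(text):
                self.word_counts[label][token] += 1
                self.vocab.add(token)

    def predict_proba(self, text):
        """Return a probability distribution over all known classes."""
        tokens = [t for t in tokenize(text) if t in self.vocab]  # skip unseen words
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label, n_docs in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)
            for token in tokens:
                score += math.log((self.word_counts[label][token] + 1) / denom)
            log_scores[label] = score
        peak = max(log_scores.values())
        weights = {label: math.exp(s - peak) for label, s in log_scores.items()}
        norm = sum(weights.values())
        return {label: w / norm for label, w in weights.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)


# Hypothetical toy training data.
TRAIN = [
    ("I love this boat", "positive"),
    ("best hull ever", "positive"),
    ("I hate this boat", "negative"),
    ("ugly old hull", "negative"),
    ("the boat has a hull", "neutral"),
    ("the cruiser is in Florida", "neutral"),
]

# Option 1: a single three-way classifier.
one_step = NaiveBayes(TRAIN)

# Option 2: filter out neutral text first, then classify polarity on the rest.
neutral_filter = NaiveBayes(
    [(text, "neutral" if label == "neutral" else "opinionated") for text, label in TRAIN]
)
polarity = NaiveBayes([(text, label) for text, label in TRAIN if label != "neutral"])


def two_stage(text):
    if neutral_filter.predict(text) == "neutral":
        return "neutral"
    return polarity.predict(text)
```

On data this small both strategies agree; the choice matters once the proportion and separability of neutral text changes, as described above.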

Line 219: Line 163:

</ref>

−

A different method for determining sentiment is the use of a scaling system in which words commonly associated with a negative, neutral, or positive sentiment are assigned a number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually at the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows a more sophisticated understanding of sentiment, because the sentiment value of a concept can be adjusted relative to the modifications that surround it. Words that intensify, relax, or negate the sentiment expressed by the concept, for example, can affect its score. Alternatively, texts can be given separate positive and negative sentiment strength scores if the goal is to determine the sentiment in a text rather than its overall polarity and strength.

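Another way to determine sentiment is a scaling system in which words commonly associated with negative, neutral, or positive sentiment are assigned a number from −10 to +10 (most negative to most positive), or simply from 0 to a positive upper limit such as +4. This allows the sentiment of a given term to be adjusted relative to its environment (usually at the sentence level). When unstructured text is analyzed with natural language processing, each concept in the specified environment is scored according to how sentiment words relate to the concept and its associated score. This gives a deeper understanding of sentiment, because a concept's sentiment value can now be adjusted relative to the modifications around it. For example, words that intensify, relax, or negate the sentiment expressed by the concept affect its score. Alternatively, if the goal is to determine the sentiment in a text rather than its overall polarity and strength, the text can be given separate positive and negative sentiment strength scores.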
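A minimal sketch of such a scaled lexicon follows, assuming a −10 to +10 range. The word values, the modifier weights, and the rule that a negator simply flips the sign of the next sentiment word are illustrative assumptions, not values from any published lexicon.

```python
# Hypothetical mini-lexicon on a -10..+10 scale; the values are illustrative only.
LEXICON = {"love": 8, "like": 4, "best": 9, "dislike": -5, "ugly": -6, "hate": -8}
INTENSIFIERS = {"really": 1.5, "truly": 1.5, "very": 1.5}  # strengthen the next sentiment word
RELAXERS = {"somewhat": 0.5, "slightly": 0.5}              # weaken the next sentiment word
NEGATORS = {"not", "never", "no"}                          # flip the next sentiment word


def clamp(value, low=-10.0, high=10.0):
    """Keep a score inside the scale."""
    return max(low, min(high, value))


def score_sentence(sentence):
    """Sum the adjusted scores of all sentiment words in one sentence."""
    tokens = [t.strip(".,!?()\"'").lower() for t in sentence.split()]
    total = 0.0
    multiplier = 1.0
    negated = False
    for token in tokens:
        if token in NEGATORS:
            negated = not negated
        elif token in INTENSIFIERS:
            multiplier *= INTENSIFIERS[token]
        elif token in RELAXERS:
            multiplier *= RELAXERS[token]
        elif token in LEXICON:
            value = LEXICON[token] * multiplier
            if negated:
                value = -value
            total += clamp(value)
            multiplier, negated = 1.0, False  # modifiers apply to one sentiment word only
    return clamp(total)
```

With these assumed values, "I dislike old cabin cruisers." scores −5 and "I do not dislike cabin cruisers." scores +5. Flipping the sign is a coarse treatment of negation, since "not dislike" is weaker than "like"; a real system would need a finer rule.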

−

There are various other types of sentiment analysis, such as aspect-based sentiment analysis, grading sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.

Line 277: Line 218:

}}

</ref> showed that removing objective sentences from a document before classifying its polarity helped improve performance.

−

~~This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective.~~

−

~~This problem can sometimes be more difficult than polarity classification.~~

−

The subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su,

−

~~results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang~~

−

~~showed that removing objective sentences from a document before classifying its polarity helped improve performance.~~

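This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results depend largely on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance.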

−

{{clarify-span|Subjective and objective identification is an emerging subtask of sentiment analysis that uses syntactic and semantic features and machine learning to identify whether a sentence or document expresses facts or opinions. Awareness of the distinction between facts and opinions is not recent, having possibly first been presented by Carbonell at Yale University in 1979.|date=December 2020}}

The term objective describes an incident that carries factual information.<ref name="Wiebe 2005 486–497">{{Cite journal|last1=Wiebe|first1=Janyce|last2=Riloff|first2=Ellen|date=2005|editor-last=Gelbukh|editor-first=Alexander|title=Creating Subjective and Objective Sentence Classifiers from Unannotated Texts|url=https://link.springer.com/chapter/10.1007%2F978-3-540-30586-6_53|journal=Computational Linguistics and Intelligent Text Processing|series=Lecture Notes in Computer Science|volume=3406|language=en|location=Berlin, Heidelberg|publisher=Springer|pages=486–497|doi=10.1007/978-3-540-30586-6_53|isbn=978-3-540-30586-6}}</ref>

−

~~The term objective describes an incident that carries factual information.~~

The term objective refers to an incident that carries factual information.

Line 296: Line 228:

* Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.'

−

* ~~Example of an objective sentence~~: '~~To be elected president of the United States, a candidate must be at least thirty-five years of age.'~~

+

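* Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.'

−

* Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.'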

The term subjective describes an incident that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states', a term mentioned by Quirk et al.<ref>{{Cite book|last1=Quirk|first1=Randolph|title=A Comprehensive Grammar of the English Language (General Grammar)|last2=Greenbaum|first2=Sidney|last3=Geoffrey|first3=Leech|last4=Jan|first4=Svartvik|publisher=[[Longman]]|year=1985|isbn=1933108312|pages=175–239}}</ref> In the example below, 'We Americans' reflects a private state. Moreover, the target entity that the opinions comment on can take several forms, from tangible products to intangible topics, as stated in Liu (2010).<ref name="Liu2010" /> Furthermore, Liu (2010) observed three types of attitudes: 1) positive opinions, 2) neutral opinions, and 3) negative opinions.<ref name="Liu2010" />

−

The term subjective describes an incident that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states', a term mentioned by Quirk et al. In the example below, 'We Americans' reflects a private state. Moreover, the target entity that the opinions comment on can take several forms, from tangible products to intangible topics, as stated in Liu (2010). Furthermore, Liu (2010) observed three types of attitudes: 1) positive opinions, 2) neutral opinions, and 3) negative opinions.

The term subjective describes an incident containing non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as private states. In the example below, 'We Americans' reflects a private state. In addition, the target entity being commented on can take many forms, from tangible products to the intangible topics described in Liu (2010). Liu (2010) also observed three kinds of attitude: 1) positive opinions, 2) neutral opinions, and 3) negative opinions.

* Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.'

−

This analysis is a classification problem.<ref name=":1" />

+

This analysis is a classification problem.<ref name=":1" />

−

* Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.'

−

~~This analysis is a classification problem.~~

−

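* Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.' This analysis is a classification problem.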

−

For each class, collections of word or phrase indicators are defined to locate desirable patterns in unannotated text. For subjective expressions, a separate word list has been created. Lists of subjective indicators in words or phrases have been developed by multiple researchers in the linguistics and natural language processing fields, as stated in Riloff et al. (2003).<ref>{{Cite journal|last1=Riloff|first1=Ellen|last2=Wiebe|first2=Janyce|date=2003-07-11|title=Learning extraction patterns for subjective expressions|journal=Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing|series=EMNLP '03|volume=10|location=USA|publisher=Association for Computational Linguistics|pages=105–112|doi=10.3115/1119355.1119369|s2cid=6541910|doi-access=free}}</ref> A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction in subjectivity detection has progressed from curating features by hand in 1999 to automated feature learning in 2005.<ref>{{Cite journal|last1=Chaturvedi|first1=Iti|last2=Cambria|first2=Erik|last3=Welsch|first3=Roy E.|last4=Herrera|first4=Francisco|date=November 2018|title=Distinguishing between facts and opinions for sentiment analysis: Survey and challenges|url=https://sentic.net/subjectivity-detection.pdf|journal=Information Fusion|volume=44|pages=65–77|doi=10.1016/j.inffus.2017.12.006|via=Elsevier Science Direct|doi-access=free}}</ref> At present, automated learning methods can be further divided into supervised and [[Unsupervised learning|unsupervised machine learning]]. Pattern extraction with machine learning on annotated and unannotated text has been explored extensively by academic researchers.

−

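For each class, collections of word or phrase indicators are defined to locate desirable patterns in unannotated text. For subjective expressions, a separate word list has been created. Lists of subjective indicators in words or phrases have been developed by multiple researchers in the linguistics and natural language processing fields, as stated in Riloff et al. (2003). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction in subjectivity detection has progressed from curating features by hand in 1999 to automated feature learning in 2005. At present, automated learning methods can be further divided into supervised and unsupervised machine learning. Pattern extraction with machine learning on annotated and unannotated text has been explored extensively by academic researchers.

A collection of word or phrase indicators is defined for each class and used to locate desirable patterns in unannotated text. A separate word list has been created for subjective expressions. According to Riloff et al. (2003), lists of subjective indicators in words or phrases have been developed by multiple researchers in linguistics and natural language processing. A dictionary of extraction rules must be created to measure given expressions. Over the years, subjectivity detection has moved from hand-crafted feature extraction in 1999 to automated feature learning in 2005. Today, automated learning methods can be further divided into supervised and unsupervised learning. Pattern extraction with machine learning on annotated and unannotated text has been studied extensively by academic researchers.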
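The idea of locating subjective indicators in unannotated text can be illustrated with a short Python sketch. The cue list below is hypothetical and far smaller than any researcher-built list, and the rule that a sentence needs at least two cues to count as subjective is an assumption chosen for this example.

```python
import re

# Hypothetical subjective cue words and phrases; real lists are far larger.
SUBJECTIVE_CUES = ["need to", "should", "wise", "mature", "ugly", "love", "hate", "dislike"]


def find_cues(sentence, cues=SUBJECTIVE_CUES):
    """Return the cues that occur in the sentence as whole words or phrases."""
    lowered = sentence.lower()
    return [cue for cue in cues if re.search(r"\b" + re.escape(cue) + r"\b", lowered)]


def classify(sentence, min_cues=2):
    """Label a sentence subjective when it contains at least `min_cues` cues."""
    return "subjective" if len(find_cues(sentence)) >= min_cues else "objective"


objective_example = (
    "To be elected president of the United States, "
    "a candidate must be at least thirty-five years of age."
)
subjective_example = (
    "We Americans need to elect a president who is mature "
    "and who is able to make wise decisions."
)
```

Applied to the two example sentences from this section, the objective sentence contains no cues, while the subjective one contains "need to", "wise", and "mature". A fixed list like this one cannot handle context-dependent or newly coined expressions.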

−

However, researchers have recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Several researchers have recognized six challenges: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) cue words with fewer usages, 5) time sensitivity, and 6) ever-growing volume.

Line 331: Line 249:

# Discrepancies in writing. Text obtained from the Internet spans distinct writing genres and styles.

# Context-sensitive. Classification may vary based on the subjectiveness or objectiveness of previous and following sentences.<ref name=":1">{{Cite journal|last1=Pang|first1=Bo|last2=Lee|first2=Lillian|date=2008-07-06|title=Opinion Mining and Sentiment Analysis|url=https://www.nowpublishers.com/article/Details/INR-011|journal=Foundations and Trends in Information Retrieval|language=en|volume=2|issue=1–2|pages=1–135|doi=10.1561/1500000011|issn=1554-0669}}</ref>

−

# Time-sensitive attribute. The task is challenged by the time-sensitive nature of some textual data. If a group of researchers wants to confirm a fact in the news, cross-validation takes so long that the news becomes outdated.

−

~~# Cue words with fewer usages.~~

−

# Ever-growing volume. The task is also challenged by the sheer volume of textual data. Because the data keep growing, it is overwhelmingly difficult for researchers to complete the task on time.

−

# Metaphorical expressions. Text containing metaphorical expressions may impair extraction performance. Besides, metaphors take different forms, which may add to the difficulty of detection.

−

~~# Discrepancies in writings. For the text obtained from the Internet, the discrepancies in the writing style of targeted text data involve distinct writing genres and styles~~

−

~~# Context-sensitive. Classification may vary based on the subjectiveness or objectiveness of previous and following sentences.~~

# Time-sensitive attribute. The task is challenged by the time-sensitive nature of some textual data. If a group of researchers wants to confirm a fact in the news, cross-validation takes so long that the news becomes outdated.

# Cue words with fewer usages.

Line 345: Line 256:

Previously, research focused mainly on document-level classification. However, classification at the document level is less accurate, as an article may involve diverse types of expressions. Research evidence shows that a set of news articles expected to be dominated by objective expression in fact consisted of over 40% subjective expression.<ref name="Wiebe 2005 486–497"/>

−

Previously, research focused mainly on document-level classification. However, classification at the document level is less accurate, as an article may involve diverse types of expressions. Research evidence shows that a set of news articles expected to be dominated by objective expression in fact consisted of over 40% subjective expression.

Earlier research concentrated on document-level classification. Document-level classification is less accurate, however, because a single article may contain different types of expression. Research evidence shows that a set of news articles expected to consist mainly of objective expression actually contained more than 40% subjective expression.

To overcome those challenges, researchers conclude that classifier efficacy depends on the precision of the pattern learner, and that learners fed with large volumes of annotated training data outperform those trained on less comprehensive subjective features. However, one of the main obstacles to this type of work is generating a large dataset of annotated sentences manually. Manual annotation has been less favored than automatic learning for three reasons:

−

To overcome those challenges, researchers conclude that classifier efficacy depends on the precision of the pattern learner, and that learners fed with large volumes of annotated training data outperform those trained on less comprehensive subjective features. However, one of the main obstacles to this type of work is generating a large dataset of annotated sentences manually. Manual annotation has been less favored than automatic learning for three reasons:

To address these challenges, researchers conclude that classifier effectiveness depends on the precision of the pattern learner. Learners trained on large volumes of annotated data also outperform those trained on less comprehensive subjective features. A major obstacle to this kind of work, however, is producing a large dataset of annotated sentences by hand. Manual annotation is less favored than automatic learning for three reasons:

Line 360: Line 267:

# Time-consuming. Manual annotation is laborious work. Riloff (1996) showed that 160 texts took one annotator 8 hours to finish.<ref>{{Cite journal|last=Riloff|first=Ellen|date=1996-08-01|title=An empirical study of automated dictionary construction for information extraction in three domains|url=https://dx.doi.org/10.1016%2F0004-3702%2895%2900123-9|journal=Artificial Intelligence|language=en|volume=85|issue=1|pages=101–134|doi=10.1016/0004-3702(95)00123-9|issn=0004-3702|doi-access=free}}</ref>

−

~~# Variations in comprehensions. In the manual annotation task, disagreement of whether one instance is subjective or objective may occur among annotators because of languages' ambiguity.~~

+

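# Variations in comprehension. In manual annotation, annotators may disagree about whether an instance is subjective or objective because of the ambiguity of language.
# Human errors. Manual annotation is meticulous work that requires intense concentration to finish.
# Time-consuming. Manual annotation is laborious work. Riloff (1996) showed that 160 texts took one annotator 8 hours to finish.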

−

~~# Human errors. Manual annotation is a meticulous assignment that requires intense concentration to finish.~~

−

~~# Time-consuming. Manual annotation is laborious work. Riloff (1996) showed that 160 texts took one annotator 8 hours to finish.~~

−

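# Variations in comprehension. In manual annotation, annotators may disagree about whether an instance is subjective or objective because of the ambiguity of language.
# Human errors. Manual annotation is meticulous work that requires intense concentration to finish.
# Time-consuming. Manual annotation is laborious work. Riloff (1996) ~~showed that 160 texts took one annotator 8 hours to finish.~~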

−

All of these reasons can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and unannotated textual data.

−

All of these reasons can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and unannotated textual data.

Line 374: Line 274:

# Meta-Bootstrapping by Riloff and Jones in 1999.<ref>{{Cite journal|last1=Riloff|first1=Ellen|last2=Jones|first2=Rosie|date=July 1999|title=Learning dictionaries for information extraction by multi-level bootstrapping|url=https://aaai.org/Papers/AAAI/1999/AAAI99-068.pdf|journal=AAAI '99/IAAI '99: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence|pages=474–479}}</ref> Level One: Generate extraction patterns based on the pre-defined rules and rank the extracted patterns by the number of seed words each pattern holds. Level Two: The top 5 words are marked and added to the dictionary. Repeat.

# Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge) by Thelen and Riloff.<ref>{{Cite journal|last1=Thelen|first1=Michael|last2=Riloff|first2=Ellen|date=2002-07-06|title=A bootstrapping method for learning semantic lexicons using extraction pattern contexts|journal=Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10|series=EMNLP '02|volume=10|location=USA|publisher=Association for Computational Linguistics|pages=214–221|doi=10.3115/1118693.1118721|s2cid=137155|doi-access=free}}</ref> Step One: Generate extraction patterns. Step Two: Move the best patterns from the Pattern Pool to the Candidate Word Pool. Step Three: The top 10 words are marked and added to the dictionary. Repeat.

−

# Meta-Bootstrapping by Riloff and Jones in 1999. Level One: Generate extraction patterns based on the pre-defined rules and rank the extracted patterns by the number of seed words each pattern holds. Level Two: The top 5 words are marked and added to the dictionary. Repeat.

−

# Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge) by Thelen and Riloff. Step One: Generate extraction patterns. Step Two: Move the best patterns from the Pattern Pool to the Candidate Word Pool. Step Three: The top 10 words are marked and added to the dictionary. Repeat.

# Meta-Bootstrapping by Riloff and Jones, 1999. Level One: generate extraction patterns from the pre-defined rules and rank them by the number of seed words each pattern contains. Level Two: mark the top 5 words and add them to the dictionary. Repeat.
# Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge) by Thelen and Riloff. Step One: generate extraction patterns. Step Two: move the best patterns from the Pattern Pool to the Candidate Word Pool. Step Three: mark the top 10 words and add them to the dictionary. Repeat.
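The loop shared by both methods (score patterns against the current dictionary, take the best pattern, add its top words, repeat) can be sketched as follows. This is a heavily simplified illustration and not the published Meta-Bootstrapping or Basilisk algorithm: a "pattern" here is just the word that precedes a candidate, the score is a plain count of dictionary words the pattern extracts, and the corpus, seed word, and function names are hypothetical.

```python
from collections import defaultdict


def extract_patterns(sentences):
    """Map each pattern (here: the preceding word) to the set of words it extracts."""
    patterns = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for previous, word in zip(tokens, tokens[1:]):
            patterns[previous].add(word)
    return patterns


def bootstrap(sentences, seeds, rounds=3, words_per_round=5):
    """Grow a dictionary from seed words using unannotated sentences."""
    lexicon = set(seeds)
    patterns = extract_patterns(sentences)
    for _ in range(rounds):
        # Score every pattern that can still contribute new words by the
        # number of dictionary words it already extracts.
        scored = [
            (len(extracted & lexicon), pattern)
            for pattern, extracted in patterns.items()
            if extracted - lexicon
        ]
        scored = [item for item in scored if item[0] > 0]
        if not scored:
            break  # no pattern is supported by the current dictionary
        _, best = max(scored)
        new_words = sorted(patterns[best] - lexicon)[:words_per_round]
        lexicon.update(new_words)
    return lexicon


# Hypothetical unannotated corpus and a single seed word.
CORPUS = [
    "the crew felt happy",
    "the crew felt angry",
    "she felt sad",
    "he seemed angry",
    "he seemed tired",
    "the boat is blue",
]
learned = bootstrap(CORPUS, {"happy"})
```

Starting from the seed "happy", the pattern "felt" is selected first and contributes "angry" and "sad"; "seemed" then becomes supported through "angry" and contributes "tired". The word "blue" is never added because no pattern connects it to the dictionary.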

−

~~Overall, these algorithms highlight the need for automatic pattern recognition and extraction in subjective and objective tasks.~~

Overall, these algorithms highlight the need for automatic pattern recognition and extraction in subjective and objective tasks.
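The bootstrapping loop shared by the two algorithms above can be illustrated with a minimal sketch. It is not the published Meta-Bootstrapping or Basilisk procedure: the "pattern" here is simply the word preceding a lexicon word, the scoring is a plain count, and the function name <code>bootstrap_lexicon</code> and its parameters are hypothetical choices made for illustration.

```python
from collections import Counter


def bootstrap_lexicon(sentences, seeds, words_per_round=2, rounds=3):
    """Grow a word list from seed words (toy sketch, not the published algorithms).

    Assumption: a "pattern" is just the token that precedes a lexicon word.
    """
    lexicon = set(seeds)
    tokenized = [s.lower().split() for s in sentences]
    for _ in range(rounds):
        # Step 1: score each pattern by how many distinct lexicon words it extracts.
        pattern_hits = {}
        for tokens in tokenized:
            for prev, word in zip(tokens, tokens[1:]):
                if word in lexicon:
                    pattern_hits.setdefault(prev, set()).add(word)
        if not pattern_hits:
            break
        best_score = max(len(words) for words in pattern_hits.values())
        best_patterns = {p for p, words in pattern_hits.items() if len(words) == best_score}
        # Step 2: collect new candidate words extracted by the best patterns.
        candidates = Counter()
        for tokens in tokenized:
            for prev, word in zip(tokens, tokens[1:]):
                if prev in best_patterns and word not in lexicon:
                    candidates[word] += 1
        if not candidates:
            break
        # Step 3: add the top-scoring candidates to the lexicon, then repeat.
        ranked = sorted(candidates.items(), key=lambda kv: (-kv[1], kv[0]))
        lexicon.update(word for word, _ in ranked[:words_per_round])
    return lexicon


corpus = [
    "the food was great",
    "the service was awful",
    "the room was great",
    "the staff was lovely",
    "it looked awful",
]
learned = bootstrap_lexicon(corpus, {"great"})
```

In this toy corpus the context "was" extracts the seed, so the words that follow "was" elsewhere become candidates. Real systems use far richer extraction patterns and reliability scores to limit drift.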

第387行：第283行：

Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifiers' primary benefits is that they popularized the practice of data-driven decision-making in various industries. According to Liu, applications of subjective and objective identification have been implemented in business, advertising, sports, and social science.<ref>{{Cite journal|last=Liu|first=Bing|date=2012-05-23|title=Sentiment Analysis and Opinion Mining|url=https://www.morganclaypool.com/doi/abs/10.2200/S00416ED1V01Y201204HLT016|journal=Synthesis Lectures on Human Language Technologies|volume=5|issue=1|pages=1–167|doi=10.2200/S00416ED1V01Y201204HLT016|issn=1947-4040}}</ref>

−

Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifiers' primary benefits is that they popularized the practice of data-driven decision-making in various industries. According to Liu, applications of subjective and objective identification have been implemented in business, advertising, sports, and social science.

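Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifiers' primary benefits is that they popularized the practice of data-driven decision-making in various industries. According to Liu, applications of subjective and objective identification have been implemented in business, advertising, sports, and social science.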

第398行：第292行：

*Document summarising: The classifier can extract target-specified comments and gather opinions made by one particular entity.

* Complex question answering. The classifier can dissect complex questions by classifying the language as subjective or objective and identifying the focused target. In Yu et al. (2003), the researchers developed sentence- and document-level classifiers that identify opinion pieces.<ref>{{Cite journal|last1=Yu|first1=Hong|last2=Hatzivassiloglou|first2=Vasileios|date=2003-07-11|title=Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences|journal=Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing|series=EMNLP '03|location=USA|publisher=Association for Computational Linguistics|pages=129–136|doi=10.3115/1119355.1119372|doi-access=free}}</ref>

−

* Domain-specific applications.

−

* Email analysis: The subjective and objective classifier detects spam by tracing language patterns with target words.

−

* Online review classification: In the business industry, the classifier helps a company better understand the feedback on its products and the reasoning behind the reviews.

−

* Stock price prediction: In the finance industry, the classifier aids the prediction model by processing auxiliary information from social media and other textual information from the Internet. Previous studies on Japanese stock prices conducted by Dong et al. indicate that models with a subjective and objective module may perform better than those without it.

−

* Social media analysis.

−

* Students' feedback classification.

−

*Document summarising: The classifier can extract target-specified comments and gather opinions made by one particular entity.

−

* Complex question answering. The classifier can dissect complex questions by classifying the language as subjective or objective and identifying the focused target. In Yu et al. (2003), the researchers developed sentence- and document-level classifiers that identify opinion pieces.

* Domain-specific applications.

* Email analysis: The subjective and objective classifier detects spam by tracing language patterns with target words.

第461行：第346行：

}}

</ref>

−

~~It refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank.~~

−

A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service for a restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is the possibility to capture nuances about objects of interest. Different features can generate different sentiment responses, for example a hotel can have a convenient location, but mediocre food. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether an opinion expressed on each feature/aspect is positive, negative or neutral.

−

~~The automatic identification of features can be performed with syntactic methods, with topic modeling, or with deep learning.~~

−

~~More detailed discussions about this level of sentiment analysis can be found in Liu's work.~~

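It refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service for a restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is the possibility to capture nuances about objects of interest. Different features can generate different sentiment responses; for example, a hotel can have a convenient location but mediocre food. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether an opinion expressed on each feature/aspect is positive, negative or neutral. The automatic identification of features can be performed with syntactic methods, with topic modeling, or with deep learning. More detailed discussions about this level of sentiment analysis can be found in Liu's work.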
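The sub-problems of feature-based sentiment analysis can be made concrete with a small sketch. It assumes a hand-written aspect list and a toy opinion lexicon (both hypothetical), and it attaches to each aspect the polarity of the nearest opinion word, which is a crude stand-in for the syntactic, topic-modeling, or deep-learning methods mentioned above.

```python
# Hypothetical aspect list and toy opinion lexicon, for illustration only.
ASPECTS = {"location", "food", "screen", "service"}
OPINIONS = {"convenient": 1, "great": 1, "mediocre": -1, "poor": -1}


def aspect_sentiment(text):
    """Map each aspect term in the text to the polarity of the nearest opinion word."""
    tokens = [t.strip(".,;!?").lower() for t in text.split()]
    opinion_positions = [(i, OPINIONS[t]) for i, t in enumerate(tokens) if t in OPINIONS]
    result = {}
    for i, token in enumerate(tokens):
        if token in ASPECTS and opinion_positions:
            # Pick the opinion word with the smallest token distance to the aspect.
            _, polarity = min(opinion_positions, key=lambda p: abs(p[0] - i))
            result[token] = polarity
    return result


review = "The hotel has a convenient location, but mediocre food."
scores = aspect_sentiment(review)
```

For the hotel example, the sketch assigns a positive polarity to "location" and a negative one to "food", which a single document-level score would average away.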

−

== Methods and ~~features==~~

+

== Methods and features ==

−

~~== Methods and features==~~

−

~~== Methods and features ==~~

Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches.<ref name ="“Cambria">

第564行：第439行：

| url = http://portal.acm.org/citation.cfm?id=1390763&dl=GUIDE&coll=GUIDE&CFID=92244761&CFTOKEN=30578437

}}

−

</ref> Hybrid approaches leverage both machine learning and elements from [[knowledge representation]] such as [[ontologies]] and [[semantic network]]s in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.<ref name ="“Hussain">

+

</ref> Hybrid approaches leverage both machine learning and elements from [[knowledge representation]] such as [[ontologies]] and [[semantic network]]s in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.<ref name="“Hussain">

{{cite book

| first1 = E

第578行：第453行：

</ref>

−

~~Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches.~~

−

~~Knowledge-based techniques classify text by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored.~~

−

~~Some knowledge bases not only list obvious affect words, but also assign arbitrary words a probable "affinity" to particular emotions.~~

−

~~Statistical methods leverage elements from machine learning such as latent semantic analysis, support vector machines, "bag of words", "Pointwise Mutual Information" for Semantic Orientation,~~

−

and deep learning. More sophisticated methods try to detect the holder of a sentiment (i.e., the person who maintains that affective state) and the target (i.e., the entity about which the affect is felt).

−

To mine the opinion in context and get the feature about which the speaker has opined, the grammatical relationships of words are used. Grammatical dependency relations are obtained by deep parsing of the text.

−

Hybrid approaches leverage both machine learning and elements from knowledge representation such as ontologies and semantic networks in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.
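As a minimal sketch of the knowledge-based technique described above, the following classifier looks for unambiguous affect words and returns the most frequent affect category. The word-to-category table is a hypothetical toy resource, not one of the published knowledge bases.

```python
from collections import Counter

# Hypothetical toy table of unambiguous affect words and their categories.
AFFECT_WORDS = {"happy": "joy", "sad": "sadness", "afraid": "fear", "bored": "boredom"}


def classify_affect(text):
    """Return the most frequent affect category in the text, or None if no affect word occurs."""
    tokens = [t.strip(".,;!?\"'").lower() for t in text.split()]
    counts = Counter(AFFECT_WORDS[t] for t in tokens if t in AFFECT_WORDS)
    if not counts:
        return None
    return counts.most_common(1)[0][0]


label = classify_affect("I was sad and afraid, so sad.")
```

The approach is transparent but brittle: it finds nothing in text that expresses affect without using a listed word, which is one motivation for the statistical and hybrid methods.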

第616行：第478行：

</ref> utilizing an adjective noun pair representation of visual content. In addition, the vast majority of sentiment classification approaches rely on the bag-of-words model, which disregards context, [[grammar]] and even [[word order]]. Approaches that analyse the sentiment based on how words compose the meaning of longer phrases have shown better results,<ref>{{Cite journal|last1=Socher|first1=Richard|last2=Perelygin|first2=Alex|last3=Wu|first3=Jean Y.|last4=Chuang|first4=Jason|last5=Manning|first5=Christopher D.|last6=Ng|first6=Andrew Y.|last7=Potts|first7=Christopher|date=2013|title=Recursive deep models for semantic compositionality over a sentiment treebank|journal=In Proceedings of EMNLP|pages=1631–1642|citeseerx=10.1.1.593.7427}}</ref> but they incur an additional annotation overhead.

−

Open source software tools as well as a range of free and paid sentiment analysis tools deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media.

−

Knowledge-based systems, on the other hand, make use of publicly available resources, to extract the semantic and affective information associated with natural language concepts. The system can help perform affective commonsense reasoning. Sentiment analysis can also be performed on visual content, i.e., images and videos (see Multimodal sentiment analysis). One of the first approaches in this direction is SentiBank

−

utilizing an adjective noun pair representation of visual content. In addition, the vast majority of sentiment classification approaches rely on the bag-of-words model, which disregards context, grammar and even word order. Approaches that analyse the sentiment based on how words compose the meaning of longer phrases have shown better results, but they incur an additional annotation overhead.

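Open source software tools as well as a range of free and paid sentiment analysis tools deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media. Knowledge-based systems, on the other hand, make use of publicly available resources to extract the semantic and affective information associated with natural language concepts. The system can help perform affective commonsense reasoning. Sentiment analysis can also be performed on visual content, i.e., images and videos (see Multimodal sentiment analysis). One of the first approaches in this direction is SentiBank, which utilizes an adjective noun pair representation of visual content. In addition, the vast majority of sentiment classification approaches rely on the bag-of-words model, which disregards context, grammar and even word order. Approaches that analyse the sentiment based on how words compose the meaning of longer phrases have shown better results, but they incur an additional annotation overhead.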
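The limitation of the bag-of-words model noted above can be shown in a few lines. This sketch uses whitespace tokenization, an assumption made for brevity; the two example sentences are invented.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text as unordered word counts (whitespace tokenization assumed)."""
    return Counter(text.lower().split())


a = bag_of_words("the food was good not bad")
b = bag_of_words("the food was bad not good")
same_representation = a == b
```

Both sentences map to the same counts although they express opposite opinions, so any classifier that sees only these counts must give them the same label.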

A human analysis component is required in sentiment analysis, as automated systems are not able to analyze the historical tendencies of the individual commenter or the platform, and comments are therefore often classified incorrectly with respect to their expressed sentiment. Automation impacts approximately 23% of comments that are correctly classified by humans.<ref>{{cite web|title=Case Study: Advanced Sentiment Analysis|url=http://paragonpoll.com/sentiment-analysis-systems-case-study/|access-date=18 October 2013}}</ref> However, humans often disagree, and it is argued that the inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.<ref>{{Cite journal|last1=Mozetič|first1=Igor|last2=Grčar|first2=Miha|last3=Smailović|first3=Jasmina|date=2016-05-05|title=Multilingual Twitter Sentiment Classification: The Role of Human Annotators|journal=PLOS ONE|volume=11|issue=5|pages=e0155036|doi=10.1371/journal.pone.0155036|issn=1932-6203|pmc=4858191|pmid=27149621|arxiv=1602.07563|bibcode=2016PLoSO..1155036M}}</ref>

−

A human analysis component is required in sentiment analysis, as automated systems are not able to analyze the historical tendencies of the individual commenter or the platform, and comments are therefore often classified incorrectly with respect to their expressed sentiment. Automation impacts approximately 23% of comments that are correctly classified by humans. However, humans often disagree, and it is argued that the inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.

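A human analysis component is required in sentiment analysis, as automated systems are not able to analyze the historical tendencies of the individual commenter or the platform, and comments are therefore often classified incorrectly with respect to their expressed sentiment. Automation impacts approximately 23% of comments that are correctly classified by humans. However, humans often disagree, and it is argued that the inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.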

−

== Evaluation ==

+

== Evaluation ==

−

~~== Evaluation ==~~

−

~~== Evaluation ==~~

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by variant measures based on [[precision and recall]] over the two target categories of negative and positive texts. However, according to research human raters typically only agree about 80%<ref>

第651行：第503行：

</ref>

−

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by variant measures based on precision and recall over the two target categories of negative and positive texts. However, according to research human raters typically only agree about 80%

−

of the time (see Inter-rater reliability). Thus, a program that achieves 70% accuracy in classifying sentiment is doing nearly as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about any answer.
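The measures discussed above can be computed directly from label lists. The sketch below shows precision and recall for one target category and the raw agreement rate between two label sequences (two human raters, or a rater and a program). The label values and example data are invented for illustration.

```python
def precision_recall(gold, predicted, target):
    """Precision and recall of `predicted` against `gold` for one target category."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == target and p == target)
    fp = sum(1 for g, p in pairs if g != target and p == target)
    fn = sum(1 for g, p in pairs if g == target and p != target)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def agreement(labels_a, labels_b):
    """Fraction of items on which two label sequences agree."""
    matches = sum(1 for a, b in zip(labels_a, labels_b) if a == b)
    return matches / len(labels_a)


# Invented example data: human labels versus system output.
gold = ["pos", "pos", "neg", "neg", "pos"]
system = ["pos", "neg", "neg", "pos", "pos"]
precision, recall = precision_recall(gold, system, "pos")
rate = agreement(gold, system)
```

Raw agreement does not correct for agreement expected by chance; chance-corrected statistics such as Cohen's kappa are commonly used when comparing raters.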

第666行：第515行：

[[Jussi Karlgren|Karlgren, Jussi]]. "[http://www.diva-portal.org/smash/get/diva2:1042636/FULLTEXT01.pdf Affect, appeal, and sentiment as factors influencing interaction with multimedia information]." In Proceedings of Theseus/ImageCLEF workshop on visual information retrieval evaluation, pp. 8-11. 2009.

</ref>

−

On the other hand, computer systems will make very different errors than human assessors, and thus the figures are not entirely comparable. For instance, a computer system will have trouble with negations, exaggerations, jokes, or sarcasm, which typically are easy to handle for a human reader: some errors a computer system makes will seem overly naive to a human. In general, the utility for practical commercial tasks of sentiment analysis as it is defined in academic research has been called into question, mostly since the simple one-dimensional model of sentiment from negative to positive yields rather little actionable information for a client worrying about the effect of public discourse on e.g. brand or corporate reputation.

−

Karlgren, Jussi, Magnus Sahlgren, Fredrik Olsson, Fredrik Espinoza, and Ola Hamfors. "Usefulness of sentiment analysis." In European Conference on Information Retrieval, pp. 426-435. Springer Berlin Heidelberg, 2012.

−

Karlgren, Jussi. "The relation between author mood and affect to sentiment in text and text genre." In Proceedings of the fourth workshop on Exploiting semantic annotations in information retrieval, pp. 9-10. ACM, 2011.

−

Karlgren, Jussi. "Affect, appeal, and sentiment as factors influencing interaction with multimedia information." In Proceedings of Theseus/ImageCLEF workshop on visual information retrieval evaluation, pp. 8-11. 2009.

−

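On the other hand, computer systems will make very different errors than human assessors, and thus the figures are not entirely comparable. For instance, a computer system will have trouble with negations, exaggerations, jokes, or sarcasm, which typically are easy to handle for a human reader: some errors a computer system makes will seem overly naive to a human. In general, the utility for practical commercial tasks of sentiment analysis as it is defined in academic research has been called into question, mostly since the simple one-dimensional model of sentiment from negative to positive yields rather little actionable information for a client worrying about the effect of public discourse on e.g. brand or corporate reputation. Karlgren, Jussi, Magnus Sahlgren, Fredrik Olsson, Fredrik Espinoza, and Ola Hamfors. "Usefulness of sentiment analysis." In European Conference on Information Retrieval, pp. 426-435. Springer Berlin Heidelberg, 2012. Karlgren, Jussi. "The relation between author mood and affect to sentiment in text and text genre." In Proceedings of the fourth workshop on Exploiting semantic annotations in information retrieval, pp. 9-10. ACM, 2011. Karlgren, Jussi. "Affect, appeal, and sentiment as factors influencing interaction with multimedia information." In Proceedings of Theseus/ImageCLEF workshop on visual information retrieval evaluation, pp. 8-11. 2009.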

第684行：第525行：

Amigó, Enrique, Jorge Carrillo-de-Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Edgar Meij, [[Maarten de Rijke]], and Damiano Spina. "Overview of replab 2014: author profiling and reputation dimensions for online reputation management." In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 307-322. Springer International Publishing, 2014.

</ref>

−

To better fit market needs, evaluation of sentiment analysis has moved to more task-based measures, formulated together with representatives from PR agencies and market research professionals. The focus in e.g. the RepLab evaluation data set is less on the content of the text under consideration and more on the effect of the text in question on brand reputation.

−

Amigó, Enrique, Adolfo Corujo, Julio Gonzalo, Edgar Meij, and Maarten de Rijke. "Overview of RepLab 2012: Evaluating Online Reputation Management Systems." In CLEF (Online Working Notes/Labs/Workshop). 2012.

−

Amigó, Enrique, Jorge Carrillo De Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Tamara Martín, Edgar Meij, Maarten de Rijke, and Damiano Spina. "Overview of replab 2013: Evaluating online reputation monitoring systems." In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 333-352. Springer Berlin Heidelberg, 2013.

−

Amigó, Enrique, Jorge Carrillo-de-Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Edgar Meij, Maarten de Rijke, and Damiano Spina. "Overview of replab 2014: author profiling and reputation dimensions for online reputation management." In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 307-322. Springer International Publishing, 2014.

第704行：第538行：

The rise of [[social media]] such as [[blogs]] and [[social network]]s has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and actioning it appropriately, many are now looking to the field of sentiment analysis.<ref name="Mining the Web for Feelings, Not Facts">Wright, Alex. [https://www.nytimes.com/2009/08/24/technology/internet/24emotion.html?_r=1 "Mining the Web for Feelings, Not Facts"], ''[[New York Times]]'', 2009-08-23. Retrieved on 2009-10-01.</ref> Further complicating the matter is the rise of anonymous social media platforms such as [[4chan]] and [[Reddit]].<ref>{{cite web|title=Sentiment Analysis on Reddit|url=http://news.humanele.com/sentiment-analysis-reddit/|access-date=10 October 2014|date=2014-09-30}}</ref> If [[web 2.0]] was all about democratizing publishing, then the next stage of the web may well be based on democratizing [[data mining]] of all the content that is getting published.<ref name="The Future of Social Media Monitoring">Kirkpatrick, Marshall. [https://readwrite.com/2009/04/15/whats_next_in_social_media_monitoring/ "], ''[[ReadWriteWeb]]'', 2009-04-15. Retrieved on 2009-10-01.</ref>

−

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and actioning it appropriately, many are now looking to the field of sentiment analysis. Wright, Alex. "Mining the Web for Feelings, Not Facts", New York Times, 2009-08-23. Retrieved on 2009-10-01. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published. Kirkpatrick, Marshall. ", ReadWriteWeb, 2009-04-15. Retrieved on 2009-10-01.

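The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and actioning it appropriately, many are now looking to the field of sentiment analysis. Wright, Alex. "Mining the Web for Feelings, Not Facts", New York Times, 2009-08-23. Retrieved on 2009-10-01. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published. Kirkpatrick, Marshall. ", ReadWriteWeb, 2009-04-15. Retrieved on 2009-10-01.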

One step towards this aim is accomplished in research. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in [[Virtual community|e-communities]] through sentiment analysis.<ref name="Collective emotions in cyberspace">CORDIS. [http://cordis.europa.eu/fetch?CALLER=FP7_PROJ_EN&ACTION=D&DOC=1&CAT=PROJ&QUERY=011e4ea33ef2:358b:41dc0328&RCN=89032 "Collective emotions in cyberspace (CYBEREMOTIONS)"], ''[[European Commission]]'', 2009-02-03. Retrieved on 2010-12-13.</ref> The [[CyberEmotions|CyberEmotions project]], for instance, recently identified the role of negative [[emotion]]s in driving social networks discussions.<ref name="NewSci_flaming">Condliffe, Jamie. [https://www.newscientist.com/article/dn19821-flaming-drives-online-social-networks.html "Flaming drives online social networks "], ''[[New Scientist]]'', 2010-12-07. Retrieved on 2010-12-13.</ref>

−

One step towards this aim is accomplished in research. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. CORDIS. "Collective emotions in cyberspace (CYBEREMOTIONS)", European Commission, 2009-02-03. Retrieved on 2010-12-13. The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social networks discussions. Condliffe, Jamie. "Flaming drives online social networks", New Scientist, 2010-12-07. Retrieved on 2010-12-13.

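One step towards this aim is accomplished in research. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. CORDIS. "Collective emotions in cyberspace (CYBEREMOTIONS)", European Commission, 2009-02-03. Retrieved on 2010-12-13. The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social networks discussions. Condliffe, Jamie. "Flaming drives online social networks", New Scientist, 2010-12-07. Retrieved on 2010-12-13.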

−

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment.<ref name="Mining the Web for Feelings, Not Facts"/> The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.

−

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment. The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.

+

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment.<ref name="Mining the Web for Feelings, Not Facts" /> The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.

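The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment. The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.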

Even though short text strings might be a problem, sentiment analysis within [[microblogging]] has shown that [[Twitter]] can be seen as a valid online indicator of political sentiment. Tweets' political sentiment demonstrates close correspondence to parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape.<ref>Tumasjan, Andranik; O.Sprenger, Timm; G.Sandner, Philipp; M.Welpe, Isabell (2010). [http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1441/1852 "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment"]. "Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media"</ref> Furthermore, sentiment analysis on [[Twitter]] has also been shown to capture the public mood behind human reproduction cycles on a planetary scale{{peacock term|date=June 2018}},<ref name="r25">{{cite journal|doi=10.1038/s41598-017-18262-5|pmid=29269945|pmc=5740080|title=Human Sexual Cycles are Driven by Culture and Match Collective Moods|journal=Scientific Reports|volume=7|issue=1|pages=17973|year=2017|last1=Wood|first1=Ian B.|last2=Varela|first2=Pedro L.|last3=Bollen|first3=Johan|last4=Rocha|first4=Luis M.|last5=Gonçalves-Sá|first5=Joana|bibcode=2017NatSR...717973W|arxiv=1707.03959}}</ref> as well as other problems of public-health relevance such as adverse drug reactions.<ref name="r27">{{cite journal|doi=10.1016/j.jbi.2016.06.007|pmid=27363901|pmc=4981644|title=Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts|journal=Journal of Biomedical Informatics|volume=62|pages=148–158|year=2016|last1=Korkontzelos|first1=Ioannis|last2=Nikfarjam|first2=Azadeh|last3=Shardlow|first3=Matthew|last4=Sarker|first4=Abeed|last5=Ananiadou|first5=Sophia|last6=Gonzalez|first6=Graciela H.}}</ref>

−

Even though short text strings might be a problem, sentiment analysis within microblogging has shown that Twitter can be seen as a valid online indicator of political sentiment. Tweets' political sentiment demonstrates close correspondence to parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape. Tumasjan, Andranik; O.Sprenger, Timm; G.Sandner, Philipp; M.Welpe, Isabell (2010). "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment". "Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media". Furthermore, sentiment analysis on Twitter has also been shown to capture the public mood behind human reproduction cycles on a planetary scale, as well as other problems of public-health relevance such as adverse drug reactions.

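Even though short text strings might be a problem, sentiment analysis within microblogging has shown that Twitter can be seen as a valid online indicator of political sentiment. Tweets' political sentiment demonstrates close correspondence to parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape. Tumasjan, Andranik; O.Sprenger, Timm; G.Sandner, Philipp; M.Welpe, Isabell (2010). "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment". Furthermore, sentiment analysis on Twitter has also been shown to capture the public mood behind human reproduction cycles on a planetary scale, as well as other problems of public-health relevance such as adverse drug reactions.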

第731行：第557行：

For a [[recommender system]], sentiment analysis has been proven to be a valuable technique. A [[recommender system]] aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, [[collaborative filtering]] works on the rating matrix, and [[content-based filtering]] works on the [[Metadata|meta-data]] of the items.

−

For a recommender system, sentiment analysis has been proven to be a valuable technique. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, collaborative filtering works on the rating matrix, and content-based filtering works on the meta-data of the items.

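For a recommender system, sentiment analysis has been proven to be a valuable technique. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, collaborative filtering works on the rating matrix, and content-based filtering works on the meta-data of the items.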

In many [[social networking service]]s or [[e-commerce]] websites, users can provide text reviews, comments or feedback on the items. This user-generated text provides a rich source of users' sentiment opinions about numerous products and items. Potentially, for an item, such text can reveal both the related feature/aspects of the item and the users' sentiments on each feature.<ref>{{cite journal|url=https://pdfs.semanticscholar.org/8f1b/9b97183b8aa2caa0fb6c9563b14daabe8316.pdf|archive-url=https://web.archive.org/web/20180524004208/https://pdfs.semanticscholar.org/8f1b/9b97183b8aa2caa0fb6c9563b14daabe8316.pdf|url-status=dead|archive-date=2018-05-24|first1=Huifeng|last1=Tang|first2=Songbo|last2=Tan|first3=Xueqi|last3=Cheng|title=A survey on sentiment detection of reviews|journal=Expert Systems with Applications|volume=36|issue=7|year=2009|pages=10760–10773|doi=10.1016/j.eswa.2009.02.063|s2cid=2178380}}</ref> The item's feature/aspects described in the text play the same role as the meta-data in [[content-based filtering]], but the former are more valuable for the recommender system. Since these features are broadly mentioned by users in their reviews, they can be seen as the most crucial features that can significantly influence the user's experience on the item, while the meta-data of the item (usually provided by the producers instead of consumers) may ignore features that users care about. For different items with common features, a user may give different sentiments. Also, a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can be regarded as a multi-dimensional rating score, reflecting their preference on the items.

−

In many social networking services or e-commerce websites, users can provide text reviews, comments or feedback on the items. This user-generated text provides a rich source of users' sentiment opinions about numerous products and items. Potentially, for an item, such text can reveal both the related feature/aspects of the item and the users' sentiments on each feature. The item's feature/aspects described in the text play the same role as the meta-data in content-based filtering, but the former are more valuable for the recommender system. Since these features are broadly mentioned by users in their reviews, they can be seen as the most crucial features that can significantly influence the user's experience on the item, while the meta-data of the item (usually provided by the producers instead of consumers) may ignore features that users care about. For different items with common features, a user may give different sentiments. Also, a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can be regarded as a multi-dimensional rating score, reflecting their preference on the items.

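In many social networking services or e-commerce websites, users can provide text reviews, comments or feedback on the items. This user-generated text provides a rich source of users' sentiment opinions about numerous products and items. Potentially, for an item, such text can reveal both the related feature/aspects of the item and the users' sentiments on each feature. The item's feature/aspects described in the text play the same role as the meta-data in content-based filtering, but the former are more valuable for the recommender system. Since these features are broadly mentioned by users in their reviews, they can be seen as the most crucial features that can significantly influence the user's experience on the item, while the meta-data of the item (usually provided by the producers instead of consumers) may ignore features that users care about. For different items with common features, a user may give different sentiments. Also, a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can be regarded as a multi-dimensional rating score, reflecting their preference on the items.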

Based on the feature/aspects and the sentiments extracted from the user-generated text, a hybrid recommender system can be constructed.<ref name=":0">Jakob, Niklas, et al. "Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations." ''Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion''. ACM, 2009.</ref> There are two types of motivation to recommend a candidate item to a user. The first motivation is that the candidate item has numerous features in common with the user's preferred items,<ref>{{cite journal|first1=Hu|last1=Minqing|first2=Bing|last2=Liu|title=Mining opinion features in customer reviews|journal=AAAI|volume=4|issue=4|year=2004|s2cid=5724860|url=https://pdfs.semanticscholar.org/ee6c/726b55c66d4c222556cfae62a4eb69aa86b7.pdf|archive-url=https://web.archive.org/web/20180524004041/https://pdfs.semanticscholar.org/ee6c/726b55c66d4c222556cfae62a4eb69aa86b7.pdf|url-status=dead|archive-date=2018-05-24}}</ref> while the second motivation is that the candidate item receives a high sentiment on its features. For a preferred item, it is reasonable to believe that items with the same features will have a similar function or utility. So, these items are also likely to be preferred by the user. On the other hand, for a shared feature of two candidate items, other users may give positive sentiment to one of them while giving negative sentiment to the other. Clearly, the more highly rated item should be recommended to the user. Based on these two motivations, a combined ranking score of similarity and sentiment rating can be constructed for each candidate item.<ref name=":0" />

−

Based on the feature/aspects and the sentiments extracted from the user-generated text, a hybrid recommender system can be constructed. Jakob, Niklas, et al. "Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations." Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion. ACM, 2009. There are two types of motivation to recommend a candidate item to a user. The first motivation is that the candidate item has numerous features in common with the user's preferred items, while the second motivation is that the candidate item receives a high sentiment on its features. For a preferred item, it is reasonable to believe that items with the same features will have a similar function or utility. So, these items are also likely to be preferred by the user. On the other hand, for a shared feature of two candidate items, other users may give positive sentiment to one of them while giving negative sentiment to the other. Clearly, the more highly rated item should be recommended to the user. Based on these two motivations, a combined ranking score of similarity and sentiment rating can be constructed for each candidate item.

基于特征/方面和从用户生成的文本中提取的情感，可以构造一个混合推荐系统。雅各布，尼克拉斯，等等。“超越明星: 利用免费文本用户评论来提高电影推荐的准确性。”第一届国际信息和通信技术会议论文集——民意情绪分析。美国计算机协会，2009。向用户推荐候选商品有两种动机。第一个动机是候选项目与用户偏好项目具有许多共同特征，第二个动机是候选项目对其特征的高度评价。对于一个首选项目，有理由相信具有相同特性的项目将具有类似的功能或实用性。因此，这些项目也可能是首选的用户。另一方面，对于两个候选项目的共同特征，其他用户可能给予其中一个正面的情绪，而给予另一个负面的情绪。显然，应该向用户推荐评价较高的项目。基于这两个动机，可以为每个候选项目建立相似度和情感评分的组合排序评分。

Apart from the difficulty of sentiment analysis itself, applying sentiment analysis to reviews or feedback also faces the challenge of spam and biased reviews. One line of work focuses on evaluating the helpfulness of each review.<ref>{{cite book|first1=Yang|last1=Liu|first2=Xiangji|last2=Huang|first3=Aijun|last3=An|first4=Xiaohui|last4=Yu|chapter-url=http://www.yorku.ca/xhyu/papers/ICDM2008.pdf|chapter=Modeling and predicting the helpfulness of online reviews|year=2008|title=ICDM'08. Eighth IEEE international conference on Data mining|pages=443–452|publisher= IEEE|doi=10.1109/ICDM.2008.94|isbn=978-0-7695-3502-9|s2cid=18235238}}</ref> A poorly written review or piece of feedback is of little help to a recommender system. Moreover, a review can be designed to hinder sales of a target product, and can thus harm the recommender system even if it is well written.


Researchers have also found that long and short forms of user-generated text should be treated differently. One interesting result shows that short-form reviews are sometimes more helpful than long-form ones,<ref>{{cite book|doi=10.1145/1871437.1871741|last1=Bermingham|first1=Adam|last2=Smeaton|first2=Alan F.|title=Classifying sentiment in microblogs: is brevity an advantage?|journal=Proceedings of the 19th ACM International Conference on Information and Knowledge Management|pages=1833|year=2010|isbn=9781450300995|s2cid=2084603|url=http://doras.dcu.ie/15663/1/cikm1079-bermingham.pdf}}</ref> because it is easier to filter out the noise in a short-form text. For long-form text, increasing length does not always bring a proportionate increase in the number of features or sentiments in the text.


Lamba & Madhusudhan<ref>{{cite journal |last1=Lamba |first1=Manika |last2=Madhusudhan |first2=Margam |title=Application of sentiment analysis in libraries to provide temporal information service: a case study on various facets of productivity |journal=Social Network Analysis and Mining |year=2018 |volume=8 |issue=1|pages=1–12|doi=10.1007/s13278-018-0541-y |s2cid=53047128 }}</ref> introduce a new way to cater to the information needs of today's library users by repackaging the results of sentiment analysis of social media platforms such as Twitter and providing them as a consolidated time-based service in different formats. Further, they propose a new way of conducting marketing in libraries using social media mining and sentiment analysis.



* [[Market sentiment]]

* [[Stylometry]]



Kuangmy (54 edits)
