Changes

Sentiment analysis (view source)

Revision as of 17:32, 20 July 2021 (Tue)

60,857 bytes added, 17:32, 20 July 2021 (Tue)

m

Moved page from wikipedia:en:Sentiment analysis (history)

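Line 1: Line 1: −

~~This entry was provisionally machine-translated by Caiyun Xiaoyi (3,167 characters translated) and has not been manually edited or proofread; we apologize for any inconvenience in reading.~~

+

This entry was provisionally machine-translated by Caiyun Xiaoyi (4,529 characters translated) and has not been manually edited or proofread; we apologize for any inconvenience in reading.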

−

'''Sentiment analysis''' (also known as '''opinion mining''' or '''emotion AI''') is the use of [[natural language processing]], [[Text analytics|text analysis]], [[computational linguistics]], and [[biometrics]] to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to [[voice of the customer]] materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from [[marketing]] to [[Customer relationship management|customer service]] to clinical medicine.

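Line 10: Line 8:

Sentiment analysis (also known as opinion mining or emotion AI) is the use of natural language processing, text analysis, computational linguistics, and biometrics to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to voice-of-the-customer materials such as reviews and survey responses, online and social media, and healthcare materials, for applications that range from marketing to customer service to clinical medicine.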

−

== Examples ==

−

The objective and challenges of sentiment analysis can be shown through some simple examples.

Line 21: Line 16:

The objective and challenges of sentiment analysis can be shown through some simple examples.

−

+

=== Simple cases ===

+

=== Simple cases ===

+

* Coronet has the best lines of all day cruisers.

+

* Bertram has a deep V hull and runs easily through seas.

+

* Pastel-colored 1980s day cruisers from Florida are ugly.

+

* I dislike old [[cabin cruiser]]s.

* Coronet has the best lines of all day cruisers.

−

* Bertram has a deep V hull and runs easily through seas.

−

* Pastel-colored 1980s day cruisers from Florida are ugly.

+

* I dislike old cabin cruisers.

−

* I dislike old [[cabin cruiser]]s.

+

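* Coronet has the best lines of all day cruisers.

+

* Bertram has a deep V hull and runs easily through seas.

+

* Pastel-colored 1980s day cruisers from Florida are ugly.

+

* I dislike old cabin cruisers.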

+

=== More challenging examples ===

−

+

=== More challenging examples ===

* I do not dislike cabin cruisers. ([[Negation]] handling)

−

* Disliking watercraft is not really my thing. (Negation, inverted [[word order]])

−

* Sometimes I really hate [[Rigid-hulled inflatable boat|RIBs]]. ([[Adverbial]] modifies the sentiment)

−

* I'd really truly love going out in this weather! (Possibly [[sarcastic]])

−

* Chris Craft is better looking than Limestone. (Two [[brand name]]s, identifying the target of attitude is difficult).

+

* Chris Craft is better looking than Limestone, but Limestone projects seaworthiness and reliability. (Two attitudes, two brand names).

+

* The movie is surprising with plenty of unsettling plot twists. (Negative term used in a positive sense in certain domains).

+

* You should see their decadent dessert menu. (Attitudinal term has shifted polarity recently in certain domains)

+

* I love my mobile but would not recommend it to any of my colleagues. (Qualified positive sentiment, difficult to categorise)

+

* Next week's gig will be right koide9! ("Quoi de neuf?", French for "what's new?". Newly minted terms can be highly attitudinal but volatile in polarity and often out of known vocabulary.)

+

* I do not dislike cabin cruisers. (Negation handling)

+

* Disliking watercraft is not really my thing. (Negation, inverted word order)

+

* Sometimes I really hate RIBs. (Adverbial modifies the sentiment)

+

* I'd really truly love going out in this weather! (Possibly sarcastic)

+

* Chris Craft is better looking than Limestone. (Two brand names, identifying the target of attitude is difficult).

* Chris Craft is better looking than Limestone, but Limestone projects seaworthiness and reliability. (Two attitudes, two brand names).

−

* The movie is surprising with plenty of unsettling plot twists. (Negative term used in a positive sense in certain domains).

−

* You should see their decadent dessert menu. (Attitudinal term has shifted polarity recently in certain domains)

−

* I love my mobile but would not recommend it to any of my colleagues. (Qualified positive sentiment, difficult to categorise)

−

* Next week's gig will be right koide9! ("Quoi de neuf?", French for "what's new?". Newly minted terms can be highly attitudinal but volatile in polarity and often out of known vocabulary.)

+

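* I do not dislike cabin cruisers. (Negation handling)

+

* Disliking watercraft is not really my thing. (Negation, inverted word order)

+

* Sometimes I really hate RIBs. (Adverbial modifies the sentiment)

+

* I'd really truly love going out in this weather! (Possibly sarcastic)

+

* Chris Craft is better looking than Limestone. (Two brand names, identifying the target of attitude is difficult).

+

* Chris Craft is better looking than Limestone, but Limestone projects seaworthiness and reliability. (Two attitudes, two brand names).

+

* The movie is surprising with plenty of unsettling plot twists. (Negative term used in a positive sense in certain domains).

* You should see their decadent dessert menu. (Attitudinal term has shifted polarity recently in certain domains)

+

* I love my mobile but would not recommend it to any of my colleagues. (Qualified positive sentiment, difficult to categorise)

+

* Next week's gig will be right koide9! ("Quoi de neuf?", French for "what's new?". Newly minted terms can be highly attitudinal but volatile in polarity and often out of known vocabulary.)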
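The negation cases in the list above can be illustrated with a minimal sketch. It assumes a tiny hand-made lexicon and a fixed negation window; the names `LEXICON`, `NEGATORS`, and `polarity` are hypothetical and not taken from any library or from the cited literature.

```python
import re

# Hypothetical toy lexicon, invented for illustration only.
LEXICON = {"love": 2, "best": 2, "dislike": -2, "hate": -3, "ugly": -2}
NEGATORS = {"not", "never", "no"}

def polarity(sentence, window=3):
    """Sum word scores, flipping the sign of any sentiment word that
    appears within `window` tokens after a negator."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    total = 0
    negate_left = 0  # number of upcoming tokens still inside negation scope
    for tok in tokens:
        if tok in NEGATORS:
            negate_left = window
            continue
        score = LEXICON.get(tok, 0)
        if negate_left > 0:
            score = -score
            negate_left -= 1
        total += score
    return total

print(polarity("I dislike old cabin cruisers."))     # -2
print(polarity("I do not dislike cabin cruisers."))  # 2
# The toy rule misses this one: "disliking" is not in the lexicon (no
# stemming) and the negator follows the sentiment word.
print(polarity("Disliking watercraft is not really my thing."))  # 0
```

The third call shows why inverted word order is listed as a harder case: a left-to-right window rule has nothing to flip.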

== Types ==

+

== Types ==

+

== Types ==

A basic task in sentiment analysis is classifying the ''polarity'' of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. <ref> Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition

−

A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise.

−

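A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence, or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise.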

−

for Vietnamese Social Media Text". In Proceedings of the 2019 International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Hanoi, Vietnam (2019).</ref>

+

A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition

+

for Vietnamese Social Media Text". In Proceedings of the 2019 International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Hanoi, Vietnam (2019).

−

+

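A basic task in sentiment analysis is classifying the polarity of a given text at the document, sentence, or feature/aspect level: whether the opinion expressed in a document, a sentence, or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition for Vietnamese Social Media Text". In Proceedings of the 2019 International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Hanoi, Vietnam (2019).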

−

~~Precursors to sentimental analysis include the General Inquirer~~, ~~which provided hints toward quantifying patterns in text and~~, ~~separately~~, ~~psychological research that examined a person's psychological state based on analysis of their verbal behavior~~.

−

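Precursors to sentiment analysis include the General Inquirer, which provided hints toward quantifying patterns in text, and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior.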

Precursors to sentiment analysis include the General Inquirer,<ref>Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. "The general inquirer: A computer approach to content analysis." MIT Press, Cambridge, MA (1966).</ref> which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's [[psychological state]] based on analysis of their verbal behavior.<ref>Gottschalk, Louis August, and Goldine C. Gleser. The measurement of psychological states through the content analysis of verbal behavior. Univ of California Press, 1969.</ref>

+

Precursors to sentiment analysis include the General Inquirer, Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. "The general inquirer: A computer approach to content analysis." MIT Press, Cambridge, MA (1966). which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior. Gottschalk, Louis August, and Goldine C. Gleser. The measurement of psychological states through the content analysis of verbal behavior. Univ of California Press, 1969.

−

+

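Precursors to sentiment analysis include the General Inquirer, Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. "The general inquirer: A computer approach to content analysis." MIT Press, Cambridge, MA (1966). which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's psychological state based on analysis of their verbal behavior. Gottschalk, Louis August, and Goldine C. Gleser. The measurement of psychological states through the content analysis of verbal behavior. Univ of California Press, 1969.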

−

~~Subsequently, the method described in a patent by Volcani and Fogel, looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales~~. ~~A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale~~.

−

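Subsequently, the method described in a patent by Volcani and Fogel looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale.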

Subsequently, the method described in a patent by Volcani and Fogel,<ref>{{cite patent

−

| country = USA

−

Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney, who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder among others: Pang and Lee

−

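Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney, who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder, among others.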

−

| number = 7,136,877

−

| status = Issued

+

| title = System and method for determining and controlling the impact of text

+

| pubdate = June 28, 2001

+

| gdate =

+

| fdate =

+

| pridate =

+

| inventor = Volcani, Yanon;

+

| invent1 =

+

| invent2 = Fogel, David B.

+

| assign1 =

+

| assign2 =

+

| class =

+

| url = http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=2&p=1&f=G&l=50&d=PTXT&S1=(fogel.INNM.+AND+volcani.INNM.)&OS=in/fogel+and+in/volcani&RS=(IN/fogel+AND+IN/volcani)

+

}}</ref> looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale.

−

~~Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier~~, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the Max Entropy and SVMs can benefit from the introduction of a neutral class and improve the ~~overall accuracy of the classification. There are~~ in ~~principle two ways for operating with~~ a ~~neutral class. Either, the algorithm proceeds~~ by ~~first identifying the neutral language~~, ~~filtering it out~~ and ~~then assessing the rest in terms of positive~~ and ~~negative sentiments, or it builds a three-way classification~~ in ~~one step~~. ~~This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). Whether and how to use a neutral class depends~~ on ~~the nature of the data: if the data is clearly clustered into neutral~~, ~~negative and positive language~~, ~~it makes sense~~ to ~~filter~~ the ~~neutral language out and focus on the polarity between positive and negative sentiments. If,~~ in ~~contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles~~.

+

Subsequently, the method described in a patent by Volcani and Fogel, looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale.

−

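Even though in most statistical classification methods the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as Max Entropy and SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways of operating with a neutral class. Either the algorithm proceeds by first identifying the neutral language, filtering it out, and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative, and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.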

+

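Subsequently, the method described in a patent by Volcani and Fogel looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale.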

−

| title = ~~System~~ and ~~method~~ for ~~determining~~ and ~~controlling~~ the ~~impact~~ of ~~text~~

+

Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney,<ref name = "Turney02" /> and Pang<ref name = "PangAl02">

+

{{cite conference

+

| first1 = Bo | last1 = Pang

+

| first2 = Lillian | last2 = Lee | author2-link = Lillian Lee (computer scientist)

+

| first3 = Shivakumar | last3 = Vaithyanathan

+

| title = Thumbs up? Sentiment Classification using Machine Learning Techniques

+

| book-title = Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)

+

| year = 2002

+

| pages = 79–86

+

| url = http://www.cs.cornell.edu/home/llee/papers/sentiment.home.html

+

}}

+

</ref> who applied different methods for detecting the polarity of [[product review]]s and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang<ref name = "PangLee05">

+

{{cite conference

+

| first1 = Bo | last1 = Pang

+

| first2 = Lillian | last2 = Lee

+

| title = Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales

+

| book-title = Proceedings of the Association for Computational Linguistics (ACL)

+

| year = 2005

+

| pages = 115–124

+

| url = http://www.cs.cornell.edu/home/llee/papers/pang-lee-stars.home.html

+

}}

+

</ref> and Snyder<ref name = "SnyderBarzilay07">

+

{{cite conference

+

| first1 = Benjamin | last1 = Snyder

+

| first2 = Regina | last2 = Barzilay

+

| title = Multiple Aspect Ranking using the Good Grief Algorithm

+

| book-title = Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL)

+

| year = 2007

+

| pages = 300–307

+

| url = http://people.csail.mit.edu/regina/my_papers/ggranker.ps

+

}}

+

</ref> among others: Pang and Lee<ref name = "PangLee05" /> expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder<ref name = "SnyderBarzilay07" /> performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).

−

~~| pubdate = June 28~~, ~~2001~~

+

Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney, and Pang

−

A different ~~method~~ for ~~determining sentiment is~~ the ~~use~~ of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment with them are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This ~~makes it possible to adjust the sentiment of a given term relative to its environment (usually on~~ the level ~~of the sentence)~~. ~~When~~ a ~~piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given~~ a ~~score based on the~~ way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words, for example, that intensify, ~~relax or negate the sentiment expressed~~ by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text.

+

who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang

−

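A different method for determining sentiment is the use of a scaling system whereby words commonly associated with a negative, neutral, or positive sentiment are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words that intensify, relax, or negate the sentiment expressed by the concept, for example, can affect its score. Alternatively, texts can be given a positive and a negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text.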

+

and Snyder

−

~~| gdate =~~

+

among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).

−

~~| fdate =~~

+

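Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney and Pang, who applied different methods for detecting the polarity of product reviews and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang and Snyder, among others: Pang and Lee expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).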

−

~~There are~~ various other ~~types of~~ sentiment ~~analysis like- Aspect Based sentiment analysis~~, ~~Grading sentiment analysis (positive~~,~~negative~~,~~neutral~~), ~~Multilingual sentiment analysis and detection of emotions~~.

+

First steps to bringing together various approaches—learning, lexical, knowledge-based, etc.—were taken in the 2004 [[AAAI]] Spring Symposium where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text.<ref>Qu, Yan, James Shanahan, and Janyce Wiebe. "Exploring attitude and affect in text: Theories and applications." In AAAI Spring Symposium) Technical report SS-04-07. AAAI Press, Menlo Park, CA. 2004.</ref>

−

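~~There are various other types of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.~~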

+

First steps to bringing together various approaches—learning, lexical, knowledge-based, etc.—were taken in the 2004 AAAI Spring Symposium where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text.Qu, Yan, James Shanahan, and Janyce Wiebe. "Exploring attitude and affect in text: Theories and applications." In AAAI Spring Symposium) Technical report SS-04-07. AAAI Press, Menlo Park, CA. 2004.

−

~~| pridate =~~

+

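First steps to bringing together various approaches (learning, lexical, knowledge-based, etc.) were taken in the 2004 AAAI Spring Symposium, where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text. Qu, Yan, James Shanahan, and Janyce Wiebe. "Exploring attitude and affect in text: Theories and applications." In AAAI Spring Symposium, Technical Report SS-04-07. AAAI Press, Menlo Park, CA. 2004.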

−

| ~~inventor~~ = ~~Volcani~~, ~~Yanon;~~

+

Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the [[Maximum entropy probability distribution|Max Entropy]]<ref name = "Vryniotis13">

+

{{cite conference

+

| first = Vasilis | last = Vryniotis

+

| title = The importance of Neutral Class in Sentiment Analysis

+

| year = 2013

+

| url = http://blog.datumbox.com/the-importance-of-neutral-class-in-sentiment-analysis/

+

}}

+

</ref> and [[Support vector machine|SVMs]]<ref name = "KoppelSchler06">

+

{{cite conference

+

| first1 = Moshe | last1 = Koppel

+

| first2 = Jonathan | last2 = Schler

+

| title = The Importance of Neutral Examples for Learning Sentiment

+

| book-title = Computational Intelligence 22

+

| year = 2006

+

| pages = 100–109

+

| citeseerx = 10.1.1.84.9735

+

}}

+

</ref> can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step.<ref>{{Cite journal|last1=Ribeiro|first1=Filipe Nunes|last2=Araujo|first2=Matheus|date=2010|title=A Benchmark Comparison of State-of-the-Practice Sentiment Analysis Methods|url=https://www.researchgate.net/publication/286302059|journal=Transactions on Embedded Computing Systems |volume=9 |issue=4}}</ref> This second approach often involves estimating a probability distribution over all categories (e.g. [[Naive Bayes classifier|naive Bayes]] classifiers as implemented by the [[Nltk|NLTK]]). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.

−

~~| invent1 =~~

+

Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the Max Entropy

−

This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance.

+

and SVMs

−

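~~This task is commonly defined as classifying a given text~~ (~~usually a sentence~~) ~~into one of two classes~~: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance.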

+

can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.

−

~~| invent2 = Fogel, David B.~~

+

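Even though in most statistical classification methods the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as Max Entropy and SVMs can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are in principle two ways of operating with a neutral class. Either the algorithm proceeds by first identifying the neutral language, filtering it out, and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step. This second approach often involves estimating a probability distribution over all categories (e.g. naive Bayes classifiers as implemented by the NLTK). Whether and how to use a neutral class depends on the nature of the data: if the data is clearly clustered into neutral, negative, and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.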
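The two ways of operating with a neutral class can be sketched side by side. This is a minimal illustration only: it uses a small hand-written multinomial naive Bayes (not NLTK's implementation) and six invented training sentences, and the names `TinyNaiveBayes`, `two_stage`, and `three_way` are hypothetical.

```python
import math
from collections import Counter, defaultdict

class TinyNaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (illustrative only)."""

    def fit(self, texts, labels):
        self.word_counts = defaultdict(Counter)
        self.label_counts = Counter(labels)
        for text, label in zip(texts, labels):
            self.word_counts[label].update(text.lower().split())
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def predict_proba(self, text):
        """Return a probability distribution over all known labels."""
        total_docs = sum(self.label_counts.values())
        log_scores = {}
        for label, n_docs in self.label_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)
            for word in text.lower().split():
                if word in self.vocab:  # skip out-of-vocabulary words
                    logp += math.log((counts[word] + 1) / denom)
            log_scores[label] = logp
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        norm = sum(exp.values())
        return {k: v / norm for k, v in exp.items()}

    def predict(self, text):
        probs = self.predict_proba(text)
        return max(probs, key=probs.get)

# Toy training data, invented for this example.
DATA = [
    ("great boat love it", "positive"),
    ("love the smooth ride", "positive"),
    ("ugly hull hate it", "negative"),
    ("hate the noisy engine", "negative"),
    ("the boat has two engines", "neutral"),
    ("the hull is nine metres long", "neutral"),
]
texts, labels = zip(*DATA)

# Strategy 1: identify and filter neutral text, then decide positive vs. negative.
is_opinion = TinyNaiveBayes().fit(
    texts, ["neutral" if lab == "neutral" else "opinion" for lab in labels])
polar = TinyNaiveBayes().fit(
    *zip(*[(t, lab) for t, lab in DATA if lab != "neutral"]))

def two_stage(text):
    if is_opinion.predict(text) == "neutral":
        return "neutral"
    return polar.predict(text)

# Strategy 2: a single three-way classifier over all categories.
three_way = TinyNaiveBayes().fit(texts, labels)

print(two_stage("love it"), three_way.predict("love it"))
print(three_way.predict_proba("the hull is nine metres long"))
```

On data this small both strategies agree; the paragraph's point is that they can diverge when most text is neutral with only slight deviations toward either pole.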

−

| ~~assign1~~ =

+

A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment with them are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using [[natural language processing]], each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score.<ref>{{Cite journal|last1=Taboada|first1=Maite|last2=Brooke|first2=Julian|date=2011|title=Lexicon-based methods for sentiment analysis|url=http://dl.acm.org/citation.cfm?id=2000518|journal=Computational Linguistics |volume=37 |issue=2 |pages=272–274|doi=10.1162/coli_a_00049|citeseerx=10.1.1.188.5517|s2cid=3181362}}</ref><ref>{{Cite journal|last1=Augustyniak|first1=Łukasz|last2=Szymański|first2=Piotr|last3=Kajdanowicz|first3=Tomasz|last4=Tuligłowicz|first4=Włodzimierz|date=2015-12-25|title=Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis|journal=Entropy|language=en|volume=18|issue=1|pages=4|doi=10.3390/e18010004|bibcode=2015Entrp..18....4A|doi-access=free}}</ref><ref>{{Cite journal|last1=Mehmood|first1=Yasir|last2=Balakrishnan|first2=Vimala|date=2020-01-01|title=An enhanced lexicon-based approach for sentiment analysis: a case study on illegal immigration|url=https://doi.org/10.1108/OIR-10-2018-0295|journal=Online Information Review|volume=44|issue=5|pages=1097–1117|doi=10.1108/OIR-10-2018-0295|issn=1468-4527}}</ref> This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. 
Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text.<ref name ="SentiStrength2010">

+

{{cite journal

+

| first1 = Mike

+

| last1 = Thelwall

+

| first2 = Kevan

+

| last2 = Buckley

+

| first3 = Georgios

+

| last3 = Paltoglou

+

| first4 = Di

+

| last4 = Cai

+

| first5 = Arvid

+

| last5 = Kappas

+

| title = Sentiment strength detection in short informal text

+

| year = 2010

+

| journal = Journal of the American Society for Information Science and Technology

+

| volume= 61

+

| issue= 12

+

| pages= 2544–2558

+

| url = http://www.scit.wlv.ac.uk/~cm1993/papers/SentiStrengthPreprint.doc

+

| doi=10.1002/asi.21416

+

| citeseerx = 10.1.1.278.3863

+

}}

+

</ref>

−

~~| assign2 =~~

+

A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment with them are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can be given a positive and negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text.

−

~~| class =~~

−

~~The term objective refers to an incident that carries factual information.~~

−

~~The term objective refers to an incident that carries factual information.~~

+

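A different method for determining sentiment is the use of a scaling system whereby words commonly associated with a negative, neutral, or positive sentiment are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using natural language processing, each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score. This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. Words that intensify, relax, or negate the sentiment expressed by the concept, for example, can affect its score. Alternatively, texts can be given a positive and a negative sentiment strength score if the goal is to determine the sentiment in a text rather than the overall polarity and strength of the text.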
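A scaling approach of this kind can be sketched as follows. The word scores, the modifier weights, and the rule that a modifier only affects the immediately following word are all assumptions made for illustration; they are not the values or rules of any published lexicon or of the systems cited above.

```python
import re

# Hypothetical word scores on a -10..+10 scale (invented for illustration).
WORD_SCORES = {"love": 8, "good": 5, "ugly": -6, "hate": -9}
INTENSIFIERS = {"really": 1.5, "very": 1.3}       # strengthen the next word
DIMINISHERS = {"somewhat": 0.5, "slightly": 0.3}  # relax the next word
NEGATORS = {"not", "never"}                       # flip the next word

def clamp(value, low=-10, high=10):
    """Keep a score inside the -10..+10 scale."""
    return max(low, min(high, value))

def score_sentence(sentence):
    """Return (overall, positive_strength, negative_strength)."""
    multiplier = 1.0
    positive = negative = 0.0
    for token in re.findall(r"[a-z]+", sentence.lower()):
        if token in INTENSIFIERS:
            multiplier *= INTENSIFIERS[token]
        elif token in DIMINISHERS:
            multiplier *= DIMINISHERS[token]
        elif token in NEGATORS:
            multiplier *= -1
        else:
            # Any other word consumes the pending modifiers.
            value = clamp(WORD_SCORES.get(token, 0) * multiplier)
            if value > 0:
                positive += value
            else:
                negative += value
            multiplier = 1.0
    return clamp(positive + negative), positive, negative

print(score_sentence("Sometimes I really hate RIBs."))
print(score_sentence("I love it but the hull is ugly."))
```

Returning the positive and negative strengths separately, alongside the clamped overall value, mirrors the alternative mentioned at the end of the paragraph: a mixed sentence keeps both signals instead of collapsing to a single number.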

−

~~| url = http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch~~-~~adv.htm&r=2&p=1&f=G&l=50&d=PTXT&S1=~~(~~fogel.INNM.+AND+volcani.INNM.~~)~~&OS=in/fogel+~~and~~+in/volcani&RS=(IN/fogel+AND+IN/volcani)~~

+

There are various other types of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.

−

~~}}</ref> looked specifically at~~ sentiment ~~and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work~~, ~~called EffectCheck~~, ~~presents synonyms that can be used to increase or decrease the level~~ of ~~evoked emotion in each scale~~.

+

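There are various other types of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.

+

There are various other types of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.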

+

=== Subjectivity/objectivity identification ===

+

This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective.<ref name="PangLee08Subjectivity">{{cite book

+

| first1 = Bo

+

| last1 = Pang

+

| first2 = Lillian

+

| last2 = Lee

+

| title = Opinion Mining and Sentiment Analysis

+

| year = 2008

+

| publisher = Now Publishers Inc

+

| chapter = 4.1.2 Subjectivity Detection and Opinion Identification

+

| chapter-url = http://www.cs.cornell.edu/home/llee/opinion-mining-sentiment-analysis-survey.html

+

}}

+

</ref> This problem can sometimes be more difficult than polarity classification.<ref name="MihalceaAl07">{{cite conference

+

|first1 = Rada

+

|last1 = Mihalcea

+

|first2 = Carmen

+

|last2 = Banea

+

|first3 = Janyce

+

|last3 = Wiebe

+

|title = Learning Multilingual Subjective Language via Cross-Lingual Projections

+

|book-title = Proceedings of the Association for Computational Linguistics (ACL)

+

|year = 2007

+

|pages = 976–983

+

|url = http://www.cse.unt.edu/~rada/papers/mihalcea.acl07.pdf

+

|url-status = dead

+

|archive-url = https://web.archive.org/web/20100708065222/http://www.cse.unt.edu/~rada/papers/mihalcea.acl07.pdf

+

|archive-date = 2010-07-08

+

}}

+

</ref> The subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su,<ref name="SuMarkert08">{{cite conference

+

| first1 = Fangzhong | last1 = Su

+

| first2 = Katja | last2 = Markert

+

| title = From Words to Senses: a Case Study in Subjectivity Recognition

+

| book-title = Proceedings of Coling 2008, Manchester, UK

+

| year = 2008

+

| url = http://www.comp.leeds.ac.uk/markert/Papers/Coling2008.pdf

+

}}

+

</ref> results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang<ref name="PangLee04">{{cite conference

+

| first1 = Bo | last1 = Pang

+

| first2 = Lillian | last2 = Lee

+

| title = A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts

+

| book-title = Proceedings of the Association for Computational Linguistics (ACL)

+

| year = 2004

+

| pages = 271–278

+

| url = http://www.cs.cornell.edu/home/llee/papers/cutsent.home.html

+

}}

+

</ref> showed that removing objective sentences from a document before classifying its polarity helped improve performance.

−

~~Many other subsequent efforts were less sophisticated, using~~ a ~~mere polar view~~ of ~~sentiment~~, ~~from positive to negative~~, ~~such~~ as ~~work~~ by ~~Turney~~,~~<ref name = "Turney02" /> and~~ Pang~~<ref name = "PangAl02">~~

+

This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective.

+

This problem can sometimes be more difficult than polarity classification.

+

The subjectivity of words and phrases may depend on their context and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su,

+

results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang

+

showed that removing objective sentences from a document before classifying its polarity helped improve performance.
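
The idea of discarding objective sentences before polarity classification can be illustrated with a minimal sketch. This is not the minimum-cut method used by Pang and Lee; it swaps in a toy cue-word filter, and every word list, function name, and the sample review below is a hypothetical assumption chosen only for illustration.

```python
import re

# Hypothetical toy lexicons (assumptions for illustration, not from any cited work).
SUBJECTIVE_CUES = {"i", "my", "think", "feel", "believe"}
POSITIVE = {"great", "wonderful", "love"}
NEGATIVE = {"terrible", "boring", "hate"}


def tokenize(sentence):
    """Lowercase a sentence and return its alphabetic tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_subjective(sentence):
    """Treat a sentence as subjective if it contains any cue word."""
    return any(tok in SUBJECTIVE_CUES for tok in tokenize(sentence))


def subjective_extract(text):
    """Keep only the sentences flagged as subjective."""
    return [s for s in split_sentences(text) if is_subjective(s)]


def polarity(sentences):
    """Sum +1/-1 lexicon hits over all sentences and map the sum to a label."""
    score = 0
    for sentence in sentences:
        for tok in tokenize(sentence):
            if tok in POSITIVE:
                score += 1
            elif tok in NEGATIVE:
                score -= 1
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = (
    "The film runs 120 minutes. "
    "The hero saves a great and wonderful city. "
    "I think the acting is terrible."
)

whole_document = polarity(split_sentences(review))
subjective_only = polarity(subjective_extract(review))
```

In this toy review, the plot summary contains positive words that do not express the reviewer's opinion, so scoring the whole document and scoring only the subjective extract give different labels.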

−

The term subjective describes text that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states', as mentioned by Quirk et al. In the example below, the phrase 'We Americans' reflects a private state. Moreover, the target entity of an opinion can take several forms, from a tangible product to an intangible topic, as stated in Liu (~~2010~~). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction for subjectivity detection has progressed from hand-curated features in 1999 to automated feature learning in 2005. At present, automated learning methods can be further divided into supervised and unsupervised machine learning. Pattern extraction by machine learning from annotated and unannotated text has been explored extensively by academic researchers.

+

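This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance.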

−

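The term subjective describes text that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states'. In the example below, the phrase 'We Americans' reflects a private state. Moreover, the target entity of an opinion can take several forms, from a tangible product to an intangible topic, as stated in Liu (2010). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction for subjectivity detection has progressed from hand-curated features in 1999 to automated feature learning in 2005. At present, automated learning methods can be further divided into supervised and unsupervised machine learning. Pattern extraction by machine learning from annotated and unannotated text has been explored extensively by academic researchers.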

+

{{clarify-span|Subjective and objective identification is an emerging subtask of sentiment analysis that uses syntactic and semantic features and machine learning to identify whether a sentence or document expresses facts or opinions. Awareness of the distinction between facts and opinions is not recent, having possibly first been presented by Carbonell at Yale University in 1979.|date=December 2020}}

−

{{~~cite conference~~

+

The term objective refers to text that carries factual information.<ref name="Wiebe 2005 486–497">{{Cite journal|last1=Wiebe|first1=Janyce|last2=Riloff|first2=Ellen|date=2005|editor-last=Gelbukh|editor-first=Alexander|title=Creating Subjective and Objective Sentence Classifiers from Unannotated Texts|url=https://link.springer.com/chapter/10.1007%2F978-3-540-30586-6_53|journal=Computational Linguistics and Intelligent Text Processing|series=Lecture Notes in Computer Science|volume=3406|language=en|location=Berlin, Heidelberg|publisher=Springer|pages=486–497|doi=10.1007/978-3-540-30586-6_53|isbn=978-3-540-30586-6}}</ref>

−

~~| first1 = Bo | last1 = Pang~~

+

The term objective refers to text that carries factual information.

−

However, researchers have recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) cue words with fewer usages, 5) time sensitivity, and 6) ever-growing volume.

+

The term objective refers to text that carries factual information.

−

然而，研究人员认识到，在为表达式制定一套固定的规则方面存在一些挑战。规则开发中的许多挑战源于文本信息的性质。一些研究人员已经认识到了六个挑战: ~~1)隐喻性的表达方式，2)写作中的差异，3)上下文敏感性，4)代表用法较少的单词，5)时间敏感性，以及6)不断增长的数量。~~

+

* Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.'

−

~~| first2 = Lillian | last2 = Lee | author2~~-~~link = Lillian Lee (computer scientist)~~

+

* Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.'

−

~~| first3 = Shivakumar | last3 = Vaithyanathan~~

−

Metaphorical expressions. Text containing metaphorical expressions may affect extraction performance. In addition, metaphors take different forms, which may add to the difficulty of detection.

+

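* Example of an objective sentence: 'To be elected president of the United States, a candidate must be at least thirty-five years of age.'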

−

~~比喻性的表达。文本中包含的隐喻表达可能会影响抽取的性能。此外，隐喻采取不同的形式，这可能有助于增加检测。~~

+

The term subjective describes text that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states', as mentioned by Quirk et al.<ref>{{Cite book|last1=Quirk|first1=Randolph|title=A Comprehensive Grammar of the English Language (General Grammar)|last2=Greenbaum|first2=Sidney|last3=Leech|first3=Geoffrey|last4=Svartvik|first4=Jan|publisher=[[Longman]]|year=1985|isbn=1933108312|pages=175–239}}</ref> In the example below, the phrase 'We Americans' reflects a private state. Moreover, the target entity of an opinion can take several forms, from a tangible product to an intangible topic, as stated in Liu (2010).<ref name="Liu2010" /> Furthermore, Liu (2010) observed three types of attitudes: 1) positive opinions, 2) neutral opinions, and 3) negative opinions.<ref name="Liu2010" />

−

~~| title = Thumbs up? Sentiment Classification using Machine Learning Techniques~~

+

The term subjective describes text that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states', as mentioned by Quirk et al. In the example below, the phrase 'We Americans' reflects a private state. Moreover, the target entity of an opinion can take several forms, from a tangible product to an intangible topic, as stated in Liu (2010). Furthermore, Liu (2010) observed three types of attitudes: 1) positive opinions, 2) neutral opinions, and 3) negative opinions.

−

~~Discrepancies in writings. For the text obtained from the Internet, the discrepancies in the writing style of targeted text data involve distinct writing genres and styles~~

+

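The term subjective describes text that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states'. In the example below, the phrase 'We Americans' reflects a private state. Moreover, the target entity of an opinion can take several forms, from a tangible product to an intangible topic, as stated in Liu (2010). Furthermore, Liu (2010) observed three types of attitudes: 1) positive opinions, 2) neutral opinions, and 3) negative opinions.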

−

~~文字上的差异。对于从网络上获取的文本，目标文本数据的写作风格差异涉及到不同的写作类型和风格~~

+

* Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.'

+

This analysis is a classification problem.<ref name=":1" />

−

~~| book-title = Proceedings~~ of ~~the Conference on Empirical Methods in Natural Language Processing (EMNLP)~~

+

* Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.'

+

This analysis is a classification problem.

−

~~Context-sensitive. Classification may vary based on the subjectiveness or objectiveness of previous and following sentences.~~

−

~~上下文相关的。分类可以根据前面和后面句子的主观性或客观性而有所不同。~~

+

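* Example of a subjective sentence: 'We Americans need to elect a president who is mature and who is able to make wise decisions.' This analysis is a classification problem.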

−

| ~~year~~ = ~~2002~~

+

For each class, collections of indicator words or phrases are defined in order to locate relevant patterns in unannotated text. For subjective expressions, a separate word list has been created. Lists of subjective indicator words and phrases have been developed by multiple researchers in linguistics and natural language processing, as stated in Riloff et al. (2003).<ref>{{Cite journal|last1=Riloff|first1=Ellen|last2=Wiebe|first2=Janyce|date=2003-07-11|title=Learning extraction patterns for subjective expressions|journal=Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing|series=EMNLP '03|volume=10|location=USA|publisher=Association for Computational Linguistics|pages=105–112|doi=10.3115/1119355.1119369|s2cid=6541910|doi-access=free}}</ref> A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction for subjectivity detection has progressed from hand-curated features in 1999 to automated feature learning in 2005.<ref>{{Cite journal|last1=Chaturvedi|first1=Iti|last2=Cambria|first2=Erik|last3=Welsch|first3=Roy E.|last4=Herrera|first4=Francisco|date=November 2018|title=Distinguishing between facts and opinions for sentiment analysis: Survey and challenges|url=https://sentic.net/subjectivity-detection.pdf|journal=Information Fusion|volume=44|pages=65–77|doi=10.1016/j.inffus.2017.12.006|via=Elsevier Science Direct|doi-access=free}}</ref> At present, automated learning methods can be further divided into supervised and [[Unsupervised learning|unsupervised machine learning]]. Pattern extraction by machine learning from annotated and unannotated text has been explored extensively by academic researchers.

−

~~Time-sensitive attribute~~. ~~The task is challenged~~ by the ~~some textual data’s time-sensitive attribute~~. ~~If a group~~ of ~~researchers wants~~ to ~~confirm a piece of fact in~~ the ~~news~~, ~~they need a longer time for cross-validation~~, ~~than~~ the ~~news becomes outdated~~.

+

For each class, collections of indicator words or phrases are defined in order to locate relevant patterns in unannotated text. For subjective expressions, a separate word list has been created. Lists of subjective indicator words and phrases have been developed by multiple researchers in linguistics and natural language processing, as stated in Riloff et al. (2003). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction for subjectivity detection has progressed from hand-curated features in 1999 to automated feature learning in 2005. At present, automated learning methods can be further divided into supervised and unsupervised machine learning. Pattern extraction by machine learning from annotated and unannotated text has been explored extensively by academic researchers.
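
A word-list approach of this kind can be sketched as a simple rule-based labeller. The indicator list, the `min_hits` threshold, and the three-way output below are hypothetical assumptions for illustration, not the published lists or rules of Riloff et al.; the two example sentences are the ones used in this section.

```python
import re

# Hypothetical indicator list; real lists are far larger and are curated or learned.
SUBJECTIVE_INDICATORS = ["need to", "mature", "wise", "ugly", "best"]


def count_indicators(sentence, indicators=SUBJECTIVE_INDICATORS):
    """Count whole-word occurrences of indicator words or phrases in a sentence."""
    text = " ".join(re.findall(r"[a-z']+", sentence.lower()))
    return sum(
        len(re.findall(r"\b" + re.escape(ind) + r"\b", text)) for ind in indicators
    )


def classify(sentence, min_hits=2):
    """Label a sentence from its indicator count.

    The thresholds are assumptions: at least `min_hits` indicators means
    'subjective', none means 'objective', anything in between stays 'unlabeled'.
    """
    hits = count_indicators(sentence)
    if hits >= min_hits:
        return "subjective"
    if hits == 0:
        return "objective"
    return "unlabeled"


objective_example = (
    "To be elected president of the United States, "
    "a candidate must be at least thirty-five years of age."
)
subjective_example = (
    "We Americans need to elect a president who is mature "
    "and who is able to make wise decisions."
)
```

Leaving borderline sentences unlabeled is a deliberate choice in this sketch: a rule set tuned for precision will decline to label many sentences, which is one reason learned classifiers are used alongside word lists.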

−

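Time-sensitive attribute. The task is challenged by the time-sensitive nature of some textual data. If a group of researchers wants to confirm a fact in the news, cross-validation may take so long that the news becomes outdated.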

+

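For each class, collections of indicator words or phrases are defined in order to locate relevant patterns in unannotated text. For subjective expressions, a separate word list has been created. Lists of subjective indicator words and phrases have been developed by multiple researchers in linguistics and natural language processing, as stated in Riloff et al. (2003). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction for subjectivity detection has progressed from hand-curated features in 1999 to automated feature learning in 2005. At present, automated learning methods can be further divided into supervised and unsupervised machine learning. Pattern extraction by machine learning from annotated and unannotated text has been explored extensively by academic researchers.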

−

~~| pages = 79–86~~

+

However, researchers have recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) cue words with fewer usages, 5) time sensitivity, and 6) ever-growing volume.

−

~~Cue~~ words with fewer usages.

+

However, researchers have recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) cue words with fewer usages, 5) time sensitivity, and 6) ever-growing volume.

−

~~提示用法较少的词。~~

+

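However, researchers have recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) cue words with fewer usages, 5) time sensitivity, and 6) ever-growing volume.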

−

~~| url = http~~:~~//www.cs.cornell.edu/home/llee/papers/sentiment.home.html~~

−

Ever-growing volume. The task is also challenged by the sheer volume of textual data. The ever-growing nature of textual data makes it overwhelmingly difficult for researchers to complete the task on time.

+

# Metaphorical expressions. Text containing metaphorical expressions may affect extraction performance.<ref>{{Cite journal|last1=Wiebe|first1=Janyce|last2=Riloff|first2=Ellen|date=July 2011|title=Finding Mutual Benefit between Subjectivity Analysis and Information Extraction|url=https://ieeexplore.ieee.org/document/5959154|journal=IEEE Transactions on Affective Computing|volume=2|issue=4|pages=175–191|doi=10.1109/T-AFFC.2011.19|s2cid=16820846|issn=1949-3045}}</ref> In addition, metaphors take different forms, which may add to the difficulty of detection.

+

# Discrepancies in writing. For text obtained from the Internet, the targeted text data spans distinct writing genres and styles.

+

# Context sensitivity. Classification may vary depending on the subjectivity or objectivity of the preceding and following sentences.<ref name=":1">{{Cite journal|last1=Pang|first1=Bo|last2=Lee|first2=Lillian|date=2008-07-06|title=Opinion Mining and Sentiment Analysis|url=https://www.nowpublishers.com/article/Details/INR-011|journal=Foundations and Trends in Information Retrieval|language=en|volume=2|issue=1–2|pages=1–135|doi=10.1561/1500000011|issn=1554-0669}}</ref>

+

# Time-sensitive attribute. The task is challenged by the time-sensitive nature of some textual data. If a group of researchers wants to confirm a fact in the news, cross-validation may take so long that the news becomes outdated.

+

# Cue words with fewer usages.

+

# Ever-growing volume. The task is also challenged by the sheer volume of textual data. The ever-growing nature of textual data makes it overwhelmingly difficult for researchers to complete the task on time.

−

~~不断增长的体积。这项任务还受到大量文本数据的挑战。文本数据的不断增长性使得研究人员很难按时完成任务。~~

+

# Metaphorical expressions. Text containing metaphorical expressions may affect extraction performance. In addition, metaphors take different forms, which may add to the difficulty of detection.

+

# Discrepancies in writing. For text obtained from the Internet, the targeted text data spans distinct writing genres and styles.

+

# Context sensitivity. Classification may vary depending on the subjectivity or objectivity of the preceding and following sentences.

+

# Time-sensitive attribute. The task is challenged by the time-sensitive nature of some textual data. If a group of researchers wants to confirm a fact in the news, cross-validation may take so long that the news becomes outdated.

+

# Cue words with fewer usages.

+

# Ever-growing volume. The task is also challenged by the sheer volume of textual data. The ever-growing nature of textual data makes it overwhelmingly difficult for researchers to complete the task on time.

−

}}

+

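# Metaphorical expressions. Text containing metaphorical expressions may affect extraction performance. In addition, metaphors take different forms, which may add to the difficulty of detection.
# Discrepancies in writing. For text obtained from the Internet, the targeted text data spans distinct writing genres and styles.
# Context sensitivity. Classification may vary depending on the subjectivity or objectivity of the preceding and following sentences.
# Time-sensitive attribute. The task is challenged by the time-sensitive nature of some textual data. If a group of researchers wants to confirm a fact in the news, cross-validation may take so long that the news becomes outdated.
# Cue words with fewer usages.
# Ever-growing volume. The task is also challenged by the sheer volume of textual data. The ever-growing nature of textual data makes it overwhelmingly difficult for researchers to complete the task on time.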

−

~~</ref> who applied different methods for detecting the polarity of [[product review]]s and movie reviews respectively. This work is at~~ the document level. ~~One can also classify~~ a document~~'s polarity on~~ a ~~multi-way scale~~, ~~which was attempted by Pang~~<ref name = "~~PangLee05~~">

+

Previously, research mainly focused on document-level classification. However, classification at the document level is less accurate, as an article may contain diverse types of expressions. Research evidence suggests that a set of news articles expected to be dominated by objective expressions in fact consisted of over 40% subjective expressions.<ref name="Wiebe 2005 486–497"/>

Previously, research mainly focused on document-level classification. However, classification at the document level is less accurate, as an article may contain diverse types of expressions. Research evidence suggests that a set of news articles expected to be dominated by objective expressions in fact consisted of over 40% subjective expressions.

Line 209: Line 350:

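Previously, research mainly focused on document-level classification. However, classification at the document level is less accurate, as an article may contain diverse types of expressions. Research evidence suggests that a set of news articles expected to be dominated by objective expressions in fact consisted of over 40% subjective expressions.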

−

~~{{cite conference~~

+

To overcome these challenges, researchers concluded that classifier efficacy depends on the precision of the pattern learner. Learners fed with large volumes of annotated training data outperformed those trained on less comprehensive subjective features. However, one of the main obstacles to this type of work is manually generating a large dataset of annotated sentences. Manual annotation has been less favored than automatic learning for three reasons:

−

~~| first1 = Bo | last1 = Pang~~

+

To overcome these challenges, researchers concluded that classifier efficacy depends on the precision of the pattern learner. Learners fed with large volumes of annotated training data outperformed those trained on less comprehensive subjective features. However, one of the main obstacles to this type of work is manually generating a large dataset of annotated sentences. Manual annotation has been less favored than automatic learning for three reasons:

−

All of these reasons can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and unannotated textual data.

+

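To overcome these challenges, researchers concluded that classifier efficacy depends on the precision of the pattern learner. Learners fed with large volumes of annotated training data outperformed those trained on less comprehensive subjective features. However, one of the main obstacles to this type of work is manually generating a large dataset of annotated sentences. Manual annotation has been less favored than automatic learning for three reasons: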

−

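All of these reasons can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and unannotated textual data.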

+

# Variations in comprehension. In manual annotation, annotators may disagree on whether an instance is subjective or objective because of the ambiguity of language.

+

# Human errors. Manual annotation is a meticulous task that requires intense concentration to finish.

+

# Time-consuming. Manual annotation is laborious work. Riloff (1996) showed that annotating 160 texts took one annotator 8 hours.<ref>{{Cite journal|last=Riloff|first=Ellen|date=1996-08-01|title=An empirical study of automated dictionary construction for information extraction in three domains|url=https://dx.doi.org/10.1016%2F0004-3702%2895%2900123-9|journal=Artificial Intelligence|language=en|volume=85|issue=1|pages=101–134|doi=10.1016/0004-3702(95)00123-9|issn=0004-3702|doi-access=free}}</ref>

−

~~| first2 = Lillian | last2 = Lee~~

+

# Variations in comprehension. In manual annotation, annotators may disagree on whether an instance is subjective or objective because of the ambiguity of language.

+

# Human errors. Manual annotation is a meticulous task that requires intense concentration to finish.

+

# Time-consuming. Manual annotation is laborious work. Riloff (1996) showed that annotating 160 texts took one annotator 8 hours.

−

~~| title = Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales~~

+

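# Variations in comprehension. In manual annotation, annotators may disagree on whether an instance is subjective or objective because of the ambiguity of language.
# Human errors. Manual annotation is a meticulous task that requires intense concentration to finish.
# Time-consuming. Manual annotation is laborious work. Riloff (1996) showed that annotating 160 texts took one annotator 8 hours.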

−

~~Meta-Bootstrapping by Riloff and Jones in 1999. Level One: Generate extraction patterns based~~ on the ~~pre-defined rules~~ and ~~the extracted~~ patterns ~~by the number~~ of seed words ~~each pattern holds. Leve Two: Top 5 words will be marked~~ and ~~add to the dictionary. Repeat~~.

+

All of these reasons can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and unannotated textual data.

−

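Meta-Bootstrapping by Riloff and Jones in 1999. Level One: Generate extraction patterns based on pre-defined rules, and rank the extracted patterns by the number of seed words each pattern holds. Level Two: The top 5 words are marked and added to the dictionary. Repeat.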

+

All of these reasons can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and unannotated textual data.

−

~~| book-title = Proceedings of the Association for Computational Linguistics (ACL)~~

+

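All of these reasons can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and unannotated textual data.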

−

Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge) by Thelen and Riloff. Step One: Generate extraction patterns. Step Two: Move the best patterns from the Pattern Pool to the Candidate Word Pool. Step Three: The top 10 words are marked and added to the dictionary. Repeat.

+

# Meta-Bootstrapping by Riloff and Jones in 1999.<ref>{{Cite journal|last1=Riloff|first1=Ellen|last2=Jones|first2=Rosie|date=July 1999|title=Learning dictionaries for information extraction by multi-level bootstrapping|url=https://aaai.org/Papers/AAAI/1999/AAAI99-068.pdf|journal=AAAI '99/IAAI '99: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence|pages=474–479}}</ref> Level One: Generate extraction patterns based on pre-defined rules, and rank the extracted patterns by the number of seed words each pattern holds. Level Two: The top 5 words are marked and added to the dictionary. Repeat.

+

# Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge) by Thelen and Riloff.<ref>{{Cite journal|last1=Thelen|first1=Michael|last2=Riloff|first2=Ellen|date=2002-07-06|title=A bootstrapping method for learning semantic lexicons using extraction pattern contexts|journal=Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10|series=EMNLP '02|volume=10|location=USA|publisher=Association for Computational Linguistics|pages=214–221|doi=10.3115/1118693.1118721|s2cid=137155|doi-access=free}}</ref> Step One: Generate extraction patterns. Step Two: Move the best patterns from the Pattern Pool to the Candidate Word Pool. Step Three: The top 10 words are marked and added to the dictionary. Repeat.

−

Basilisk (~~ b ootstrapping a pproach~~ to ~~ s emantIc l exicon i duction~~ using ~~ s emantIc k nowledge~~).~~第一步~~: ~~生成抽取模式第二步~~: ~~将最好的模式从模式池移动到候选单词池。第三步~~: ~~将前10个单词标记并添加到字典中。重复。~~

+

# Meta-Bootstrapping by Riloff and Jones in 1999. Level One: Generate extraction patterns based on pre-defined rules, and rank the extracted patterns by the number of seed words each pattern holds. Level Two: The top 5 words are marked and added to the dictionary. Repeat.

+

# Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge) by Thelen and Riloff. Step One: Generate extraction patterns. Step Two: Move the best patterns from the Pattern Pool to the Candidate Word Pool. Step Three: The top 10 words are marked and added to the dictionary. Repeat.
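
The general loop shared by both methods (start from seed words, find the patterns that extract them, then promote the best new words and repeat) can be sketched as follows. This is a heavily simplified illustration and not an implementation of Meta-Bootstrapping or Basilisk: the pattern representation (the preceding token), the scoring by raw counts, and all names and parameters are hypothetical assumptions.

```python
from collections import defaultdict


def extract_pairs(sentences):
    """Return (pattern, word) pairs; here a pattern is simply the preceding token."""
    pairs = set()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, word in zip(tokens, tokens[1:]):
            pairs.add((prev, word))
    return pairs


def bootstrap(sentences, seeds, rounds=2, top_patterns=1, top_words=2):
    """Grow a lexicon from seed words using unannotated sentences."""
    lexicon = set(seeds)
    by_pattern = defaultdict(set)
    for pattern, word in extract_pairs(sentences):
        by_pattern[pattern].add(word)

    for _ in range(rounds):
        # Rank patterns by how many current lexicon words they extract.
        ranked_patterns = sorted(
            by_pattern, key=lambda p: (-len(by_pattern[p] & lexicon), p)
        )
        best = [p for p in ranked_patterns[:top_patterns] if by_pattern[p] & lexicon]

        # Score unseen words by the number of best patterns that extract them.
        votes = defaultdict(int)
        for pattern in best:
            for word in by_pattern[pattern] - lexicon:
                votes[word] += 1

        # Promote the top-scoring candidates (ties broken alphabetically).
        new_words = sorted(votes, key=lambda w: (-votes[w], w))[:top_words]
        if not new_words:
            break
        lexicon.update(new_words)
    return lexicon


corpus = [
    "the food was awful",
    "the service was great",
    "the room was dirty",
    "the view was lovely",
    "we arrived on monday",
]
learned = bootstrap(corpus, seeds={"awful", "great"})
```

In this toy corpus the pattern "was" extracts both seed words, so the other words it extracts are added to the lexicon, while words from unrelated contexts are left out.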

−

~~| year = 2005~~

+

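# Meta-Bootstrapping by Riloff and Jones in 1999. Level One: Generate extraction patterns based on pre-defined rules, and rank the extracted patterns by the number of seed words each pattern holds. Level Two: The top 5 words are marked and added to the dictionary. Repeat.
# Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge) by Thelen and Riloff. Step One: Generate extraction patterns. Step Two: Move the best patterns from the Pattern Pool to the Candidate Word Pool. Step Three: The top 10 words are marked and added to the dictionary. Repeat.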

−

~~| pages = 115–124~~

+

Overall, these algorithms highlight the need for automatic pattern recognition and extraction in the subjective and objective classification task.

Line 239: Line 386:

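Overall, these algorithms highlight the need for automatic pattern recognition and extraction in the subjective and objective classification task.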

−

| url = ~~http~~://www.~~cs.cornell~~.~~edu~~/~~home~~/~~llee~~/~~papers~~/~~pang~~-~~lee-stars.home.html~~

+

Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifier's primary benefits is that it popularized the practice of data-driven decision-making processes in various industries. According to Liu, subjective and objective identification has been applied in business, advertising, sports, and social science.<ref>{{Cite journal|last=Liu|first=Bing|date=2012-05-23|title=Sentiment Analysis and Opinion Mining|url=https://www.morganclaypool.com/doi/abs/10.2200/S00416ED1V01Y201204HLT016|journal=Synthesis Lectures on Human Language Technologies|volume=5|issue=1|pages=1–167|doi=10.2200/S00416ED1V01Y201204HLT016|issn=1947-4040}}</ref>

−

}}

+

Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifier's primary benefits is that it popularized the practice of data-driven decision-making processes in various industries. According to Liu, subjective and objective identification has been applied in business, advertising, sports, and social science.

−

Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifier's primary benefits is that it popularized the practice of data-driven decision-making processes in various industries. According to Liu, subjective and objective identification has been applied in business, advertising, sports, and social science.

+

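Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifier's primary benefits is that it popularized the practice of data-driven decision-making processes in various industries. According to Liu, subjective and objective identification has been applied in business, advertising, sports, and social science.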

−

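Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifier's primary benefits is that it popularized the practice of data-driven decision-making processes in various industries. According to Liu, subjective and objective identification has been applied in business, advertising, sports, and social science.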

+

* Online review classification: In business, the classifier helps a company better understand feedback on its products and the reasoning behind reviews.

+

* Stock price prediction: In the finance industry, the classifier aids the prediction model by processing auxiliary information from social media and other textual information from the Internet. Previous studies on Japanese stock prices conducted by Deng et al. indicate that models with a subjective and objective module may perform better than those without it.<ref>{{Cite journal|last1=Deng|first1=Shangkun|last2=Mitsubuchi|first2=Takashi|last3=Shioda|first3=Kei|last4=Shimada|first4=Tatsuro|last5=Sakurai|first5=Akito|date=December 2011|title=Combining Technical Analysis with Sentiment Analysis for Stock Price Prediction|url=http://dx.doi.org/10.1109/dasc.2011.138|journal=2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing|pages=800–807|publisher=IEEE|doi=10.1109/dasc.2011.138|isbn=978-1-4673-0006-3|s2cid=15262023}}</ref>

+

* Social media analysis.

+

* Students' feedback classification.<ref>{{Cite journal|last1=Nguyen|first1=Kiet Van|last2=Nguyen|first2=Vu Duc|last3=Nguyen|first3=Phu X.V.|last4=Truong|first4=Tham T.H.|last5=Nguyen|first5=Ngan L-T.|date=2018-10-01|title=UIT-VSFC: Vietnamese Students' Feedback Corpus for Sentiment Analysis|url=https://ieeexplore.ieee.org/document/8573337|journal=2018 10th International Conference on Knowledge and Systems Engineering (KSE)|pages=19–24|location=Vietnam|publisher=IEEE|doi=10.1109/KSE.2018.8573337|isbn=978-1-5386-6113-0}}</ref>

+

* Document summarising: The classifier can extract target-specified comments and gather opinions made by one particular entity.

+

* Complex question answering. The classifier can dissect complex questions by classifying the language as subjective or objective and by identifying the focused target. In Yu et al. (2003), the researchers developed sentence- and document-level clustering that identifies opinion pieces.<ref>{{Cite journal|last1=Yu|first1=Hong|last2=Hatzivassiloglou|first2=Vasileios|date=2003-07-11|title=Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences|journal=Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing|series=EMNLP '03|location=USA|publisher=Association for Computational Linguistics|pages=129–136|doi=10.3115/1119355.1119372|doi-access=free}}</ref>

+

* Domain-specific applications.

+

* Email analysis: The subjective and objective classifier detects spam by tracing language patterns with target words.

−

~~</ref>~~ and ~~Snyder<ref name = "SnyderBarzilay07">~~

+

* Online review classification: In business, the classifier helps a company better understand feedback on its products and the reasoning behind reviews.

+

* Stock price prediction: In the finance industry, the classifier aids the prediction model by processing auxiliary information from social media and other textual information from the Internet. Previous studies on Japanese stock prices conducted by Deng et al. indicate that models with a subjective and objective module may perform better than those without it.

+

* Social media analysis.

+

* Students' feedback classification.

+

* Document summarising: The classifier can extract target-specified comments and gather opinions made by one particular entity.

+

* Complex question answering. The classifier can dissect complex questions by classifying the language as subjective or objective and by identifying the focused target. In Yu et al. (2003), the researchers developed sentence- and document-level clustering that identifies opinion pieces.

+

* Domain-specific applications.

+

* Email analysis: The subjective and objective classifier detects spam by tracing language patterns with target words.

−

~~{{cite conference~~

−

~~| first1 = Benjamin | last1 = Snyder~~

+

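* Online review classification: In business, the classifier helps a company better understand feedback on its products and the reasoning behind reviews.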

+

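* Stock price prediction: In the finance industry, the classifier aids the prediction model by processing auxiliary information from social media and other textual information from the Internet. Previous studies on Japanese stock prices conducted by Deng et al. indicate that models with a subjective and objective module may perform better than those without it.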

+

* Social media analysis.

+

* Students' feedback classification.

+

* Document summarising: The classifier can extract target-specified comments and gather opinions made by one particular entity.

+

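* Complex question answering. The classifier can dissect complex questions by classifying the language as subjective or objective and by identifying the focused target. In Yu et al. (2003), the researchers developed sentence- and document-level clustering that identifies opinion pieces.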

+

* Domain-specific applications.

+

* Email analysis: The subjective and objective classifier detects spam by tracing language patterns with target words.

−

| first2 = ~~Regina~~ | last2 = ~~Barzilay~~

+

=== Feature/aspect-based ===

+

Feature/aspect-based sentiment analysis refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank.<ref name="HuLiu04">{{cite conference

+

| first1 = Minqing | last1 = Hu

+

| first2 = Bing | last2 = Liu

+

| title = Mining and Summarizing Customer Reviews

+

| book-title = Proceedings of KDD 2004.

+

| year = 2004

+

| url = http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

+

}}

+

</ref> A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service for a restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is the possibility to capture nuances about objects of interest. Different features can generate different sentiment responses, for example a hotel can have a convenient location, but mediocre food.<ref>{{Cite journal|title = Good location, terrible food: detecting feature sentiment in user-generated reviews|journal = Social Network Analysis and Mining|date = 2013-06-22|issn = 1869-5450|pages = 1149–1163|volume = 3|issue = 4|doi = 10.1007/s13278-013-0119-7|first1 = Mario|last1 = Cataldi|first2 = Andrea|last2 = Ballatore|first3 = Ilaria|last3 = Tiddi|first4 = Marie-Aude|last4 = Aufaure|citeseerx = 10.1.1.396.9313|s2cid = 5025282}}</ref> This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether an opinion expressed on each feature/aspect is positive, negative or neutral.<ref name="LiuHuCheng04">{{cite conference

+

| first1 = Bing | last1 = Liu

+

| first2 = Minqing | last2 = Hu | first3 = Junsheng | last3 = Cheng

+

| title = Opinion Observer: Analyzing and Comparing Opinions on the Web

+

| book-title = Proceedings of WWW 2005.

+

| year = 2005

+

| url = http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

+

}}

+

</ref> The automatic identification of features can be performed with syntactic methods, with [[topic model]]ing,<ref>{{Cite book|title = Constrained LDA for Grouping Product Features in Opinion Mining|publisher = Springer Berlin Heidelberg|date = 2011-01-01|isbn = 978-3-642-20840-9|pages = 448–459|series = Lecture Notes in Computer Science|doi = 10.1007/978-3-642-20841-6_37|first1 = Zhongwu|last1 = Zhai|first2 = Bing|last2 = Liu|first3 = Hua|last3 = Xu|first4 = Peifa|last4 = Jia|editor-first = Joshua Zhexue|editor-last = Huang|editor-first2 = Longbing|editor-last2 = Cao|editor-first3 = Jaideep|editor-last3 = Srivastava|citeseerx = 10.1.1.221.5178}}</ref><ref>{{Cite book|title = Modeling Online Reviews with Multi-grain Topic Models|publisher = ACM|journal = Proceedings of the 17th International Conference on World Wide Web|date = 2008-01-01|location = New York, NY, USA|isbn = 978-1-60558-085-2|pages = 111–120|series = WWW '08|doi = 10.1145/1367497.1367513|first1 = Ivan|last1 = Titov|first2 = Ryan|last2 = McDonald|arxiv = 0801.1063|s2cid = 13609860}}</ref> or with [[deep learning]].<ref name="Poria">{{cite journal

+

| first = Soujanya | last = Poria | display-authors=etal

+

| title = Aspect extraction for opinion mining with a deep convolutional neural network

+

| year = 2016

+

| journal = Knowledge-Based Systems

+

| volume= 108

+

| pages= 42–49

+

| doi=10.1016/j.knosys.2016.06.009

+

}}

+

</ref><ref name="Ma">{{cite conference

+

| first = Yukun | last = Ma | display-authors=etal

+

| title = Targeted aspect-based sentiment analysis via embedding commonsense knowledge into an attentive LSTM

+

| year = 2018

+

| book-title = Proceedings of AAAI

+

| pages= 5876–5883

+

}}

+

</ref> More detailed discussions about this level of sentiment analysis can be found in Liu's work.<ref name="Liu2010">{{cite conference

+

| first = Bing | last = Liu

+

| title = Sentiment Analysis and Subjectivity

+

+

| date = 2010

+

| url = http://www.cs.uic.edu/~liub/FBS/NLP-handbook-sentiment-analysis.pdf

+

}}

+

</ref>

−

~~| title = Multiple Aspect Ranking using~~ the ~~Good Grief Algorithm~~

+

Feature/aspect-based sentiment analysis refers to determining the opinions or sentiments expressed about different features or aspects of entities, e.g., a cell phone, a digital camera, or a bank.

+

A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service of a restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is that it can capture nuances about objects of interest. Different features can generate different sentiment responses; for example, a hotel can have a convenient location but mediocre food. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether an opinion expressed on each feature/aspect is positive, negative or neutral.

+

The automatic identification of features can be performed with syntactic methods, with topic modeling, or with deep learning.

−

~~| book-title = Proceedings~~ of ~~the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL)~~

+

More detailed discussions about this level of sentiment analysis can be found in Liu's work.

−

~~| year = 2007~~

−

~~| pages = 300–307~~

+

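Feature/aspect-based sentiment analysis refers to determining the opinions or sentiments expressed about different features or aspects of entities, e.g., a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service of a restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is that it can capture nuances about objects of interest. Different features can generate different sentiment responses; for example, a hotel can have a convenient location but mediocre food. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether the opinion expressed on each feature/aspect is positive, negative, or neutral. The automatic identification of features can be performed with syntactic methods, with topic modeling, or with deep learning. More detailed discussions of this level of sentiment analysis can be found in Liu's work.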
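The sub-problems described above (finding aspects, then deciding the polarity of the opinion on each one) can be illustrated with a minimal Python sketch. The aspect list, the opinion lexicons, the function name `aspect_sentiments`, and the one-token window are all assumptions made for illustration; real systems typically rely on syntactic parsing, topic models, or neural networks instead.

```python
import re

# Hypothetical toy lexicons, for illustration only.
ASPECTS = {"location", "food", "service", "screen"}
POSITIVE = {"convenient", "great", "friendly", "sharp"}
NEGATIVE = {"mediocre", "terrible", "slow", "dim"}


def aspect_sentiments(text, window=1):
    """Label each known aspect term using opinion words within `window` tokens."""
    tokens = re.findall(r"[a-z]+", text.lower())
    result = {}
    for i, tok in enumerate(tokens):
        if tok not in ASPECTS:
            continue
        # Tokens immediately around the aspect term (the aspect itself included).
        nearby = tokens[max(0, i - window): i + window + 1]
        score = sum(w in POSITIVE for w in nearby) - sum(w in NEGATIVE for w in nearby)
        if score > 0:
            result[tok] = "positive"
        elif score < 0:
            result[tok] = "negative"
        else:
            result[tok] = "neutral"
    return result


print(aspect_sentiments("The hotel has a convenient location but mediocre food."))
```

With the hotel sentence from the text, the sketch assigns a positive label to the location and a negative label to the food, which a single document-level polarity score could not express.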

−

~~| url~~ = ~~http://people.csail.mit.edu/regina/my_papers/ggranker.ps~~

+

== Methods and features==

−

}}

+

== Methods and features==

−

~~</ref> among others: Pang and Lee<ref name~~ = ~~"PangLee05" /> expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder<ref name~~ = ~~"SnyderBarzilay07" /> performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).~~

+

== Methods and features ==

+

Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches.<ref name ="“Cambria">

+

{{cite journal

+

| first1 = E

+

| last1 = Cambria

+

| first2 = B

+

| last2 = Schuller

+

| first3 = Y

+

| last3 = Xia

+

| first4 = C

+

| last4 = Havasi

+

| title = New avenues in opinion mining and sentiment analysis

+

| year = 2013

+

| journal = IEEE Intelligent Systems

+

| volume= 28

+

| issue= 2

+

| pages= 15–21

+

| doi=10.1109/MIS.2013.30

+

| citeseerx = 10.1.1.688.1384

+

| s2cid = 12104996

+

}}

+

</ref> Knowledge-based techniques classify text by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored.<ref name="Ortony">

+

{{cite book

+

|first1 = Andrew

+

|last1 = Ortony

+

|first2 = G

+

|last2 = Clore

+

|first3 = A

+

|last3 = Collins

+

|title = The Cognitive Structure of Emotions

+

|year = 1988

+

|url = http://www.cogsci.northwestern.edu/courses/cg207/readings/Cognitive_Structure_of_Emotions_exerpt.pdf

+

|publisher = Cambridge Univ. Press

+

|url-status = dead

+

|archive-url = https://web.archive.org/web/20151123055038/http://www.cogsci.northwestern.edu/courses/cg207/readings/Cognitive_Structure_of_Emotions_exerpt.pdf

+

|archive-date = 2015-11-23

+

}}

+

</ref> Some knowledge bases not only list obvious affect words, but also assign arbitrary words a probable "affinity" to particular emotions.<ref name ="Stevenson">

+

{{cite journal

+

| first1 = Ryan

+

| last1 = Stevenson

+

| first2 = Joseph

+

| last2 = Mikels

+

| first3 = Thomas

+

| last3 = James

+

| title = Characterization of the Affective Norms for English Words by Discrete Emotional Categories

+

| year = 2007

+

| journal = Behavior Research Methods

+

| volume= 39

+

| issue= 4

+

| pages= 1020–1024

+

| url = http://indiana.edu/~panlab/papers/SraMjaJtw_ANEW.pdf

+

| pmid = 18183921

+

| doi=10.3758/bf03192999

+

| s2cid = 6673690

+

| doi-access = free

+

}}

+

</ref> Statistical methods leverage elements from [[machine learning]] such as [[latent semantic analysis]], [[support vector machines]], "[[bag of words]]", "[[Pointwise Mutual Information]]" for Semantic Orientation,<ref name = "Turney02">

+

{{cite conference

+

| first = Peter | last = Turney

+

| title = Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

+

| book-title = Proceedings of the Association for Computational Linguistics

+

| pages = 417–424

+

| year = 2002

+

| arxiv = cs.LG/0212032

+

}}

+

</ref> and [[deep learning]]. More sophisticated methods try to detect the holder of a sentiment (i.e., the person who maintains that affective state) and the target (i.e., the entity about which the affect is felt).<ref name="Kim+Hovy06">

+

{{cite conference

+

|last1 = Kim

+

|first1 = S. M.

+

|last2 = Hovy

+

|first2 = E. H.

+

|title = Identifying and Analyzing Judgment Opinions.

+

|book-title = Proceedings of the Human Language Technology / North American Association of Computational Linguistics conference (HLT-NAACL 2006). New York, NY.

+

|year = 2006

+

|url = http://acl.ldc.upenn.edu/P/P06/P06-2063.pdf

+

|archive-url = https://web.archive.org/web/20110629121135/http://acl.ldc.upenn.edu/P/P06/P06-2063.pdf

+

|url-status = dead

+

|archive-date = 2011-06-29

+

}}

+

</ref> To mine the opinion in [[Context (language use)|context]] and get the feature about which the speaker has opined, the grammatical relationships of words are used. Grammatical dependency relations are obtained by deep parsing of the text.<ref name="DeyHaque08">

+

{{cite conference

+

| first1 = Lipika | last1 = Dey | first2 = S. K. Mirajul | last2 = Haque

+

| title = Opinion Mining from Noisy Text Data

+

| book-title = Proceedings of the second workshop on Analytics for noisy unstructured text data, p.83-90

+

| year = 2008

+

| url = http://portal.acm.org/citation.cfm?id=1390763&dl=GUIDE&coll=GUIDE&CFID=92244761&CFTOKEN=30578437

+

}}

+

</ref> Hybrid approaches leverage both machine learning and elements from [[knowledge representation]] such as [[ontologies]] and [[semantic network]]s in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.<ref name ="“Hussain">

+

{{cite book

+

| first1 = E

+

| last1 = Cambria

+

| first2 = A

+

| last2 = Hussain

+

| title = Sentic Computing: A Common-Sense-Based Framework for Concept-Level Sentiment Analysis

+

| year = 2015

+

| url = http://springer.com/9783319236544

+

| publisher = Springer

+

| isbn = 9783319236544

+

}}

+

</ref>

+

Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches.

−

It refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service for a restaurant, or the picture quality of a camera. The advantage of feature-based ~~sentiment analysis is~~ the ~~possibility to capture nuances about objects~~ of ~~interest. Different features can generate different sentiment responses, for example a hotel can have a convenient location, but mediocre food. This problem involves several sub-problems~~, ~~e.g.~~, ~~identifying relevant entities, extracting their features/aspects~~, and determining whether an opinion expressed on each feature/aspect is positive, negative or neutral. The automatic identification of features can be performed with syntactic methods, with topic modeling, or with deep learning. More detailed discussions about this level of sentiment analysis can be found in Liu's work.

+

Knowledge-based techniques classify text by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored.

−

它指的是确定对实体的不同特征或方面表达的意见或感情，例如，手机、数码相机或银行。功能或方面是一个实体的属性或组成部分，例如，手机的屏幕，餐厅的服务，或照相机的图像质量。基于特征的情感分析的优势在于可以捕捉感兴趣对象的细微差别。不同的特征可以产生不同的情绪反应，例如，酒店可以有一个方便的地点，但平庸的食物。这个问题涉及几个子问题，例如，识别相关实体，提取它们的特征/方面，以及确定对每个特征/方面表达的意见是积极的、消极的还是中性的。特征的自动识别可以通过句法方法、主题建模或者深度学习来实现。关于这一层次的情感分析的更详细的讨论可以在刘的作品中找到。

+

Some knowledge bases not only list obvious affect words, but also assign arbitrary words a probable "affinity" to particular emotions.
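A knowledge-based classifier of this kind can be sketched in a few lines of Python. The `AFFINITY` table below is a hypothetical stand-in for a real affect lexicon: unambiguous affect words get a strength of 1.0 for one emotion, while other words carry a weaker, made-up "affinity". The function names are likewise illustrative.

```python
import re
from collections import defaultdict

# Hypothetical affinity table: word -> {emotion: strength in [0, 1]}.
AFFINITY = {
    "happy": {"joy": 1.0},
    "sad": {"sadness": 1.0},
    "afraid": {"fear": 1.0},
    "bored": {"boredom": 1.0},
    "funeral": {"sadness": 0.7},  # not an affect word, but leans towards sadness
    "party": {"joy": 0.6},
}


def affect_scores(text):
    """Sum the affinity strengths of every known word, per emotion."""
    totals = defaultdict(float)
    for tok in re.findall(r"[a-z]+", text.lower()):
        for emotion, strength in AFFINITY.get(tok, {}).items():
            totals[emotion] += strength
    return dict(totals)


def classify_affect(text):
    """Return the highest-scoring emotion, or None if no known word occurs."""
    scores = affect_scores(text)
    return max(scores, key=scores.get) if scores else None


print(classify_affect("I was sad at the funeral"))
```

The weakness noted for this family of methods is visible here: a text with no listed word receives no label at all, regardless of what it expresses.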

−

~~First steps to bringing together various approaches—learning~~, ~~lexical~~, ~~knowledge-based, etc.—were taken in the 2004 [[AAAI]] Spring Symposium where linguists~~, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text.<ref>Qu, Yan, James Shanahan, and Janyce Wiebe. "~~Exploring attitude and affect in text: Theories and applications.~~" ~~In AAAI Spring Symposium) Technical report SS-04-07. AAAI Press, Menlo Park~~, ~~CA. 2004.</ref>~~

+

Statistical methods leverage elements from machine learning such as latent semantic analysis, support vector machines, "bag of words", "Pointwise Mutual Information" for Semantic Orientation,

+

and deep learning. More sophisticated methods try to detect the holder of a sentiment (i.e., the person who maintains that affective state) and the target (i.e., the entity about which the affect is felt).

+

To mine the opinion in context and get the feature about which the speaker has opined, the grammatical relationships of words are used. Grammatical dependency relations are obtained by deep parsing of the text.

−

~~Even though~~ in ~~most statistical classification methods~~, the ~~neutral class is ignored under the assumption~~ that ~~neutral texts lie near the boundary of the binary classifier~~, ~~several researchers suggest~~ that~~, as in every polarity problem, three categories must be identified~~. ~~Moreover, it can be proven that specific classifiers such as the [[Maximum entropy probability distribution|Max Entropy]]<ref name = "Vryniotis13">~~

+

Hybrid approaches leverage both machine learning and elements from knowledge representation such as ontologies and semantic networks in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.
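As an example of the statistical family, the Pointwise Mutual Information approach to semantic orientation can be sketched as follows. The score compares how strongly a word co-occurs with a positive reference word versus a negative one. This is a simplified sketch under stated assumptions: co-occurrence is counted per document in a tiny in-memory corpus (rather than through search-engine hit counts), the reference words "excellent" and "poor" and the smoothing constant 0.01 are parameters chosen for illustration, and the function name is hypothetical.

```python
import math


def semantic_orientation(word, docs, pos_ref="excellent", neg_ref="poor", smooth=0.01):
    """SO(word) = PMI(word, pos_ref) - PMI(word, neg_ref), estimated from document counts."""
    token_sets = [set(d.lower().split()) for d in docs]

    def hits(*words):
        # Number of documents that contain all of the given words.
        return sum(all(w in s for w in words) for s in token_sets)

    # The difference of the two PMI values simplifies to this log ratio.
    num = (hits(word, pos_ref) + smooth) * (hits(neg_ref) + smooth)
    den = (hits(word, neg_ref) + smooth) * (hits(pos_ref) + smooth)
    return math.log2(num / den)


docs = [
    "excellent sturdy hull",
    "sturdy and excellent finish",
    "poor flimsy hull",
    "flimsy and poor finish",
]

for w in ("sturdy", "flimsy", "hull"):
    print(w, round(semantic_orientation(w, docs), 2))
```

In this toy corpus "sturdy" gets a positive score, "flimsy" a negative one, and "hull", which appears equally often with both reference words, a score of zero.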

−

~~{{cite conference~~

−

Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. Knowledge-based techniques classify text by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored. Some knowledge bases not only list obvious affect words, but also assign arbitrary words a probable "affinity" to particular emotions. Statistical methods leverage elements from machine learning such as latent semantic analysis, support vector machines, "bag of words", "Pointwise Mutual Information" for Semantic Orientation, and deep learning. More sophisticated methods try to detect the holder of a sentiment (i.e., the person who maintains that affective state) and the target (i.e., the entity about which the affect is felt). To mine the opinion in context and get the feature about which the speaker has opined, the grammatical relationships of words are used. Grammatical dependency relations are obtained by deep parsing of the text. Hybrid approaches leverage both machine learning and elements from knowledge representation such as ontologies and semantic networks in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.

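Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. Knowledge-based techniques classify text by affect category based on the presence of unambiguous affect words such as happy, sad, afraid, and bored. Some knowledge bases not only list obvious affect words but also assign arbitrary words a probable "affinity" to particular emotions. Statistical methods leverage elements from machine learning such as latent semantic analysis, support vector machines, "bag of words", "Pointwise Mutual Information" for semantic orientation, and deep learning. More sophisticated methods try to detect the holder of a sentiment (i.e., the person who maintains that affective state) and the target (i.e., the entity about which the affect is felt). To mine the opinion in context and find the feature about which the speaker has opined, the grammatical relationships between words are used. Grammatical dependency relations are obtained by deep parsing of the text. Hybrid approaches leverage both machine learning and elements from knowledge representation, such as ontologies and semantic networks, in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information but are implicitly linked to other concepts that do.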

−

| ~~first~~ = ~~Vasilis~~ | ~~last~~ = ~~Vryniotis~~

+

Open source software tools as well as a range of free and paid sentiment analysis tools deploy [[machine learning]], statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media.<ref name="AkcoraBayirDemirbasFerhatosmanoglu2010">

+

{{cite conference

+

+

| title = Identifying breakpoints in public opinion

+

| book-title = SigKDD, Proceedings of the First Workshop on Social Media Analytics

+

| year = 2010

+

| url = http://portal.acm.org/citation.cfm?id=1964867

+

}}

+

</ref> Knowledge-based systems, on the other hand, make use of publicly available resources to extract the semantic and affective information associated with natural language concepts. Such systems can help perform affective [[commonsense reasoning]].<ref>{{Cite journal|last1=Sasikala|first1=P.|last2=Mary Immaculate Sheela|first2=L.|date=December 2020|title=Sentiment analysis of online product reviews using DLMNN and future prediction of online product using IANFIS|journal=Journal of Big Data|language=en|volume=7|issue=1|pages=33|doi=10.1186/s40537-020-00308-7|issn=2196-1115|doi-access=free}}</ref> Sentiment analysis can also be performed on visual content, i.e., images and videos (see [[Multimodal sentiment analysis]]). One of the first approaches in this direction is SentiBank<ref name = "Borth13">

+

{{cite conference

+

| first1 = Damian | last1 = Borth

+

+

| title = Large-scale Visual Sentiment Ontology and Detectors Using Adjective Noun Pairs

+

| book-title = Proceedings of ACM Int. Conference on Multimedia

+

| pages = 223–232

+

| year = 2013

+

| url = https://visual-sentiment-ontology.appspot.com

+

}}

+

</ref> utilizing an adjective noun pair representation of visual content. In addition, the vast majority of sentiment classification approaches rely on the bag-of-words model, which disregards context, [[grammar]] and even [[word order]]. Approaches that analyse sentiment based on how words compose the meaning of longer phrases have shown better results,<ref>{{Cite journal|last1=Socher|first1=Richard|last2=Perelygin|first2=Alex|last3=Wu|first3=Jean Y.|last4=Chuang|first4=Jason|last5=Manning|first5=Christopher D.|last6=Ng|first6=Andrew Y.|last7=Potts|first7=Christopher|date=2013|title=Recursive deep models for semantic compositionality over a sentiment treebank|journal=In Proceedings of EMNLP|pages=1631–1642|citeseerx=10.1.1.593.7427}}</ref> but they incur an additional annotation overhead.

−

~~| title = The importance~~ of ~~Neutral Class in Sentiment Analysis~~

+

Open source software tools as well as a range of free and paid sentiment analysis tools deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media.

−

Open source software tools as well as range of free and paid sentiment analysis tools deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media. Knowledge-based systems, on the other hand, make use of publicly available resources, to extract the semantic and affective information associated with natural language concepts. The system can help perform affective commonsense reasoning. Sentiment analysis can also be performed on visual content, i.e., images and videos (see Multimodal sentiment analysis). One of the first approaches in this direction is SentiBank utilizing an adjective noun pair representation of visual content. In addition, the vast majority of sentiment classification approaches rely on the bag-of-words model, which disregards context, grammar and even word order. Approaches that analyses the sentiment based on how words compose the meaning of longer phrases have shown better result, but they incur an additional annotation overhead.

+

Knowledge-based systems, on the other hand, make use of publicly available resources to extract the semantic and affective information associated with natural language concepts. Such systems can help perform affective commonsense reasoning. Sentiment analysis can also be performed on visual content, i.e., images and videos (see Multimodal sentiment analysis). One of the first approaches in this direction is SentiBank

+

utilizing an adjective noun pair representation of visual content. In addition, the vast majority of sentiment classification approaches rely on the bag-of-words model, which disregards context, grammar and even word order. Approaches that analyse sentiment based on how words compose the meaning of longer phrases have shown better results, but they incur an additional annotation overhead.

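Open source software tools as well as a range of free and paid sentiment analysis tools deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media. Knowledge-based systems, on the other hand, make use of publicly available resources to extract the semantic and affective information associated with natural language concepts. Such systems can help perform affective commonsense reasoning. Sentiment analysis can also be performed on visual content, i.e., images and videos (see Multimodal sentiment analysis). One of the first approaches in this direction is SentiBank, which uses an adjective noun pair representation of visual content. In addition, the vast majority of sentiment classification approaches rely on the bag-of-words model, which disregards context, grammar, and even word order. Approaches that analyse sentiment based on how words compose the meaning of longer phrases have shown better results, but they incur an additional annotation overhead.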
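The limitation of the bag-of-words model mentioned above can be shown with a short Python sketch. The two sentences and the helper names are made up for illustration: both sentences contain exactly the same words, so a bag-of-words representation cannot tell them apart, whereas a representation that keeps adjacent word pairs can.

```python
import re
from collections import Counter


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def bag_of_words(text):
    """Word counts only; word order is discarded."""
    return Counter(tokens(text))


def bigrams(text):
    """Counts of adjacent word pairs; some local order is kept."""
    toks = tokens(text)
    return Counter(zip(toks, toks[1:]))


a = "the food was good not bad"
b = "the food was bad not good"

print(bag_of_words(a) == bag_of_words(b))  # identical bags
print(bigrams(a) == bigrams(b))            # different bigrams
```

Bigrams are only the simplest way of retaining order; the compositional approaches cited in the text model how the meaning of a phrase is built from its parts.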

−

| ~~year~~ = ~~2013~~

+

A human analysis component is required in sentiment analysis, because automated systems cannot analyze the historical tendencies of the individual commenter or of the platform, and they often misclassify the expressed sentiment. Automation impacts approximately 23% of comments that are correctly classified by humans.<ref>{{cite web|title=Case Study: Advanced Sentiment Analysis|url=http://paragonpoll.com/sentiment-analysis-systems-case-study/|access-date=18 October 2013}}</ref> However, humans often disagree, and it is argued that inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.<ref>{{Cite journal|last1=Mozetič|first1=Igor|last2=Grčar|first2=Miha|last3=Smailović|first3=Jasmina|date=2016-05-05|title=Multilingual Twitter Sentiment Classification: The Role of Human Annotators|journal=PLOS ONE|volume=11|issue=5|pages=e0155036|doi=10.1371/journal.pone.0155036|issn=1932-6203|pmc=4858191|pmid=27149621|arxiv=1602.07563|bibcode=2016PLoSO..1155036M}}</ref>

−

| url = http://~~blog.datumbox~~.com/~~the~~-~~importance~~-of-~~neutral~~-~~class~~-in-sentiment-~~analysis~~/

A human analysis component is required in sentiment analysis, because automated systems cannot analyze the historical tendencies of the individual commenter or of the platform, and they often misclassify the expressed sentiment. Automation impacts approximately 23% of comments that are correctly classified by humans. However, humans often disagree, and it is argued that inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.

第303行：第630行：

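A human analysis component is required in sentiment analysis, because automated systems cannot analyze the historical tendencies of the individual commenter or of the platform, and they often misclassify the expressed sentiment. Automation impacts approximately 23% of comments that are correctly classified by humans. However, humans often disagree, and it is argued that inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.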

−

}}

+

== Evaluation ==

+

== Evaluation ==

+

== Evaluation ==

+

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by variants of [[precision and recall]] over the two target categories of negative and positive texts. However, according to research, human raters typically agree only about 80%<ref>

+

{{cite news

+

| last = Ogneva | first = M.

+

| title = How Companies Can Use Sentiment Analysis to Improve Their Business

+

| url=http://mashable.com/2010/04/19/sentiment-analysis/ | publisher = Mashable

+

|access-date=2012-12-13}}

+

</ref> of the time (see [[Inter-rater reliability]]). Thus, a program that achieves 70% accuracy in classifying sentiment is doing nearly as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about ''any'' answer.<ref>

+

{{cite book

+

| last = Roebuck | first = K.

+

| title = Sentiment Analysis: High-impact Strategies - What You Need to Know: Definitions, Adoptions, Impact, Benefits, Maturity, Vendors

+

| url=https://books.google.com/books?id=kqsNBwAAQBAJ| isbn = 9781743049457

+

| date = 2012-10-24

+

}}

+

</ref>

−

~~</ref>~~ and ~~[[Support vector machine|SVMs]]<ref name = "KoppelSchler06">~~

+

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by variants of precision and recall over the two target categories of negative and positive texts. However, according to research, human raters typically agree only about 80%

−

~~{{cite conference~~

+

of the time (see Inter-rater reliability). Thus, a program that achieves 70% accuracy in classifying sentiment is doing nearly as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about any answer.

−

~~| first1 = Moshe | last1 = Koppel~~

−

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by variant measures based on precision and recall over the two target categories of negative and positive texts. However, according to research human raters typically only agree about 80% of the time (see Inter-rater reliability). Thus, a program that achieves 70% accuracy in classifying sentiment is doing nearly as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about any answer.

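The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by variants of precision and recall over the two target categories of negative and positive texts. However, according to research, human raters typically agree only about 80% of the time (see Inter-rater reliability). Thus, a program that achieves 70% accuracy in classifying sentiment is doing nearly as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about any answer.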
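The measures mentioned above can be made concrete with a small Python sketch. The label lists are invented example data and the function names are illustrative. `precision_recall` scores a system against one set of human labels for a given class, and `agreement` is the simple fraction of items on which two label sequences coincide, which is the kind of figure behind the "about 80%" human agreement rate.

```python
def precision_recall(gold, predicted, label):
    """Precision and recall of `predicted` against `gold` for one class label."""
    pairs = list(zip(gold, predicted))
    tp = sum(g == label and p == label for g, p in pairs)  # true positives
    fp = sum(g != label and p == label for g, p in pairs)  # false positives
    fn = sum(g == label and p != label for g, p in pairs)  # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def agreement(labels_a, labels_b):
    """Fraction of items on which two label sequences coincide."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Invented example: one human rater versus a system on five texts.
gold = ["pos", "pos", "neg", "neg", "pos"]
pred = ["pos", "neg", "neg", "pos", "pos"]

print(precision_recall(gold, pred, "pos"))
print(precision_recall(gold, pred, "neg"))
print(agreement(gold, pred))
```

Raw agreement does not correct for agreement expected by chance; chance-corrected statistics exist for that purpose but are outside the scope of this sketch.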

−

| ~~first2 = Jonathan~~ | ~~last2 = Schler~~

+

On the other hand, computer systems will make very different errors than human assessors, and thus the figures are not entirely comparable. For instance, a computer system will have trouble with negations, exaggerations, [[joke]]s, or sarcasm, which typically are easy to handle for a human reader: some errors a computer system makes will seem overly naive to a human. In general, the utility for practical commercial tasks of sentiment analysis as it is defined in academic research has been called into question, mostly since the simple one-dimensional model of sentiment from negative to positive yields rather little actionable information for a client worrying about the effect of public discourse on e.g. brand or corporate reputation.<ref>

−

+

[[Jussi Karlgren|Karlgren, Jussi]], [[Magnus Sahlgren]], Fredrik Olsson, Fredrik Espinoza, and Ola Hamfors. "Usefulness of sentiment analysis." In European Conference on Information Retrieval, pp. 426-435. Springer Berlin Heidelberg, 2012.

−

| ~~title = The Importance~~ of ~~Neutral Examples for Learning Sentiment~~

+

</ref><ref>

+

[[Jussi Karlgren|Karlgren, Jussi]]. "The relation between author mood and affect to sentiment in text and text genre." In Proceedings of the fourth workshop on Exploiting semantic annotations in information retrieval, pp. 9-10. ACM, 2011.

+

</ref><ref>

+

[[Jussi Karlgren|Karlgren, Jussi]]. "[http://www.diva-portal.org/smash/get/diva2:1042636/FULLTEXT01.pdf Affect, appeal, and sentiment as factors influencing interaction with multimedia information]." In Proceedings of Theseus/ImageCLEF workshop on visual information retrieval evaluation, pp. 8-11. 2009.

+

</ref>

On the other hand, computer systems will make very different errors than human assessors, and thus the figures are not entirely comparable. For instance, a computer system will have trouble with negations, exaggerations, jokes, or sarcasm, which typically are easy to handle for a human reader: some errors a computer system makes will seem overly naive to a human. In general, the utility for practical commercial tasks of sentiment analysis as it is defined in academic research has been called into question, mostly since the simple one-dimensional model of sentiment from negative to positive yields rather little actionable information for a client worrying about the effect of public discourse on e.g. brand or corporate reputation.

+

Karlgren, Jussi, Magnus Sahlgren, Fredrik Olsson, Fredrik Espinoza, and Ola Hamfors. "Usefulness of sentiment analysis." In European Conference on Information Retrieval, pp. 426-435. Springer Berlin Heidelberg, 2012.

−

另一方面，计算机系统会犯与人类评估员非常不同的错误，因此这些数字并不完全可比。例如，计算机系统在否定、夸张、笑话或讽刺方面会遇到麻烦，而这些对于人类读者来说通常是很容易处理的: 计算机系统出现的一些错误对于人类来说会显得过于天真。一般来说，学术研究中定义的情绪分析对实际商业任务的效用受到质疑，主要是因为简单的从消极到积极的情绪单维度模型产生的可操作信息很少，客户担心公共话语对情绪分析的影响。品牌或企业声誉。

+

Karlgren, Jussi. "The relation between author mood and affect to sentiment in text and text genre." In Proceedings of the fourth workshop on Exploiting semantic annotations in information retrieval, pp. 9-10. ACM, 2011.

−

~~| book~~-~~title = Computational Intelligence 22~~

+

Karlgren, Jussi. "Affect, appeal, and sentiment as factors influencing interaction with multimedia information." In Proceedings of Theseus/ImageCLEF workshop on visual information retrieval evaluation, pp. 8-11. 2009.

−

~~| year = 2006~~

−

~~To better fit market needs~~, ~~evaluation of sentiment analysis has moved to more task-based measures~~, ~~formulated together with representatives from PR agencies~~ and ~~market research professionals~~. ~~The focus in e~~.g. ~~the RepLab evaluation data set is less on the content of the text under consideration and more on the effect of the text in question on brand reputation~~.

+

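On the other hand, computer systems will make very different errors than human assessors, and thus the figures are not entirely comparable. For instance, a computer system will have trouble with negations, exaggerations, jokes, or sarcasm, which are typically easy for a human reader to handle; some errors a computer system makes will seem overly naive to a human. In general, the utility of sentiment analysis, as it is defined in academic research, for practical commercial tasks has been called into question, mostly because the simple one-dimensional model of sentiment from negative to positive yields rather little actionable information for a client worrying about the effect of public discourse on, e.g., brand or corporate reputation. Karlgren, Jussi, Magnus Sahlgren, Fredrik Olsson, Fredrik Espinoza, and Ola Hamfors. "Usefulness of sentiment analysis." In European Conference on Information Retrieval, pp. 426-435. Springer Berlin Heidelberg, 2012. Karlgren, Jussi. "The relation between author mood and affect to sentiment in text and text genre." In Proceedings of the fourth workshop on Exploiting semantic annotations in information retrieval, pp. 9-10. ACM, 2011. Karlgren, Jussi. "Affect, appeal, and sentiment as factors influencing interaction with multimedia information." In Proceedings of Theseus/ImageCLEF workshop on visual information retrieval evaluation, pp. 8-11. 2009.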

−

为了更好地适应市场需求，情绪分析的评估已转向更多基于任务的措施，与公关机构和市场研究专业人士的代表共同制定。中的焦点。RepLab 评估数据集较少考虑文本的内容，而更多考虑文本对品牌声誉的影响。

+

To better fit market needs, evaluation of sentiment analysis has moved to more task-based measures, formulated together with representatives from PR agencies and market research professionals. The focus in e.g. the RepLab evaluation data set is less on the content of the text under consideration and more on the effect of the text in question on [[brand image|brand reputation]].<ref>

+

Amigó, Enrique, Adolfo Corujo, Julio Gonzalo, Edgar Meij, and [[Maarten de Rijke]]. "Overview of RepLab 2012: Evaluating Online Reputation Management Systems." In CLEF (Online Working Notes/Labs/Workshop). 2012.

+

</ref><ref>

+

Amigó, Enrique, Jorge Carrillo De Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Tamara Martín, Edgar Meij, [[Maarten de Rijke]], and Damiano Spina. "Overview of replab 2013: Evaluating online reputation monitoring systems." In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 333-352. Springer Berlin Heidelberg, 2013.

+

</ref><ref name="replab2014">

+

Amigó, Enrique, Jorge Carrillo-de-Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Edgar Meij, [[Maarten de Rijke]], and Damiano Spina. "Overview of replab 2014: author profiling and reputation dimensions for online reputation management." In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 307-322. Springer International Publishing, 2014.

+

</ref>

−

~~| pages = 100–109~~

+

To better fit market needs, evaluation of sentiment analysis has moved to more task-based measures, formulated together with representatives from PR agencies and market research professionals. The focus in e.g. the RepLab evaluation data set is less on the content of the text under consideration and more on the effect of the text in question on brand reputation.

+

Amigó, Enrique, Adolfo Corujo, Julio Gonzalo, Edgar Meij, and Maarten de Rijke. "Overview of RepLab 2012: Evaluating Online Reputation Management Systems." In CLEF (Online Working Notes/Labs/Workshop). 2012.

−

~~| citeseerx = 10~~.1.1.84.~~9735~~

+

Amigó, Enrique, Jorge Carrillo De Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Tamara Martín, Edgar Meij, Maarten de Rijke, and Damiano Spina. "Overview of replab 2013: Evaluating online reputation monitoring systems." In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 333-352. Springer Berlin Heidelberg, 2013.

−

~~Because evaluation~~ of ~~sentiment analysis is becoming more~~ and ~~more task based, each implementation needs a separate training model to get a more accurate representation~~ of ~~sentiment~~ for ~~a given data set~~.

+

Amigó, Enrique, Jorge Carrillo-de-Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Edgar Meij, Maarten de Rijke, and Damiano Spina. "Overview of replab 2014: author profiling and reputation dimensions for online reputation management." In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 307-322. Springer International Publishing, 2014.

−

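~~Because evaluation of sentiment analysis is becoming increasingly task-based, each implementation needs a separately trained model to represent sentiment more accurately for a given data set.~~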

−

}}

+

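To better fit market needs, evaluation of sentiment analysis has moved to more task-based measures, formulated together with representatives from PR agencies and market research professionals. The focus in, for example, the RepLab evaluation data set is less on the content of the text under consideration and more on the effect of the text on brand reputation. Amigó, Enrique, Adolfo Corujo, Julio Gonzalo, Edgar Meij, and Maarten de Rijke. "Overview of RepLab 2012: Evaluating Online Reputation Management Systems." In CLEF (Online Working Notes/Labs/Workshop). 2012. Amigó, Enrique, Jorge Carrillo De Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Tamara Martín, Edgar Meij, Maarten de Rijke, and Damiano Spina. "Overview of replab 2013: Evaluating online reputation monitoring systems." In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 333-352. Springer Berlin Heidelberg, 2013. Amigó, Enrique, Jorge Carrillo-de-Albornoz, Irina Chugur, Adolfo Corujo, Julio Gonzalo, Edgar Meij, Maarten de Rijke, and Damiano Spina. "Overview of replab 2014: author profiling and reputation dimensions for online reputation management." In International Conference of the Cross-Language Evaluation Forum for European Languages, pp. 307-322. Springer International Publishing, 2014.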

−

~~</ref> can benefit from the introduction~~ of ~~a neutral class~~ and ~~improve the overall accuracy of the classification. There are in principle two ways for operating with a neutral class. Either, the algorithm proceeds by first identifying the neutral language~~, ~~filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds~~ a three-way classification in one step.<ref>{{Cite journal|last1=Ribeiro|first1=Filipe Nunes|last2=Araujo|first2=Matheus|date=2010|title=A Benchmark Comparison of State-of-the-Practice Sentiment Analysis Methods|url=https://www.researchgate.net/publication/286302059|journal=Transactions on Embedded Computing Systems |volume=9 |issue=4}}</ref> This second approach often involves estimating a probability distribution over all categories (e.g. [[Naive Bayes classifier|naive Bayes]] classifiers as implemented by the [[Nltk|NLTK]]). Whether and how to ~~use~~ a ~~neutral class depends on the nature~~ of ~~the data: if the~~ data is clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.
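The two ways of handling a neutral class described above can be contrasted in a minimal Python sketch. Everything in it is an assumption made for illustration: the toy `LEXICON`, the `neutral_band` threshold, and the function names are hypothetical and do not come from the cited work or from NLTK.

```python
from typing import Dict

# Hypothetical toy lexicon; real systems learn such weights from data.
LEXICON: Dict[str, float] = {
    "best": 1.0,
    "easily": 0.5,
    "ugly": -1.0,
    "dislike": -1.0,
}


def polarity(text: str) -> float:
    """Sum the lexicon weights of the words in `text` (0.0 if none match)."""
    return sum(LEXICON.get(w.strip(".,!?").lower(), 0.0) for w in text.split())


def two_stage(text: str, neutral_band: float = 0.25) -> str:
    """Stage 1 filters out neutral text; stage 2 decides positive vs. negative."""
    score = polarity(text)
    if abs(score) <= neutral_band:  # assumed threshold, tune per data set
        return "neutral"
    return "positive" if score > 0 else "negative"


def one_step(probs: Dict[str, float]) -> str:
    """Pick the most probable of the three classes in a single step."""
    return max(probs, key=probs.get)
```

In this sketch `two_stage` mirrors the filter-then-classify route, while `one_step` takes a probability distribution over the three classes (as a probabilistic classifier such as naive Bayes would produce) and returns its most likely label.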

+

Because evaluation of sentiment analysis is becoming more and more task based, each implementation needs a separate training model to get a more accurate representation of sentiment for a given data set.

+

Because evaluation of sentiment analysis is becoming more and more task based, each implementation needs a separate training model to get a more accurate representation of sentiment for a given data set.

+

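Because evaluation of sentiment analysis is becoming increasingly task-based, each implementation needs a separately trained model to represent sentiment more accurately for a given data set.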

−

~~A different method for determining~~ sentiment is the ~~use~~ of ~~a scaling system whereby words commonly associated with having a negative~~, ~~neutral~~, ~~or positive sentiment with them are given an associated number on~~ a ~~−10~~ to ~~+10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4~~. ~~This makes it possible~~ to ~~adjust~~ the ~~sentiment~~ of ~~a given term relative to its environment (usually on~~ the ~~level of~~ the ~~sentence). When a piece of unstructured text is analyzed using [[natural language processing]]~~, ~~each concept in~~ the ~~specified environment is given a score based on~~ the ~~way~~ sentiment ~~words relate to the concept and its associated score~~.<ref~~>{{Cite journal|last1~~=~~Taboada|first1=Maite|last2=Brooke|first2=Julian|date=2011|title=Lexicon-based methods~~ for ~~sentiment analysis|url=http~~://dl.~~acm~~.~~org~~/~~citation~~.~~cfm~~?~~id=2000518|journal=Computational Linguistics |volume=37 |issue=2 |pages=272–274|doi=10.1162/coli_a_00049|citeseerx~~=~~10.~~1~~.1.188.5517|s2cid=3181362}}</ref><ref>{{Cite journal|last1=Augustyniak|first1=Łukasz|last2=Szymański|first2=Piotr|last3=Kajdanowicz|first3=Tomasz|last4=Tuligłowicz|first4=Włodzimierz|date=2015~~-12-~~25|title=Comprehensive Study~~ on ~~Lexicon~~-~~based Ensemble Classification Sentiment Analysis|journal=Entropy|language=en|volume=18|issue=1|pages=4|doi=~~10.~~3390/e18010004|bibcode=2015Entrp..18....4A|doi-access=free}}~~</ref><ref>{{~~Cite journal|last1=Mehmood|first1=Yasir|last2=Balakrishnan|first2=Vimala|date=2020-01-01~~|title=~~An enhanced lexicon-based approach for sentiment analysis: a case study~~ on ~~illegal immigration~~|url=~~https~~://~~doi~~.~~org/10~~.~~1108~~/~~OIR~~-10-~~2018~~-~~0295|journal~~=~~Online Information Review~~|~~volume=44|issue=5|pages=1097–1117|doi~~=~~10.1108/OIR~~-10-~~2018-0295|issn=1468-4527~~}}</ref> ~~This allows movement to a more sophisticated understanding of sentiment~~, 
~~because it is now possible to adjust~~ the ~~sentiment value~~ of ~~a concept relative to modifications that~~ may ~~surround it. Words, for example, that intensify, relax or negate the sentiment expressed by the concept can affect its score. Alternatively, texts can~~ be ~~given a positive and negative sentiment strength score if~~ the ~~goal~~ is ~~to determine the sentiment in a text rather than the overall polarity and strength of the text~~.<ref name ="~~SentiStrength2010~~">

+

== Web 2.0 ==

+

The rise of [[social media]] such as [[blogs]] and [[social network]]s has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis.<ref name="Mining the Web for Feelings, Not Facts">Wright, Alex. [https://www.nytimes.com/2009/08/24/technology/internet/24emotion.html?_r=1 "Mining the Web for Feelings, Not Facts"], ''[[New York Times]]'', 2009-08-23. Retrieved on 2009-10-01.</ref> Further complicating the matter is the rise of anonymous social media platforms such as [[4chan]] and [[Reddit]].<ref>{{cite web|title=Sentiment Analysis on Reddit|url=http://news.humanele.com/sentiment-analysis-reddit/|access-date=10 October 2014|date=2014-09-30}}</ref> If [[web 2.0]] was all about democratizing publishing, then the next stage of the web may well be based on democratizing [[data mining]] of all the content that is getting published.<ref name="The Future of Social Media Monitoring">Kirkpatrick, Marshall. [https://readwrite.com/2009/04/15/whats_next_in_social_media_monitoring/ "], ''[[ReadWriteWeb]]'', 2009-04-15. Retrieved on 2009-10-01.</ref>

−

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published.

−

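The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If web 2.~~0 was all about democratizing publishing, then the next stage of the~~ web ~~may well be based on democratizing data mining of all the content that is getting published.~~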

+

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis. Wright, Alex. "Mining the Web for Feelings, Not Facts", New York Times, 2009-08-23. Retrieved on 2009-10-01. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published. Kirkpatrick, Marshall. ", ReadWriteWeb, 2009-04-15. Retrieved on 2009-10-01.

−

~~{{cite journal~~

+

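== Web 2.0 ==

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis. Wright, Alex. "Mining the Web for Feelings, Not Facts", New York Times, 2009-08-23. Retrieved on 2009-10-01. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published. Kirkpatrick, Marshall. ", ReadWriteWeb, 2009-04-15. Retrieved on 2009-10-01.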

−

| ~~first1~~ = ~~Mike~~

+

One step towards this aim is accomplished in research. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in [[Virtual community|e-communities]] through sentiment analysis.<ref name="Collective emotions in cyberspace">CORDIS. [http://cordis.europa.eu/fetch?CALLER=FP7_PROJ_EN&ACTION=D&DOC=1&CAT=PROJ&QUERY=011e4ea33ef2:358b:41dc0328&RCN=89032 "Collective emotions in cyberspace (CYBEREMOTIONS)"], ''[[European Commission]]'', 2009-02-03. Retrieved on 2010-12-13.</ref> The [[CyberEmotions|CyberEmotions project]], for instance, recently identified the role of negative [[emotion]]s in driving social networks discussions.<ref name="NewSci_flaming">Condliffe, Jamie. [https://www.newscientist.com/article/dn19821-flaming-drives-online-social-networks.html "Flaming drives online social networks "], ''[[New Scientist]]'', 2010-12-07. Retrieved on 2010-12-13.</ref>

−

One step towards this aim is accomplished in research. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social networks discussions.

+

One step towards this aim is accomplished in research. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. CORDIS. "Collective emotions in cyberspace (CYBEREMOTIONS)", European Commission, 2009-02-03. Retrieved on 2010-12-13. The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social networks discussions. Condliffe, Jamie. "Flaming drives online social networks", New Scientist, 2010-12-07. Retrieved on 2010-12-13.

−

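One step towards this aim is being taken in research. Several research teams at universities around the world currently use sentiment analysis to understand the dynamics of sentiment in e-communities. The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social network discussions.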

+

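One step towards this aim is being taken in research. Several research teams at universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. "Collective emotions in cyberspace (CYBEREMOTIONS)", European Commission, 2009-02-03. Retrieved on 2010-12-13. The CyberEmotions project, for instance, recently identified the role of negative emotions in driving social network discussions. "Flaming drives online social networks", New Scientist, 2010-12-07. Retrieved on 2010-12-13.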

−

~~| last1~~ = ~~Thelwall~~

+

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment.<ref name="Mining the Web for Feelings, Not Facts"/> The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.

−

~~| first2 = Kevan~~

+

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment. The fact that humans often disagree on the sentiment of text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.

−

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment. Furthermore, sentiment analysis on Twitter has also been shown to capture the public mood behind human reproduction cycles on a planetary scale, as well as other problems of public-health relevance such as adverse drug reactions.

+

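The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment. The fact that humans often disagree on the sentiment of a text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder it becomes.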

−

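The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment. Furthermore, sentiment analysis on Twitter has also been shown to capture the public mood behind human reproduction cycles on a planetary scale, as well as other problems of public-health relevance such as adverse drug reactions.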

+

Even though short text strings might be a problem, sentiment analysis within [[microblogging]] has shown that [[Twitter]] can be seen as a valid online indicator of political sentiment. Tweets' political sentiment demonstrates close correspondence to parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape.<ref>Tumasjan, Andranik; O.Sprenger, Timm; G.Sandner, Philipp; M.Welpe, Isabell (2010). [http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1441/1852 "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment"]. "Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media"</ref> Furthermore, sentiment analysis on [[Twitter]] has also been shown to capture the public mood behind human reproduction cycles on a planetary scale{{peacock term|date=June 2018}},<ref name="r25">{{cite journal|doi=10.1038/s41598-017-18262-5|pmid=29269945|pmc=5740080|title=Human Sexual Cycles are Driven by Culture and Match Collective Moods|journal=Scientific Reports|volume=7|issue=1|pages=17973|year=2017|last1=Wood|first1=Ian B.|last2=Varela|first2=Pedro L.|last3=Bollen|first3=Johan|last4=Rocha|first4=Luis M.|last5=Gonçalves-Sá|first5=Joana|bibcode=2017NatSR...717973W|arxiv=1707.03959}}</ref> as well as other problems of public-health relevance such as adverse drug reactions.<ref name="r27">{{cite journal|doi=10.1016/j.jbi.2016.06.007|pmid=27363901|pmc=4981644|title=Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts|journal=Journal of Biomedical Informatics|volume=62|pages=148–158|year=2016|last1=Korkontzelos|first1=Ioannis|last2=Nikfarjam|first2=Azadeh|last3=Shardlow|first3=Matthew|last4=Sarker|first4=Abeed|last5=Ananiadou|first5=Sophia|last6=Gonzalez|first6=Graciela H.}}</ref>

−

~~| last2 = Buckley~~

+

Even though short text strings might be a problem, sentiment analysis within microblogging has shown that Twitter can be seen as a valid online indicator of political sentiment. Tweets' political sentiment demonstrates close correspondence to parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape. Tumasjan, Andranik; O.Sprenger, Timm; G.Sandner, Philipp; M.Welpe, Isabell (2010). "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment". "Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media". Furthermore, sentiment analysis on Twitter has also been shown to capture the public mood behind human reproduction cycles on a planetary scale, as well as other problems of public-health relevance such as adverse drug reactions.

−

~~| first3 = Georgios~~

+

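Even though short text strings might be a problem, sentiment analysis within microblogging has shown that Twitter can be seen as a valid online indicator of political sentiment. Tweets' political sentiment corresponds closely to parties' and politicians' political positions, indicating that the content of Twitter messages plausibly reflects the offline political landscape. Tumasjan, Andranik; O.Sprenger, Timm; G.Sandner, Philipp; M.Welpe, Isabell (2010). "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment". Furthermore, sentiment analysis on Twitter has also been shown to capture the public mood behind human reproduction cycles on a planetary scale, as well as other problems of public-health relevance such as adverse drug reactions.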

−

| ~~last3 = Paltoglou~~

+

== Application in recommender systems ==

+

For a [[recommender system]], sentiment analysis has been proven to be a valuable technique. A [[recommender system]] aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, [[collaborative filtering]] works on the rating matrix, and [[content-based filtering]] works on the [[Metadata|meta-data]] of the items.

−

~~| first4 = Di~~

For a recommender system, sentiment analysis has been proven to be a valuable technique. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, collaborative filtering works on the rating matrix, and content-based filtering works on the meta-data of the items.

−

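For a recommender system, sentiment analysis has been proven to be a valuable technique. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, collaborative filtering works on the rating matrix, and content-based filtering works on the meta-data of the items.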

+

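== Application in recommender systems ==

For a recommender system, sentiment analysis has been proven to be a valuable technique. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, collaborative filtering works on the rating matrix, and content-based filtering works on the meta-data of the items.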

−

| ~~last4~~ = ~~Cai~~

+

In many [[social networking service]]s or [[e-commerce]] websites, users can provide text reviews, comments or feedback on the items. This user-generated text provides a rich source of users' sentiment opinions about numerous products and items. Potentially, for an item, such text can reveal both the related features/aspects of the item and the users' sentiments on each feature.<ref>{{cite journal|url=https://pdfs.semanticscholar.org/8f1b/9b97183b8aa2caa0fb6c9563b14daabe8316.pdf|archive-url=https://web.archive.org/web/20180524004208/https://pdfs.semanticscholar.org/8f1b/9b97183b8aa2caa0fb6c9563b14daabe8316.pdf|url-status=dead|archive-date=2018-05-24|first1=Huifeng|last1=Tang|first2=Songbo|last2=Tan|first3=Xueqi|last3=Cheng|title=A survey on sentiment detection of reviews|journal=Expert Systems with Applications|volume=36|issue=7|year=2009|pages=10760–10773|doi=10.1016/j.eswa.2009.02.063|s2cid=2178380}}</ref> The item's features/aspects described in the text play the same role as the meta-data in [[content-based filtering]], but the former are more valuable for the recommender system. Since these features are broadly mentioned by users in their reviews, they can be seen as the most crucial features that can significantly influence the user's experience of the item, while the meta-data of the item (usually provided by the producers instead of consumers) may ignore features that users care about. For different items with common features, a user may give different sentiments. Also, a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can be regarded as a multi-dimensional rating score, reflecting their preference on the items.

−

| ~~first5~~ = ~~Arvid~~

In many social networking services or e-commerce websites, users can provide text reviews, comments or feedback on the items. This user-generated text provides a rich source of users' sentiment opinions about numerous products and items. Potentially, for an item, such text can reveal both the related features/aspects of the item and the users' sentiments on each feature. The item's features/aspects described in the text play the same role as the meta-data in content-based filtering, but the former are more valuable for the recommender system. Since these features are broadly mentioned by users in their reviews, they can be seen as the most crucial features that can significantly influence the user's experience of the item, while the meta-data of the item (usually provided by the producers instead of consumers) may ignore features that users care about. For different items with common features, a user may give different sentiments. Also, a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can be regarded as a multi-dimensional rating score, reflecting their preference on the items.
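To make the idea of a multi-dimensional rating score concrete, the following Python sketch averages per-feature sentiment scores extracted from reviews into one rating per feature. It is only an illustration under stated assumptions: the `(feature, score)` input format, the score range, and the name `aspect_ratings` are hypothetical, and the extraction step that would produce these pairs is not shown.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def aspect_ratings(mentions: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Average the sentiment scores observed for each feature.

    `mentions` is an assumed input format: one (feature, score) pair per
    opinion found in the user-generated text, with scores in [-1.0, 1.0].
    """
    buckets: Dict[str, List[float]] = defaultdict(list)
    for feature, score in mentions:
        buckets[feature].append(score)
    # One averaged rating per feature: the "dimensions" of the rating score.
    return {feature: sum(scores) / len(scores) for feature, scores in buckets.items()}
```

Calling `aspect_ratings([("battery", 1.0), ("battery", 0.0), ("screen", -1.0)])` gives one averaged value for `battery` and one for `screen`, which together form the per-feature rating vector for the item.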

Line 387: Line 743:

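In many social networking services or e-commerce websites, users can provide text reviews, comments or feedback on the items. This user-generated text provides a rich source of users' sentiment opinions about numerous products and items. For an item, such text can reveal both the related features/aspects of the item and the users' sentiments on each feature. The item's features/aspects described in the text play the same role as the meta-data in content-based filtering, but the former are more valuable for the recommender system. Since these features are broadly mentioned by users in their reviews, they can be seen as the most crucial features that can significantly influence the user's experience of the item, while the meta-data of the item (usually provided by the producers instead of consumers) may ignore features that users care about. For different items with common features, a user may give different sentiments. Also, a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can be regarded as a multi-dimensional rating score, reflecting their preference on the items.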

−

| ~~last5~~ = ~~Kappas~~

+

Based on the features/aspects and the sentiments extracted from the user-generated text, a hybrid recommender system can be constructed.<ref name=":0">Jakob, Niklas, et al. "Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations." ''Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion''. ACM, 2009.</ref> There are two types of motivation to recommend a candidate item to a user. The first motivation is that the candidate item has numerous common features with the user's preferred items,<ref>{{cite journal|first1=Hu|last1=Minqing|first2=Bing|last2=Liu|title=Mining opinion features in customer reviews|journal=AAAI|volume=4|issue=4|year=2004|s2cid=5724860|url=https://pdfs.semanticscholar.org/ee6c/726b55c66d4c222556cfae62a4eb69aa86b7.pdf|archive-url=https://web.archive.org/web/20180524004041/https://pdfs.semanticscholar.org/ee6c/726b55c66d4c222556cfae62a4eb69aa86b7.pdf|url-status=dead|archive-date=2018-05-24}}</ref> while the second motivation is that the candidate item receives a high sentiment on its features. For a preferred item, it is reasonable to believe that items with the same features will have a similar function or utility. So these items are also likely to be preferred by the user. On the other hand, for a shared feature of two candidate items, other users may give positive sentiment to one of them while giving negative sentiment to the other. Clearly, the more highly rated item should be recommended to the user. Based on these two motivations, a combined ranking score of similarity and sentiment rating can be constructed for each candidate item.<ref name=":0" />

+

Based on the features/aspects and the sentiments extracted from the user-generated text, a hybrid recommender system can be constructed. Jakob, Niklas, et al. "Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations." Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion. ACM, 2009. There are two types of motivation to recommend a candidate item to a user. The first motivation is that the candidate item has numerous common features with the user's preferred items, while the second motivation is that the candidate item receives a high sentiment on its features. For a preferred item, it is reasonable to believe that items with the same features will have a similar function or utility. So these items are also likely to be preferred by the user. On the other hand, for a shared feature of two candidate items, other users may give positive sentiment to one of them while giving negative sentiment to the other. Clearly, the more highly rated item should be recommended to the user. Based on these two motivations, a combined ranking score of similarity and sentiment rating can be constructed for each candidate item.
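One way to read the combined ranking score is as a weighted sum of feature similarity and average feature sentiment. The Python sketch below illustrates that reading only; the Jaccard similarity, the weight `alpha`, the [0, 1] sentiment scale, and all function names are assumptions chosen for the example, not the method of the cited papers.

```python
from typing import Dict, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Share of features common to both sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def combined_score(
    feature_sentiment: Dict[str, float],
    liked_features: Set[str],
    alpha: float = 0.5,
) -> float:
    """Blend similarity to the user's preferred features with mean sentiment.

    `feature_sentiment` maps each feature of the candidate item to the
    average sentiment other users expressed on it, assumed to lie in [0, 1].
    `alpha` is a hypothetical weight between the two motivations.
    """
    similarity = jaccard(set(feature_sentiment), liked_features)
    if feature_sentiment:
        sentiment = sum(feature_sentiment.values()) / len(feature_sentiment)
    else:
        sentiment = 0.0
    return alpha * similarity + (1.0 - alpha) * sentiment
```

Candidates can then be sorted by `combined_score` in descending order, so that an item that both resembles the user's preferred items and is well regarded on its features ranks ahead of one that satisfies only one of the two motivations.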

−

~~| title = Sentiment strength detection in short informal text~~

+

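Based on the features/aspects and the sentiments extracted from the user-generated text, a hybrid recommender system can be constructed. Jakob, Niklas, et al. "Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations." Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion. ACM, 2009. There are two types of motivation to recommend a candidate item to a user. The first motivation is that the candidate item has numerous common features with the user's preferred items, while the second motivation is that the candidate item receives a high sentiment on its features. For a preferred item, it is reasonable to believe that items with the same features will have a similar function or utility. So these items are also likely to be preferred by the user. On the other hand, for a shared feature of two candidate items, other users may give positive sentiment to one of them while giving negative sentiment to the other. Clearly, the more highly rated item should be recommended to the user. Based on these two motivations, a combined ranking score of similarity and sentiment rating can be constructed for each candidate item.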

−

~~Based on the feature/aspects and the sentiments extracted from~~ the ~~user-generated text, a hybrid recommender system can be constructed. There are two types~~ of ~~motivation to recommend a candidate item to a user. The first motivation is~~ the ~~candidate item have numerous common features with the user's preferred items~~, ~~while the second motivation is that the candidate item receives a high~~ sentiment on ~~its features. For a preferred item, it is reasonable to believe that items with the same features will have a similar function~~ or ~~utility. So, these items will~~ also ~~likely to be preferred by~~ the ~~user~~. On the ~~other hand, for a shared feature~~ of ~~two candidate items, other users may give positive sentiment to one of them while giving negative sentiment to another~~. ~~Clearly,~~ the ~~high evaluated item should be recommended to the user~~. ~~Based~~ on ~~these two motivations, a combination ranking score of similarity and sentiment rating can be constructed for each candidate item~~. Review or feedback poorly written is hardly helpful for recommender system. Besides, a review can be designed to hinder sales of a target product, thus be harmful to the recommender system even it is well written.

+

Besides the difficulty of sentiment analysis itself, applying sentiment analysis to reviews or feedback also faces the challenge of spam and biased reviews. One direction of work is focused on evaluating the helpfulness of each review.<ref>{{cite book|first1=Yang|last1=Liu|first2=Xiangji|last2=Huang|first3=Aijun|last3=An|first4=Xiaohui|last4=Yu|chapter-url=http://www.yorku.ca/xhyu/papers/ICDM2008.pdf|chapter=Modeling and predicting the helpfulness of online reviews|year=2008|title=ICDM'08. Eighth IEEE international conference on Data mining|pages=443–452|publisher= IEEE|doi=10.1109/ICDM.2008.94|isbn=978-0-7695-3502-9|s2cid=18235238}}</ref> Poorly written reviews or feedback are hardly helpful for a recommender system. Moreover, a review can be designed to hinder sales of a target product, and thus be harmful to the recommender system even if it is well written.

−

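Based on the features/aspects and the sentiments extracted from the user-generated text, a hybrid recommender system can be constructed. There are two types of motivation to recommend a candidate item to a user. The first motivation is that the candidate item has numerous common features with the user's preferred items, while the second motivation is that the candidate item receives a high sentiment on its features. For a preferred item, it is reasonable to believe that items with the same features will have a similar function or utility. So these items are also likely to be preferred by the user. On the other hand, for a shared feature of two candidate items, other users may give positive sentiment to one of them while giving negative sentiment to the other. Clearly, the more highly rated item should be recommended to the user. Based on these two motivations, a combined ranking score of similarity and sentiment rating can be constructed for each candidate item. Poorly written reviews or feedback are hardly helpful for a recommender system. Moreover, a review can be designed to hinder sales of a target product, and thus be harmful to the recommender system even if it is well written.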

+

Besides the difficulty of sentiment analysis itself, applying sentiment analysis to reviews or feedback also faces the challenge of spam and biased reviews. One direction of work is focused on evaluating the helpfulness of each review. Poorly written reviews or feedback are hardly helpful for a recommender system. Moreover, a review can be designed to hinder sales of a target product, and thus be harmful to the recommender system even if it is well written.

−

~~| year = 2010~~

+

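Besides the difficulty of sentiment analysis itself, applying sentiment analysis to reviews or feedback also faces the challenge of spam and biased reviews. One direction of work is focused on evaluating the helpfulness of each review. Poorly written reviews or feedback are hardly helpful for a recommender system. Moreover, a review can be designed to hinder sales of a target product, and thus be harmful to the recommender system even if it is well written.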

−

| journal = ~~Journal~~ of the ~~American Society for~~ Information ~~Science~~ and ~~Technology~~

+

Researchers also found that long and short forms of user-generated text should be treated differently. An interesting result shows that short-form reviews are sometimes more helpful than long-form,<ref>{{cite book|doi=10.1145/1871437.1871741|last1=Bermingham|first1=Adam|last2=Smeaton|first2=Alan F.|title=Classifying sentiment in microblogs: is brevity an advantage?|journal=Proceedings of the 19th ACM International Conference on Information and Knowledge Management|pages=1833|year=2010|isbn=9781450300995|s2cid=2084603|url=http://doras.dcu.ie/15663/1/cikm1079-bermingham.pdf}}</ref> because it is easier to filter out the noise in a short-form text. For the long-form text, the growing length of the text does not always bring a proportionate increase in the number of features or sentiments in the text.

Researchers also found that long and short forms of user-generated text should be treated differently. An interesting result shows that short-form reviews are sometimes more helpful than long-form, because it is easier to filter out the noise in a short-form text. For the long-form text, the growing length of the text does not always bring a proportionate increase in the number of features or sentiments in the text.

Line 403: Line 761:

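Researchers also found that long and short forms of user-generated text should be treated differently. An interesting result shows that short-form reviews are sometimes more helpful than long-form ones, because it is easier to filter out the noise in a short-form text. For long-form text, the growing length of the text does not always bring a proportionate increase in the number of features or sentiments in the text.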

−

| volume= 61

+

Lamba & Madhusudhan<ref>{{cite journal |last1=Lamba |first1=Manika |last2=Madhusudhan |first2=Margam |title=Application of sentiment analysis in libraries to provide temporal information service: a case study on various facets of productivity |journal=Social Network Analysis and Mining |year=2018 |volume=8 |issue=1|pages=1–12|doi=10.1007/s13278-018-0541-y |s2cid=53047128 }}</ref> introduce a nascent way to cater to the information needs of today’s library users by repackaging the results from sentiment analysis of social media platforms like Twitter and providing them as a consolidated time-based service in different formats. Further, they propose a new way of conducting marketing in libraries using social media mining and sentiment analysis.

−

| issue= 12

Lamba & Madhusudhan introduce a nascent way to cater to the information needs of today’s library users by repackaging the results from sentiment analysis of social media platforms like Twitter and providing them as a consolidated time-based service in different formats. Further, they propose a new way of conducting marketing in libraries using social media mining and sentiment analysis.

Line 411: Line 767:

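Lamba & Madhusudhan introduce a nascent way to cater to the information needs of today's library users by repackaging the results from sentiment analysis of social media platforms like Twitter and providing them as a consolidated time-based service in different formats. Further, they propose a new way of conducting marketing in libraries using social media mining and sentiment analysis.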

−

~~| pages~~= ~~2544–2558~~

+

==See also==

+

* [[Emotion recognition]]

+

* [[Market sentiment]]

+

* [[Stylometry]]

−

~~| url = http://www.scit.wlv.ac.uk/~cm1993/papers/SentiStrengthPreprint.doc~~

+

* Emotion recognition

−

+

* Market sentiment

−

~~| doi=10.1002/asi.21416~~

+

* Stylometry

−

~~| citeseerx = 10.1.1.278.3863~~

−

}}

+

== See also ==

+

* Emotion recognition

+

* Market sentiment

+

* Stylometry

−

~~</ref>~~

+

== References ==

+

[[Category:Natural language processing]]

+

[[Category:Affective computing]]

+

[[Category:Social media]]

+

[[Category:Polling]]

−

There are various other types of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.

−

~~=== Subjectivity/objectivity identification ===~~

−

~~This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective.<ref name="PangLee08Subjectivity">{{cite book~~

−

~~| first1 = Bo~~

−

~~| last1 = Pang~~

Category:Natural language processing

−

~~Category: Natural language processing~~

−

~~| first2 = Lillian~~

−

Category:Affective computing

−

~~Category: Affective computing~~

−

~~| last2 = Lee~~

−

Category:Social media

−

~~Category: Social media~~

−

~~| title = Opinion Mining and Sentiment Analysis~~

−

Category:Polling

−

Category: Polling

+

Category: Natural language processing; Category: Affective computing; Category: Social media; Category: Polling

Moonscar

Administrator, 1,592 edits
