Changes

Added 51,832 bytes, 19 July 2021 (Mon) 15:19

Moved page from wikipedia:en:Sentiment analysis (history)

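This entry was provisionally machine-translated by 彩云小译 (Caiyun Xiaoyi), 3,167 characters in total. It has not yet been manually edited or proofread; apologies for any inconvenience in reading.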

{{short description|marketing and customer service tool}}

'''Sentiment analysis''' (also known as '''opinion mining''' or '''emotion AI''') is the use of [[natural language processing]], [[Text analytics|text analysis]], [[computational linguistics]], and [[biometrics]] to systematically identify, extract, quantify, and study affective states and subjective information. Sentiment analysis is widely applied to [[voice of the customer]] materials such as reviews and survey responses, online and social media, and healthcare materials for applications that range from [[marketing]] to [[Customer relationship management|customer service]] to clinical medicine.


== Examples ==

The objective and challenges of sentiment analysis can be shown through some simple examples.


=== Simple cases ===

* Coronet has the best lines of all day cruisers.

* Bertram has a deep V hull and runs easily through seas.

* Pastel-colored 1980s day cruisers from Florida are ugly.

* I dislike old [[cabin cruiser]]s.

=== More challenging examples ===

* I do not dislike cabin cruisers. ([[Negation]] handling)

* Disliking watercraft is not really my thing. (Negation, inverted [[word order]])

* Sometimes I really hate [[Rigid-hulled inflatable boat|RIBs]]. ([[Adverbial]] modifies the sentiment)

* I'd really truly love going out in this weather! (Possibly [[sarcastic]])

* Chris Craft is better looking than Limestone. (Two [[brand name]]s, identifying the target of attitude is difficult).

* Chris Craft is better looking than Limestone, but Limestone projects seaworthiness and reliability. (Two attitudes, two brand names).

* The movie is surprising with plenty of unsettling plot twists. (Negative term used in a positive sense in certain domains).

* You should see their decadent dessert menu. (Attitudinal term has shifted polarity recently in certain domains)

* I love my mobile but would not recommend it to any of my colleagues. (Qualified positive sentiment, difficult to categorise)

* Next week's gig will be right koide9! ("Quoi de neuf?", French for "what's new?". Newly minted terms can be highly attitudinal but volatile in polarity and often out of known vocabulary.)
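The gap between the simple cases and the negation examples above can be made concrete with a toy word-list scorer. This is only a minimal sketch: the lexicon entries, the negator set, and the function names `naive_score` and `negation_aware_score` are illustrative assumptions, and the "flip the next sentiment word" rule is a deliberately crude heuristic rather than an established method.

```python
import re

# Hypothetical toy lexicon: word -> polarity (+1 positive, -1 negative).
LEXICON = {"best": 1, "love": 1, "ugly": -1, "dislike": -1, "hate": -1}
# Hypothetical list of negation cues.
NEGATORS = {"not", "never", "no"}


def tokenize(text):
    """Lower-case the text and keep alphabetic tokens (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def naive_score(text):
    """Sum the polarities of all lexicon words, ignoring context."""
    return sum(LEXICON.get(tok, 0) for tok in tokenize(text))


def negation_aware_score(text):
    """Like naive_score, but a negator flips the next sentiment word."""
    score, flip = 0, False
    for tok in tokenize(text):
        if tok in NEGATORS:
            flip = True
        elif tok in LEXICON:
            score += -LEXICON[tok] if flip else LEXICON[tok]
            flip = False
    return score


print(naive_score("I dislike old cabin cruisers."))
print(naive_score("I do not dislike cabin cruisers."))
print(negation_aware_score("I do not dislike cabin cruisers."))
```

The naive scorer gives both sentences the same negative score, while the negation-aware variant flips the second one. Neither handles "Disliking watercraft is not really my thing": the inflected form is missing from the toy lexicon and the negator follows the sentiment word, which is the kind of difficulty the list above points to.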

== Types ==

A basic task in sentiment analysis is classifying the ''polarity'' of a given text at the document, sentence, or feature/aspect level: whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise.<ref>Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition for Vietnamese Social Media Text". In Proceedings of the 2019 International Conference of the Pacific Association for Computational Linguistics (PACLING 2019), Hanoi, Vietnam (2019).</ref>


Precursors to sentiment analysis include the General Inquirer,<ref>Stone, Philip J., Dexter C. Dunphy, and Marshall S. Smith. "The general inquirer: A computer approach to content analysis." MIT Press, Cambridge, MA (1966).</ref> which provided hints toward quantifying patterns in text and, separately, psychological research that examined a person's [[psychological state]] based on analysis of their verbal behavior.<ref>Gottschalk, Louis August, and Goldine C. Gleser. The measurement of psychological states through the content analysis of verbal behavior. Univ of California Press, 1969.</ref>


Subsequently, the method described in a patent by Volcani and Fogel,<ref>{{cite patent

| country = USA


| number = 7,136,877

| status = Issued


| title = System and method for determining and controlling the impact of text

| pubdate = June 28, 2001


| gdate =

| fdate =

There are various other types of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.

| pridate =

| inventor = Volcani, Yanon;

| invent1 =

Subjectivity identification is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance.

| invent2 = Fogel, David B.

| assign1 =

| assign2 =

| class =

The term objective refers to a statement that carries factual information.

| url = http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=%2Fnetahtml%2FPTO%2Fsearch-adv.htm&r=2&p=1&f=G&l=50&d=PTXT&S1=(fogel.INNM.+AND+volcani.INNM.)&OS=in/fogel+and+in/volcani&RS=(IN/fogel+AND+IN/volcani)

}}</ref> looked specifically at sentiment and identified individual words and phrases in text with respect to different emotional scales. A current system based on their work, called EffectCheck, presents synonyms that can be used to increase or decrease the level of evoked emotion in each scale.

Many other subsequent efforts were less sophisticated, using a mere polar view of sentiment, from positive to negative, such as work by Turney,<ref name = "Turney02" /> and Pang<ref name = "PangAl02">

The term subjective describes a statement that contains non-factual information in various forms, such as personal opinions, judgments, and predictions, also known as 'private states' in the terminology of Quirk et al. In the example below, 'We Americans' reflects a private state. Moreover, the target entity commented on by the opinions can take several forms, from tangible products to intangible topics, as stated in Liu (2010). A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction for subjectivity detection progressed from hand-curated features in 1999 to automated feature learning in 2005. At present, automated learning methods can be further divided into supervised and unsupervised machine learning. Pattern extraction with machine learning over annotated and unannotated text has been explored extensively by academic researchers.

{{cite conference

| first1 = Bo | last1 = Pang

However, researchers recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) cue words with fewer usages, 5) time sensitivity, and 6) ever-growing volume.

| first2 = Lillian | last2 = Lee | author2-link = Lillian Lee (computer scientist)

| first3 = Shivakumar | last3 = Vaithyanathan

Metaphorical expressions. Text that contains metaphorical expressions may degrade extraction performance. Besides, metaphors take different forms, which may add to the difficulty of detection.

| title = Thumbs up? Sentiment Classification using Machine Learning Techniques

Discrepancies in writing. Text obtained from the Internet spans distinct writing genres and styles, so the targeted text data are inconsistent in how they are written.

| book-title = Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP)

Context sensitivity. Classification may vary based on the subjectivity or objectivity of the preceding and following sentences.

| year = 2002

Time-sensitive attribute. The task is challenged by the time-sensitive nature of some textual data. If a group of researchers wants to confirm a fact reported in the news, cross-validation may take longer than it takes for the news to become outdated.

| pages = 79–86

Cue words with fewer usages.


| url = http://www.cs.cornell.edu/home/llee/papers/sentiment.home.html

Ever-growing volume. The task is also challenged by the sheer volume of textual data. The ever-growing nature of the data makes it overwhelmingly difficult for researchers to complete the task on time.

}}

</ref> who applied different methods for detecting the polarity of [[product review]]s and movie reviews respectively. This work is at the document level. One can also classify a document's polarity on a multi-way scale, which was attempted by Pang<ref name = "PangLee05">

Previously, research mainly focused on document-level classification. However, classification at the document level is less accurate, as an article may contain diverse types of expressions. Research evidence points to a set of news articles that were expected to be dominated by objective expressions, yet the results showed that over 40% of the expressions were subjective.

{{cite conference

| first1 = Bo | last1 = Pang

All of the reasons mentioned above can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and unannotated textual data.

| first2 = Lillian | last2 = Lee

| title = Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales

Meta-Bootstrapping, by Riloff and Jones (1999). Level one: generate extraction patterns based on the pre-defined rules, and rank the extracted patterns by the number of seed words each pattern holds. Level two: the top 5 words are marked and added to the dictionary. Repeat.

| book-title = Proceedings of the Association for Computational Linguistics (ACL)

Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge), by Thelen and Riloff. Step one: generate extraction patterns. Step two: move the best patterns from the pattern pool to the candidate word pool. Step three: the top 10 words are marked and added to the dictionary. Repeat.

| year = 2005

| pages = 115–124

Overall, these algorithms highlight the need for automatic pattern recognition and extraction in subjective and objective classification tasks.
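The shared loop behind both bootstrapping methods (score patterns by the seed words they extract, promote the best candidate words, repeat) can be sketched as follows. This is a simplified illustration, not the Riloff and Jones or Thelen and Riloff implementation: here a "pattern" is assumed to be just the word immediately preceding a token, and `bootstrap_lexicon`, `top_patterns`, and `top_words` are hypothetical names and parameters.

```python
from collections import defaultdict


def bootstrap_lexicon(sentences, seeds, rounds=2, top_patterns=2, top_words=5):
    """Grow a seed lexicon from unannotated sentences (toy bootstrapping loop)."""
    lexicon = set(seeds)

    # Assumption: a pattern is the word right before a token, and it
    # "extracts" every word that ever follows it in the corpus.
    extractions = defaultdict(set)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, word in zip(tokens, tokens[1:]):
            extractions[prev].add(word)

    for _ in range(rounds):
        # Rank patterns by how many known lexicon words they extract.
        ranked_patterns = sorted(
            extractions, key=lambda p: (-len(extractions[p] & lexicon), p)
        )
        best = [p for p in ranked_patterns[:top_patterns] if extractions[p] & lexicon]

        # Candidate words are the not-yet-known words extracted by the best
        # patterns, scored by how many of those patterns extract them.
        votes = defaultdict(int)
        for pattern in best:
            for word in extractions[pattern] - lexicon:
                votes[word] += 1
        if not votes:
            break

        ranked_words = sorted(votes, key=lambda w: (-votes[w], w))
        lexicon.update(ranked_words[:top_words])
    return lexicon


corpus = [
    "i really hate ribs",
    "i really love boats",
    "i really adore boats",
    "we truly love sailing",
    "we truly despise rain",
]
print(sorted(bootstrap_lexicon(corpus, {"hate", "love"})))
```

On this made-up corpus the seeds "hate" and "love" pull in "adore" and "despise" through the shared contexts "really" and "truly". The real systems use far richer extraction patterns and scoring functions.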

| url = http://www.cs.cornell.edu/home/llee/papers/pang-lee-stars.home.html

}}

Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifier's primary benefits is that it popularized the practice of data-driven decision-making in various industries. According to Liu, applications of subjective and objective identification have been implemented in business, advertising, sports, and social science.

</ref> and Snyder<ref name = "SnyderBarzilay07">

{{cite conference

| first1 = Benjamin | last1 = Snyder

| first2 = Regina | last2 = Barzilay

| title = Multiple Aspect Ranking using the Good Grief Algorithm

| book-title = Proceedings of the Joint Human Language Technology/North American Chapter of the ACL Conference (HLT-NAACL)

| year = 2007

| pages = 300–307

| url = http://people.csail.mit.edu/regina/my_papers/ggranker.ps

}}

</ref> among others: Pang and Lee<ref name = "PangLee05" /> expanded the basic task of classifying a movie review as either positive or negative to predict star ratings on either a 3- or a 4-star scale, while Snyder<ref name = "SnyderBarzilay07" /> performed an in-depth analysis of restaurant reviews, predicting ratings for various aspects of the given restaurant, such as the food and atmosphere (on a five-star scale).

Feature-based (or aspect-based) sentiment analysis refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service of a restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is the possibility to capture nuances about objects of interest. Different features can generate different sentiment responses; for example, a hotel can have a convenient location but mediocre food. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether an opinion expressed on each feature/aspect is positive, negative or neutral. The automatic identification of features can be performed with syntactic methods, with topic modeling, or with deep learning. More detailed discussions about this level of sentiment analysis can be found in Liu's work.
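The last sub-problem, assigning a polarity to each feature/aspect, can be illustrated with the hotel example. The sketch below assumes the aspects and opinion words are already known (the hard-coded `ASPECTS` and `OPINIONS` tables are hypothetical) and uses a crude clause split on "but", commas, and semicolons instead of the syntactic, topic-model, or deep-learning methods mentioned in the text.

```python
import re

# Hypothetical, hand-picked aspect terms and opinion words.
ASPECTS = {"location", "food", "service", "screen"}
OPINIONS = {"convenient": 1, "great": 1, "mediocre": -1, "slow": -1}


def aspect_sentiments(sentence):
    """Map each aspect found in the sentence to the polarity of its clause."""
    results = {}
    # Crude clause segmentation: split on "but", commas and semicolons.
    for clause in re.split(r"\bbut\b|[,;]", sentence.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        score = sum(OPINIONS.get(tok, 0) for tok in tokens)
        if score > 0:
            label = "positive"
        elif score < 0:
            label = "negative"
        else:
            label = "neutral"
        for tok in tokens:
            if tok in ASPECTS:
                results[tok] = label
    return results


print(aspect_sentiments("The hotel has a convenient location but mediocre food."))
```

A single document-level score would blur the two opinions together, whereas the per-aspect output keeps the positive location and the negative food separate.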

First steps toward bringing together various approaches (learning, lexical, knowledge-based, etc.) were taken in the 2004 [[AAAI]] Spring Symposium, where linguists, computer scientists, and other interested researchers first aligned interests and proposed shared tasks and benchmark data sets for the systematic computational research on affect, appeal, subjectivity, and sentiment in text.<ref>Qu, Yan, James Shanahan, and Janyce Wiebe. "Exploring attitude and affect in text: Theories and applications." In AAAI Spring Symposium, Technical report SS-04-07. AAAI Press, Menlo Park, CA. 2004.</ref>

Even though in most statistical classification methods, the neutral class is ignored under the assumption that neutral texts lie near the boundary of the binary classifier, several researchers suggest that, as in every polarity problem, three categories must be identified. Moreover, it can be proven that specific classifiers such as the [[Maximum entropy probability distribution|Max Entropy]]<ref name = "Vryniotis13">

{{cite conference

Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches. Knowledge-based techniques classify text by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored. Some knowledge bases not only list obvious affect words, but also assign arbitrary words a probable "affinity" to particular emotions. Statistical methods leverage elements from machine learning such as latent semantic analysis, support vector machines, "bag of words", "Pointwise Mutual Information" for Semantic Orientation, and deep learning. More sophisticated methods try to detect the holder of a sentiment (i.e., the person who maintains that affective state) and the target (i.e., the entity about which the affect is felt). To mine the opinion in context and get the feature about which the speaker has opined, the grammatical relationships of words are used. Grammatical dependency relations are obtained by deep parsing of the text. Hybrid approaches leverage both machine learning and elements from knowledge representation such as ontologies and semantic networks in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.


| first = Vasilis | last = Vryniotis

| title = The importance of Neutral Class in Sentiment Analysis

Open source software tools, as well as a range of free and paid sentiment analysis tools, deploy machine learning, statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media. Knowledge-based systems, on the other hand, make use of publicly available resources to extract the semantic and affective information associated with natural language concepts. Such a system can help perform affective commonsense reasoning. Sentiment analysis can also be performed on visual content, i.e., images and videos (see Multimodal sentiment analysis). One of the first approaches in this direction is SentiBank, which uses an adjective-noun pair representation of visual content. In addition, the vast majority of sentiment classification approaches rely on the bag-of-words model, which disregards context, grammar and even word order. Approaches that analyse sentiment based on how words compose the meaning of longer phrases have shown better results, but they incur an additional annotation overhead.
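What it means for the bag-of-words model to disregard word order can be seen in a few lines. The sketch below uses a plain whitespace tokenizer (an assumption; real systems tokenize more carefully) and reuses the brand-comparison sentence from the Examples section with the two names swapped; `bag_of_words` is a hypothetical helper name.

```python
from collections import Counter


def bag_of_words(text):
    """Represent a text as unordered token counts."""
    return Counter(text.lower().split())


first = "Chris Craft is better looking than Limestone"
second = "Limestone is better looking than Chris Craft"

# The two sentences express opposite preferences but share one representation.
print(bag_of_words(first) == bag_of_words(second))
```

Because the two representations are identical, any classifier built only on these counts must give both sentences the same label.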

| year = 2013

| url = http://blog.datumbox.com/the-importance-of-neutral-class-in-sentiment-analysis/

A human analysis component is required in sentiment analysis, as automated systems are not able to analyze the historical tendencies of the individual commenter or of the platform, and comments are often classified incorrectly with respect to their expressed sentiment. Automation impacts approximately 23% of comments that are correctly classified by humans. However, humans often disagree, and it is argued that inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.

}}

</ref> and [[Support vector machine|SVMs]]<ref name = "KoppelSchler06">

{{cite conference

| first1 = Moshe | last1 = Koppel

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by variant measures based on precision and recall over the two target categories of negative and positive texts. However, according to research, human raters typically only agree about 80% of the time (see Inter-rater reliability). Thus, a program that achieves 70% accuracy in classifying sentiment is doing nearly as well as humans, even though such accuracy may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about any answer.
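The quantities mentioned here (agreement with human judgments, and precision and recall per target category) can be computed as in the following sketch. The label lists are made-up illustrations, not data from any study, and `agreement` and `precision_recall` are hypothetical helper names.

```python
def agreement(labels_a, labels_b):
    """Fraction of items on which two annotators (or a system and a human) agree."""
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


def precision_recall(gold, predicted, target):
    """Precision and recall of `predicted` against `gold` for one category."""
    true_pos = sum(g == target and p == target for g, p in zip(gold, predicted))
    predicted_target = sum(p == target for p in predicted)
    gold_target = sum(g == target for g in gold)
    precision = true_pos / predicted_target if predicted_target else 0.0
    recall = true_pos / gold_target if gold_target else 0.0
    return precision, recall


# Made-up labels for five texts.
human = ["pos", "pos", "neg", "neg", "pos"]
system = ["pos", "neg", "neg", "pos", "pos"]

print(agreement(human, system))
print(precision_recall(human, system, "pos"))
```

Running `agreement` on two human annotators instead of a human and a system gives the inter-rater figure that the text treats as the practical ceiling for automated accuracy.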

| first2 = Jonathan | last2 = Schler

| title = The Importance of Neutral Examples for Learning Sentiment

On the other hand, computer systems will make very different errors than human assessors, and thus the figures are not entirely comparable. For instance, a computer system will have trouble with negations, exaggerations, jokes, or sarcasm, which typically are easy to handle for a human reader: some errors a computer system makes will seem overly naive to a human. In general, the utility for practical commercial tasks of sentiment analysis as it is defined in academic research has been called into question, mostly since the simple one-dimensional model of sentiment from negative to positive yields rather little actionable information for a client worrying about the effect of public discourse on e.g. brand or corporate reputation.


| book-title = Computational Intelligence 22

| year = 2006

To better fit market needs, evaluation of sentiment analysis has moved to more task-based measures, formulated together with representatives from PR agencies and market research professionals. The focus in e.g. the RepLab evaluation data set is less on the content of the text under consideration and more on the effect of the text in question on brand reputation.


| pages = 100–109

| citeseerx = 10.1.1.84.9735

Because evaluation of sentiment analysis is becoming more and more task-based, each implementation needs a separate training model to get a more accurate representation of sentiment for a given data set.

}}

</ref> can benefit from the introduction of a neutral class and improve the overall accuracy of the classification. There are, in principle, two ways of operating with a neutral class. Either the algorithm proceeds by first identifying the neutral language, filtering it out and then assessing the rest in terms of positive and negative sentiments, or it builds a three-way classification in one step.<ref>{{Cite journal|last1=Ribeiro|first1=Filipe Nunes|last2=Araujo|first2=Matheus|date=2010|title=A Benchmark Comparison of State-of-the-Practice Sentiment Analysis Methods|url=https://www.researchgate.net/publication/286302059|journal=Transactions on Embedded Computing Systems |volume=9 |issue=4}}</ref> This second approach often involves estimating a probability distribution over all categories (e.g. [[Naive Bayes classifier|naive Bayes]] classifiers as implemented by the [[Nltk|NLTK]]). Whether and how to use a neutral class depends on the nature of the data: if the data are clearly clustered into neutral, negative and positive language, it makes sense to filter the neutral language out and focus on the polarity between positive and negative sentiments. If, in contrast, the data are mostly neutral with small deviations towards positive and negative affect, this strategy would make it harder to clearly distinguish between the two poles.
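The one-step, three-way approach can be sketched with a small multinomial naive Bayes classifier that returns a probability distribution over all three categories. The code is written from scratch with add-one smoothing rather than through NLTK, and the training sentences, labels, and function names (`train_nb`, `predict_proba`) are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """Collect the counts a multinomial naive Bayes model needs.

    `examples` is a list of (text, label) pairs.
    """
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in examples:
        class_docs[label] += 1
        for tok in text.lower().split():
            word_counts[label][tok] += 1
            vocab.add(tok)
    return class_docs, word_counts, vocab


def predict_proba(model, text):
    """Return P(label | text) for every label, using add-one smoothing."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    log_scores = {}
    for label in class_docs:
        total_words = sum(word_counts[label].values())
        log_p = math.log(class_docs[label] / total_docs)
        for tok in text.lower().split():
            if tok in vocab:  # words never seen in training are ignored
                count = word_counts[label][tok] + 1
                log_p += math.log(count / (total_words + len(vocab)))
        log_scores[label] = log_p
    # Normalise the log scores into a probability distribution.
    top = max(log_scores.values())
    unnormalised = {k: math.exp(v - top) for k, v in log_scores.items()}
    norm = sum(unnormalised.values())
    return {k: v / norm for k, v in unnormalised.items()}


# Made-up training data with an explicit neutral class.
TRAIN = [
    ("i love this boat", "positive"),
    ("great hull great lines", "positive"),
    ("i hate this boat", "negative"),
    ("ugly old cruiser", "negative"),
    ("the boat has a hull", "neutral"),
    ("it is a day cruiser", "neutral"),
]

model = train_nb(TRAIN)
probs = predict_proba(model, "i love great lines")
print(max(probs, key=probs.get), probs)
```

The two-stage alternative would first run a neutral-versus-rest classifier and pass only the non-neutral texts to a binary positive/negative model; which arrangement works better depends on how the data are distributed, as the paragraph above notes.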

A different method for determining sentiment is the use of a scaling system whereby words commonly associated with having a negative, neutral, or positive sentiment with them are given an associated number on a −10 to +10 scale (most negative up to most positive) or simply from 0 to a positive upper limit such as +4. This makes it possible to adjust the sentiment of a given term relative to its environment (usually on the level of the sentence). When a piece of unstructured text is analyzed using [[natural language processing]], each concept in the specified environment is given a score based on the way sentiment words relate to the concept and its associated score.<ref>{{Cite journal|last1=Taboada|first1=Maite|last2=Brooke|first2=Julian|date=2011|title=Lexicon-based methods for sentiment analysis|url=http://dl.acm.org/citation.cfm?id=2000518|journal=Computational Linguistics |volume=37 |issue=2 |pages=272–274|doi=10.1162/coli_a_00049|citeseerx=10.1.1.188.5517|s2cid=3181362}}</ref><ref>{{Cite journal|last1=Augustyniak|first1=Łukasz|last2=Szymański|first2=Piotr|last3=Kajdanowicz|first3=Tomasz|last4=Tuligłowicz|first4=Włodzimierz|date=2015-12-25|title=Comprehensive Study on Lexicon-based Ensemble Classification Sentiment Analysis|journal=Entropy|language=en|volume=18|issue=1|pages=4|doi=10.3390/e18010004|bibcode=2015Entrp..18....4A|doi-access=free}}</ref><ref>{{Cite journal|last1=Mehmood|first1=Yasir|last2=Balakrishnan|first2=Vimala|date=2020-01-01|title=An enhanced lexicon-based approach for sentiment analysis: a case study on illegal immigration|url=https://doi.org/10.1108/OIR-10-2018-0295|journal=Online Information Review|volume=44|issue=5|pages=1097–1117|doi=10.1108/OIR-10-2018-0295|issn=1468-4527}}</ref> This allows movement to a more sophisticated understanding of sentiment, because it is now possible to adjust the sentiment value of a concept relative to modifications that may surround it. 
For example, words that intensify, relax, or negate the sentiment expressed by the concept can affect its score. Alternatively, a text can be given separate positive and negative sentiment strength scores if the goal is to determine the sentiment in the text rather than its overall polarity and strength.<ref name ="SentiStrength2010">{{cite journal
| first1 = Mike
| last1 = Thelwall
| first2 = Kevan
| last2 = Buckley
| first3 = Georgios
| last3 = Paltoglou
| first4 = Di
| last4 = Cai
| first5 = Arvid
| last5 = Kappas
| title = Sentiment strength detection in short informal text
| year = 2010
| journal = Journal of the American Society for Information Science and Technology
| volume= 61
| issue= 12
| pages= 2544–2558
| url = http://www.scit.wlv.ac.uk/~cm1993/papers/SentiStrengthPreprint.doc
| doi=10.1002/asi.21416
| citeseerx = 10.1.1.278.3863
}}
</ref>

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations, and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities, and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content, and acting on it appropriately, many are now looking to the field of sentiment analysis. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If Web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is being published.

Research is one step towards this aim. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in e-communities through sentiment analysis. The CyberEmotions project, for instance, identified the role of negative emotions in driving social network discussions.

One problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances, and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment. Sentiment analysis on Twitter has also been shown to capture the public mood behind human reproduction cycles on a planetary scale, as well as other problems of public-health relevance such as adverse drug reactions.

Sentiment analysis has proven to be a valuable technique for recommender systems. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, collaborative filtering works on the rating matrix, and content-based filtering works on the metadata of the items.

In many social networking services and e-commerce websites, users can provide text reviews, comments, or feedback on items. This user-generated text provides a rich source of users' sentiment opinions about numerous products and items. Potentially, for an item, such text can reveal both the related features/aspects of the item and the users' sentiments on each feature. The item's features/aspects described in the text play the same role as the metadata in content-based filtering, but the former are more valuable for the recommender system. Since these features are broadly mentioned by users in their reviews, they can be seen as the most crucial features that can significantly influence the user's experience of the item, while the metadata of the item (usually provided by the producers instead of consumers) may ignore features that users care about. A user may give different sentiments to different items with common features. Also, a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can be regarded as a multi-dimensional rating score, reflecting their preference for the items.

Based on the features/aspects and the sentiments extracted from the user-generated text, a hybrid recommender system can be constructed. There are two types of motivation for recommending a candidate item to a user. The first is that the candidate item has numerous features in common with the user's preferred items, while the second is that the candidate item receives a high sentiment on its features. For a preferred item, it is reasonable to believe that items with the same features will have a similar function or utility, so these items are also likely to be preferred by the user. On the other hand, for a feature shared by two candidate items, other users may give positive sentiment to one of them while giving negative sentiment to the other. Clearly, the more highly rated item should be recommended to the user. Based on these two motivations, a combined ranking score of similarity and sentiment rating can be constructed for each candidate item. A poorly written review or piece of feedback is hardly helpful for a recommender system. Moreover, a review can be designed to hinder sales of a target product, and can thus harm the recommender system even if it is well written.

Researchers also found that long and short forms of user-generated text should be treated differently. An interesting result shows that short-form reviews are sometimes more helpful than long-form ones, because it is easier to filter out the noise in a short-form text. For long-form text, growing length does not always bring a proportionate increase in the number of features or sentiments in the text.

Lamba & Madhusudhan introduce a nascent way to cater to the information needs of today's library users by repackaging the results of sentiment analysis of social media platforms like Twitter and providing them as a consolidated time-based service in different formats. Further, they propose a new way of conducting marketing in libraries using social media mining and sentiment analysis.
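The combined ranking score of similarity and sentiment rating described for the hybrid recommender system can be sketched as follows. This is only one possible reading of the idea, not the method of any particular paper: the use of Jaccard overlap for feature similarity, sentiment values in the range −1 to 1, the linear weighting with a hypothetical `alpha` parameter, and the sample data are all assumptions made for illustration.

```python
def feature_similarity(candidate_features, preferred_features):
    """Jaccard overlap between a candidate's features and the user's preferred features."""
    candidate, preferred = set(candidate_features), set(preferred_features)
    if not candidate or not preferred:
        return 0.0
    return len(candidate & preferred) / len(candidate | preferred)


def mean_sentiment(feature_sentiments):
    """Average per-feature sentiment (each in [-1, 1]), rescaled to [0, 1]."""
    if not feature_sentiments:
        return 0.5  # assumption: an item without reviews counts as neutral
    average = sum(feature_sentiments.values()) / len(feature_sentiments)
    return (average + 1.0) / 2.0


def rank_candidates(candidates, preferred_features, alpha=0.5):
    """Order candidates by alpha * similarity + (1 - alpha) * sentiment, best first."""
    scored = []
    for name, feature_sentiments in candidates.items():
        similarity = feature_similarity(feature_sentiments.keys(), preferred_features)
        sentiment = mean_sentiment(feature_sentiments)
        scored.append((alpha * similarity + (1 - alpha) * sentiment, name))
    return [name for _, name in sorted(scored, reverse=True)]


# Hypothetical per-feature sentiments mined from other users' reviews.
candidates = {
    "cruiser_a": {"hull": 0.8, "engine": 0.6},
    "cruiser_b": {"hull": -0.4, "engine": 0.2},
    "dinghy": {"sail": 0.9},
}
ranking = rank_candidates(candidates, preferred_features={"hull", "engine", "cabin"})
```

In this sample, the two cruisers share the same features with the user's preferred items, so the one with better feature sentiment ranks first, while the well-reviewed but dissimilar dinghy ranks last. Raising `alpha` shifts the weight towards feature similarity; lowering it favors sentiment.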

There are various other types of sentiment analysis, such as aspect-based sentiment analysis, graded sentiment analysis (positive, negative, neutral), multilingual sentiment analysis, and emotion detection.

=== Subjectivity/objectivity identification ===

This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective.<ref name="PangLee08Subjectivity">{{cite book
| first1 = Bo
| last1 = Pang
| first2 = Lillian
| last2 = Lee
| title = Opinion Mining and Sentiment Analysis
}}</ref>

[[Category:Natural language processing]]
[[Category:Affective computing]]
[[Category:Social media]]
[[Category:Polling]]

<noinclude>

This page was moved from [[wikipedia:en:Sentiment analysis]]. Its edit history can be viewed at [[情感分析/edithistory]]</noinclude>

[[Category:待整理页面]]

Edited by Moonscar (1,569 edits)
