Changes

Sentiment analysis (view source)

Revision as of 17:37, 20 July 2021 (Tue)

68 bytes removed, 17:37, 20 July 2021 (Tue)

m

Moved page from wikipedia:en:Sentiment analysis (history)

Line 20: Line 20:

=== Simple cases ===

−

=== Simple cases ===

+

=== Simple cases ===

* Coronet has the best lines of all day cruisers.

Line 42: Line 42:

=== More challenging examples ===

−

=== More challenging examples ===

+

=== More challenging examples ===

* I do not dislike cabin cruisers. ([[Negation]] handling)

Line 81: Line 81:

== Types ==

−

== Types ==

+

== Types ==

A basic task in sentiment analysis is classifying the ''polarity'' of a given text at the document, sentence, or feature/aspect level—whether the expressed opinion in a document, a sentence or an entity feature/aspect is positive, negative, or neutral. Advanced, "beyond polarity" sentiment classification looks, for instance, at emotional states such as enjoyment, anger, disgust, sadness, fear, and surprise. <ref> Vong Anh Ho, Duong Huynh-Cong Nguyen, Danh Hoang Nguyen, Linh Thi-Van Pham, Duc-Vu Nguyen, Kiet Van Nguyen, Ngan Luu-Thuy Nguyen. "Emotion Recognition

Line 284: Line 284:

showed that removing objective sentences from a document before classifying its polarity helped improve performance.

−

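~~This task is commonly defined as sorting a given text~~ (usually a sentence) ~~into two classes~~: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance.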

+

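This task is commonly defined as classifying a given text (usually a sentence) into one of two classes: objective or subjective. This problem can sometimes be more difficult than polarity classification. The subjectivity of words and phrases may depend on their context, and an objective document may contain subjective sentences (e.g., a news article quoting people's opinions). Moreover, as mentioned by Su, results are largely dependent on the definition of subjectivity used when annotating texts. However, Pang showed that removing objective sentences from a document before classifying its polarity helped improve performance.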

{{clarify-span|Subjective and objective identification is an emerging subtask of sentiment analysis that uses syntactic and semantic features, together with machine learning, to identify whether a sentence or document expresses facts or opinions. Awareness of the distinction between facts and opinions is not recent; it was possibly first presented by Carbonell at Yale University in 1979.|date=December 2020}}
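The two-stage idea described above (separate subjective from objective sentences, then classify polarity only on the subjective ones) can be sketched roughly as follows. This is a minimal illustration, not the method of Pang or Su: the word lists are hypothetical toy lexicons, and in practice both stages would normally be trained classifiers rather than keyword lookups.

```python
import re

# Hypothetical toy lexicons, chosen only for illustration.
SUBJECTIVE_CUES = {"love", "hate", "best", "worst", "great", "terrible", "mediocre"}
POSITIVE = {"love", "best", "great"}
NEGATIVE = {"hate", "worst", "terrible", "mediocre"}


def tokens(sentence):
    """Lowercase a sentence and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", sentence.lower())


def is_subjective(sentence):
    """Treat a sentence as subjective if it contains any opinion cue word."""
    return any(tok in SUBJECTIVE_CUES for tok in tokens(sentence))


def document_polarity(sentences):
    """Drop objective sentences, then score polarity on what remains."""
    subjective = [s for s in sentences if is_subjective(s)]
    score = 0
    for s in subjective:
        for tok in tokens(s):
            # +1 for each positive word, -1 for each negative word.
            score += (tok in POSITIVE) - (tok in NEGATIVE)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


review = [
    "The boat was launched in 2019.",              # objective: no cue words
    "It has the best lines of all day cruisers.",  # subjective
    "I love the cabin layout.",                    # subjective
]
print(document_polarity(review))
```

With a keyword filter this simple, the subjectivity stage mostly shows the shape of the pipeline; any benefit in a real system depends on how subjectivity was defined when the training data was annotated.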

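Line 469: Line 469:

−

It refers to determining the opinions or moods expressed in different features or aspects of entities, such as a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service of a restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is that it can capture nuances about objects of interest. Different features can generate different sentiment responses; for example, a hotel can have a convenient location but mediocre food. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether the opinion expressed on each feature/aspect is positive, negative or neutral. The automatic identification of features can be performed with syntactic methods, topic modeling, or deep learning. A more detailed discussion of this level of sentiment analysis can be found in Liu's work.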

+

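It refers to determining the opinions or sentiments expressed on different features or aspects of entities, for example, a cell phone, a digital camera, or a bank. A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service of a restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is that it can capture nuances about objects of interest. Different features can generate different sentiment responses; for example, a hotel can have a convenient location but mediocre food. This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether the opinion expressed on each feature/aspect is positive, negative or neutral. The automatic identification of features can be performed with syntactic methods, topic modeling, or deep learning. A more detailed discussion of this level of sentiment analysis can be found in Liu's work.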
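The hotel example above can be turned into a small sketch of feature/aspect-level analysis. It uses a crude proximity heuristic (opinion words adjacent to an aspect term) as a stand-in for the syntactic, topic-modeling, or deep-learning methods mentioned; the aspect list, opinion lexicon, and the `window` parameter are all assumptions made for illustration.

```python
import re

# Hypothetical aspect terms and opinion words, for illustration only.
ASPECTS = {"location", "food", "service", "screen"}
OPINIONS = {"convenient": 1, "great": 1, "friendly": 1,
            "mediocre": -1, "slow": -1, "dim": -1}


def aspect_sentiments(text, window=1):
    """Label each aspect by the summed polarity of opinion words within `window` tokens."""
    toks = re.findall(r"[a-z]+", text.lower())
    result = {}
    for i, tok in enumerate(toks):
        if tok in ASPECTS:
            # Tokens from `window` positions before to `window` positions after the aspect.
            nearby = toks[max(0, i - window): i + window + 1]
            score = sum(OPINIONS.get(w, 0) for w in nearby)
            if score > 0:
                result[tok] = "positive"
            elif score < 0:
                result[tok] = "negative"
            else:
                result[tok] = "neutral"
    return result


print(aspect_sentiments("The hotel has a convenient location but mediocre food."))
```

The narrow default window is a deliberate choice: with a wider one, "mediocre" would also be counted toward "location" and cancel out "convenient", which is exactly the kind of attachment error that syntactic parsing is meant to avoid.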

== Methods and features==

Line 475: Line 475:

== Methods and features==

−

== Methods and features ==

+

== Methods and features ==

Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches.<ref name ="“Cambria">

Line 634: Line 634:

== Evaluation ==

−

== Evaluation ==

+

== Evaluation ==

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured by variant measures based on [[precision and recall]] over the two target categories of negative and positive texts. However, according to research human raters typically only agree about 80%<ref>

Line 708: Line 708:

The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis. Wright, Alex. "Mining the Web for Feelings, Not Facts", New York Times, 2009-08-23. Retrieved on 2009-10-01. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published. Kirkpatrick, Marshall. ", ReadWriteWeb, 2009-04-15. Retrieved on 2009-10-01.

−

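== Web 2.0 == The rise of social media such as blogs and social networks, has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency that businesses can use to market their products, find new opportunities and manage their reputations. As businesses seek to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis. Wright, Alex. "Mining the Web for Feelings, Not Facts", New York Times, 2009-08-23. 2009-10-01. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published. Kirkpatrick, Marshall. ", ReadWriteWeb, 2009-04-15. 2009-10-01.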

+

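The rise of social media such as blogs and social networks has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency that businesses can use to market their products, find new opportunities and manage their reputations. As businesses seek to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now looking to the field of sentiment analysis. Wright, Alex. "Mining the Web for Feelings, Not Facts", New York Times, 2009-08-23. 2009-10-01. Further complicating the matter is the rise of anonymous social media platforms such as 4chan and Reddit. If web 2.0 was all about democratizing publishing, then the next stage of the web may well be based on democratizing data mining of all the content that is getting published. Kirkpatrick, Marshall. ", ReadWriteWeb, 2009-04-15. 2009-10-01.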

One step towards this aim is accomplished in research. Several research teams in universities around the world currently focus on understanding the dynamics of sentiment in [[Virtual community|e-communities]] through sentiment analysis.<ref name="Collective emotions in cyberspace">CORDIS. [http://cordis.europa.eu/fetch?CALLER=FP7_PROJ_EN&ACTION=D&DOC=1&CAT=PROJ&QUERY=011e4ea33ef2:358b:41dc0328&RCN=89032 "Collective emotions in cyberspace (CYBEREMOTIONS)"], ''[[European Commission]]'', 2009-02-03. Retrieved on 2010-12-13.</ref> The [[CyberEmotions|CyberEmotions project]], for instance, recently identified the role of negative [[emotion]]s in driving social networks discussions.<ref name="NewSci_flaming">Condliffe, Jamie. [https://www.newscientist.com/article/dn19821-flaming-drives-online-social-networks.html "Flaming drives online social networks "], ''[[New Scientist]]'', 2010-12-07. Retrieved on 2010-12-13.</ref>

Line 735: Line 735:

For a recommender system, sentiment analysis has been proven to be a valuable technique. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, collaborative filtering works on the rating matrix, and content-based filtering works on the meta-data of the items.

−

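=== Application in recommender systems === For a recommender system, sentiment analysis has been shown to be a valuable technique. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, collaborative filtering works on the rating matrix, and content-based filtering works on the meta-data of the items.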

+

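For a recommender system, sentiment analysis has already been shown to be a valuable technique. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets. For example, collaborative filtering works on the rating matrix, and content-based filtering works on the meta-data of the items.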

In many [[social networking service]]s or [[e-commerce]] websites, users can provide text reviews, comments or feedback on the items. This user-generated text provides a rich source of users' sentiment opinions about numerous products and items. Potentially, for an item, such text can reveal both the related features/aspects of the item and the users' sentiments on each feature.<ref>{{cite journal|url=https://pdfs.semanticscholar.org/8f1b/9b97183b8aa2caa0fb6c9563b14daabe8316.pdf|archive-url=https://web.archive.org/web/20180524004208/https://pdfs.semanticscholar.org/8f1b/9b97183b8aa2caa0fb6c9563b14daabe8316.pdf|url-status=dead|archive-date=2018-05-24|first1=Huifeng|last1=Tang|first2=Songbo|last2=Tan|first3=Xueqi|last3=Cheng|title=A survey on sentiment detection of reviews|journal=Expert Systems with Applications|volume=36|issue=7|year=2009|pages=10760–10773|doi=10.1016/j.eswa.2009.02.063|s2cid=2178380}}</ref> The item's features/aspects described in the text play the same role as the meta-data in [[content-based filtering]], but the former are more valuable for the recommender system. Since these features are broadly mentioned by users in their reviews, they can be seen as the most crucial features that can significantly influence the user's experience of the item, while the meta-data of the item (usually provided by producers rather than consumers) may ignore features that users care about. For different items with common features, a user may express different sentiments. Also, a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can be regarded as a multi-dimensional rating score, reflecting their preference for the items.
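One way to picture the multi-dimensional rating score is as a per-feature vector for each user and item. The sketch below assumes that an aspect-level analyzer has already produced (user, item, feature, polarity) tuples; the tuples, the user and item names, and both helper functions are hypothetical and only illustrate the data structure, not any particular recommender algorithm.

```python
from collections import defaultdict

# Hypothetical (user, item, feature, polarity) tuples, as might be produced by
# an aspect-level sentiment analyzer; polarity is +1, 0, or -1.
opinions = [
    ("alice", "phone_a", "screen", 1),
    ("alice", "phone_a", "battery", -1),
    ("bob", "phone_a", "screen", -1),
    ("bob", "phone_a", "battery", -1),
    ("alice", "phone_b", "screen", 1),
]


def rating_vectors(opinions):
    """Build one feature -> mean polarity vector per (user, item) pair."""
    collected = defaultdict(lambda: defaultdict(list))
    for user, item, feature, polarity in opinions:
        collected[(user, item)][feature].append(polarity)
    return {
        pair: {f: sum(v) / len(v) for f, v in feats.items()}
        for pair, feats in collected.items()
    }


def item_profile(opinions, item):
    """Average each feature's polarity for one item across all users."""
    per_feature = defaultdict(list)
    for _, it, feature, polarity in opinions:
        if it == item:
            per_feature[feature].append(polarity)
    return {f: sum(v) / len(v) for f, v in per_feature.items()}


print(rating_vectors(opinions)[("alice", "phone_a")])
print(item_profile(opinions, "phone_a"))
```

The per-pair vectors capture that the same feature of the same item can receive different sentiments from different users, while the item profile plays a role similar to meta-data in content-based filtering.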

Line 776: Line 776:

* Stylometry

−

~~= = = = =~~

+

* Emotion recognition

* Market sentiment

−

* ~~Stylometry~~

+

* Stylistics

== References ==

Moonscar

Administrator

1,592 edits