Removed 18,687 bytes, 25 August 2021 (Wednesday) 14:42
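However, researchers have recognized several challenges in developing a fixed set of rules for classifying expressions. Most of these challenges stem from the nature of textual information. Some researchers have identified six challenges: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) time sensitivity, 5) rarely used cue words, and 6) ever-growing volume.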
 
# Metaphorical expressions: Metaphorical expressions in a text may degrade extraction performance.<ref name=":13">{{Cite journal|last1=Wiebe|first1=Janyce|last2=Riloff|first2=Ellen|date=July 2011|title=Finding Mutual Benefit between Subjectivity Analysis and Information Extraction|url=https://ieeexplore.ieee.org/document/5959154|journal=IEEE Transactions on Affective Computing|volume=2|issue=4|pages=175–191|doi=10.1109/T-AFFC.2011.19|s2cid=16820846|issn=1949-3045}}</ref> Moreover, metaphors take different forms, which makes them harder to detect.
# Discrepancies in writing: Text obtained from the Internet spans distinct writing genres and styles.
# Context sensitivity: The classification of a sentence may vary depending on whether the preceding and following sentences are subjective or objective.<ref name=":1">{{Cite journal|last1=Pang|first1=Bo|last2=Lee|first2=Lillian|date=2008-07-06|title=Opinion Mining and Sentiment Analysis|url=https://www.nowpublishers.com/article/Details/INR-011|journal=Foundations and Trends in Information Retrieval|language=en|volume=2|issue=1–2|pages=1–135|doi=10.1561/1500000011|issn=1554-0669}}</ref>
# Time sensitivity: Some textual data is time-sensitive. For example, if a group of researchers wants to confirm a fact reported in the news, cross-validating it may take longer than the news remains current.
# Rarely used cue words: Key cue words occur only a few times.
# Ever-growing volume: The sheer and ever-growing volume of textual data makes it overwhelmingly difficult for researchers to complete the task in time.

Earlier research focused mainly on document-level classification. However, document-level classification tends to be less accurate, because a single article may involve diverse types of expressions. For example, in a set of news articles that were expected to be dominated by objective expressions, subjective expressions turned out to account for over 40% of the content.<ref name="Wiebe 2005 486–497" />

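To illustrate why a single document-level label can hide this mix, the following minimal Python sketch labels each sentence separately and reports the share of subjective sentences. The cue-word list, the naive sentence splitter and the example article are hypothetical stand-ins for a trained sentence-level subjectivity classifier; they are not taken from the cited study.

```python
import re

# Hypothetical subjectivity cue words; a real system would use a trained
# sentence-level subjectivity classifier instead of a fixed list.
SUBJECTIVE_CUES = {"believe", "think", "should", "terrible", "wonderful", "outrageous"}


def split_sentences(text: str) -> list[str]:
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_subjective(sentence: str) -> bool:
    """Label a sentence subjective if it contains any cue word."""
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return bool(words & SUBJECTIVE_CUES)


def subjective_share(text: str) -> float:
    """Fraction of sentences the toy classifier labels subjective."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(is_subjective(s) for s in sentences) / len(sentences)


article = ("The council approved the budget on Monday. "
           "Spending on roads rises by 4 percent. "
           "Critics think the plan is outrageous.")
```

A document-level classifier has to give this news-style article one label, whereas the sentence-level view reports that one of its three sentences (a third of the text) is opinionated.
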
To overcome these challenges, researchers concluded that classifier efficacy depends on the precision of the pattern learner, and that learners fed with large volumes of annotated training data outperformed those trained on less comprehensive subjective features. However, one of the main obstacles to this type of work is manually building a large dataset of annotated sentences. Manual annotation has been less favored than automatic learning for three reasons:

# Variations in comprehension: Because language is ambiguous, annotators may disagree on whether an instance is subjective or objective.
# Human errors: Manual annotation is meticulous work that requires intense concentration.
# Time consumption: Manual annotation is laborious. Riloff (1996) reported that one annotator needed 8 hours to annotate 160 texts.<ref name=":17">{{Cite journal|last=Riloff|first=Ellen|date=1996-08-01|title=An empirical study of automated dictionary construction for information extraction in three domains|url=https://dx.doi.org/10.1016%2F0004-3702%2895%2900123-9|journal=Artificial Intelligence|language=en|volume=85|issue=1|pages=101–134|doi=10.1016/0004-3702(95)00123-9|issn=0004-3702|doi-access=free}}</ref>

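The Riloff (1996) figure works out to the per-text cost below. The second part extrapolates to a hypothetical corpus of 10,000 texts and assumes the same annotation speed throughout, which the source does not claim.

```latex
\frac{8\ \text{h} \times 60\ \text{min/h}}{160\ \text{texts}} = 3\ \text{min per text},
\qquad
10\,000\ \text{texts} \times 3\ \text{min} = 30\,000\ \text{min} = 500\ \text{h}.
```
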
All of these reasons can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and a large unannotated text corpus.

# Meta-Bootstrapping by Riloff and Jones (1999).<ref name=":18">{{Cite journal|last1=Riloff|first1=Ellen|last2=Jones|first2=Rosie|date=July 1999|title=Learning dictionaries for information extraction by multi-level bootstrapping|url=https://aaai.org/Papers/AAAI/1999/AAAI99-068.pdf|journal=AAAI '99/IAAI '99: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence|pages=474–479}}</ref> Level one: generate extraction patterns from pre-defined rules and rank the extracted patterns by the number of seed words each one extracts. Level two: mark the top 5 words and add them to the dictionary. Repeat.
# Basilisk (<u>B</u>ootstrapping <u>A</u>pproach to <u>S</u>emant<u>I</u>c <u>L</u>exicon <u>I</u>nduction using <u>S</u>emantic <u>K</u>nowledge) by Thelen and Riloff (2002).<ref name=":19">{{Cite journal|last1=Thelen|first1=Michael|last2=Riloff|first2=Ellen|date=2002-07-06|title=A bootstrapping method for learning semantic lexicons using extraction pattern contexts|journal=Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10|series=EMNLP '02|volume=10|location=USA|publisher=Association for Computational Linguistics|pages=214–221|doi=10.3115/1118693.1118721|s2cid=137155|doi-access=free}}</ref> Step one: generate extraction patterns. Step two: place the words extracted by the best patterns in the pattern pool into the candidate word pool. Step three: mark the top 10 words and add them to the dictionary. Repeat.

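The loop the two methods share can be sketched in Python as below. This is a simplified illustration, not either paper's algorithm: it assumes the extraction patterns have already been applied to the corpus (the hypothetical `extractions` list of pattern/word pairs), scores each pattern by the fraction of its extractions already in the lexicon, scores each candidate word by summing the scores of the patterns that extract it, and adds the top `words_per_round` words per iteration. The original methods use their own scoring functions, and Meta-Bootstrapping wraps such a loop in a second level.

```python
from collections import defaultdict


def bootstrap_lexicon(extractions, seeds, words_per_round=5, rounds=10):
    """Grow a semantic lexicon from seed words (simplified bootstrapping sketch).

    extractions: iterable of (pattern, word) pairs recording which words each
        extraction pattern matched in an unannotated corpus (hypothetical input).
    seeds: initial seed words.
    words_per_round: how many top-scoring candidate words to add per iteration.
    """
    # Group extracted words by the pattern that extracted them.
    pattern_to_words = defaultdict(set)
    for pattern, word in extractions:
        pattern_to_words[pattern].add(word)

    lexicon = set(seeds)
    for _ in range(rounds):
        candidate_scores = defaultdict(float)
        for words in pattern_to_words.values():
            known = words & lexicon
            if not known:
                continue  # pattern extracts no lexicon word: treat it as unreliable
            # Pattern reliability: fraction of its extractions already in the lexicon.
            reliability = len(known) / len(words)
            # Every new word this pattern extracts becomes a candidate.
            for word in words - lexicon:
                candidate_scores[word] += reliability
        if not candidate_scores:
            break  # no new candidates: the lexicon has converged
        # Rank candidates by score (ties broken alphabetically) and keep the best.
        ranked = sorted(candidate_scores, key=lambda w: (-candidate_scores[w], w))
        lexicon.update(ranked[:words_per_round])
    return lexicon


# Hypothetical toy extractions: (pattern, extracted word).
EXTRACTIONS = [
    ("<x> service", "terrible"), ("<x> service", "awful"), ("<x> service", "horrible"),
    ("felt <x>", "awful"), ("felt <x>", "dreadful"),
    ("bought a <x>", "ticket"), ("bought a <x>", "phone"),
]
```

Starting from the seeds `{"terrible", "awful"}`, the sketch adds "horrible" and "dreadful" but never "ticket" or "phone", because the pattern that extracts those shares no word with the lexicon.
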
Overall, these algorithms highlight the need for automatic pattern recognition and extraction in subjective and objective classification tasks.

Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifier's primary benefits is that it has popularized data-driven decision-making in various industries. According to Liu, subjective and objective identification has been applied in business, advertising, sports, and social science.<ref name=":20">{{Cite journal|last=Liu|first=Bing|date=2012-05-23|title=Sentiment Analysis and Opinion Mining|url=https://www.morganclaypool.com/doi/abs/10.2200/S00416ED1V01Y201204HLT016|journal=Synthesis Lectures on Human Language Technologies|volume=5|issue=1|pages=1–167|doi=10.2200/S00416ED1V01Y201204HLT016|issn=1947-4040}}</ref>

* Online review classification: In business, the classifier helps companies better understand the feedback on their products and the reasoning behind reviews.
* Stock price prediction: In finance, the classifier aids prediction models by processing auxiliary information from social media and other textual information from the Internet. A previous study on Japanese stock prices by Deng et al. indicates that models with a subjective and objective module may perform better than those without one.<ref name=":21">{{Cite journal|last1=Deng|first1=Shangkun|last2=Mitsubuchi|first2=Takashi|last3=Shioda|first3=Kei|last4=Shimada|first4=Tatsuro|last5=Sakurai|first5=Akito|date=December 2011|title=Combining Technical Analysis with Sentiment Analysis for Stock Price Prediction|url=http://dx.doi.org/10.1109/dasc.2011.138|journal=2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing|pages=800–807|publisher=IEEE|doi=10.1109/dasc.2011.138|isbn=978-1-4673-0006-3|s2cid=15262023}}</ref>
* Social media analysis.
* Student feedback classification.<ref name=":22">{{Cite journal|last1=Nguyen|first1=Kiet Van|last2=Nguyen|first2=Vu Duc|last3=Nguyen|first3=Phu X.V.|last4=Truong|first4=Tham T.H.|last5=Nguyen|first5=Ngan L-T.|date=2018-10-01|title=UIT-VSFC: Vietnamese Students' Feedback Corpus for Sentiment Analysis|url=https://ieeexplore.ieee.org/document/8573337|journal=2018 10th International Conference on Knowledge and Systems Engineering (KSE)|pages=19–24|location=Vietnam|publisher=IEEE|doi=10.1109/KSE.2018.8573337|isbn=978-1-5386-6113-0}}</ref>
* Document summarization: The classifier can extract target-specific comments and gather the opinions expressed by one particular entity.
* Complex question answering: The classifier can dissect complex questions by classifying their language as subjective or objective and identifying the focused target. Yu and Hatzivassiloglou (2003) developed sentence-level and document-level classifiers that identify opinion pieces.<ref name=":23">{{Cite journal|last1=Yu|first1=Hong|last2=Hatzivassiloglou|first2=Vasileios|date=2003-07-11|title=Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences|journal=Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing|series=EMNLP '03|location=USA|publisher=Association for Computational Linguistics|pages=129–136|doi=10.3115/1119355.1119372|doi-access=free}}</ref>
* Domain-specific applications.
* Email analysis: The subjective and objective classifier detects spam by tracing language patterns associated with target words.

=== '''Feature/aspect-based sentiment analysis''' ===
A more fine-grained analysis model is feature/aspect-based sentiment analysis. It refers to determining the opinions or sentiments expressed on different features or aspects of entities, e.g., of a cell phone, a digital camera, or a bank.<ref name="HuLiu04">{{cite conference
  | first1 = Minqing | last1 = Hu
  | first2 = Bing | last2 = Liu
  | url = http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
}}
</ref> A feature or aspect is an attribute or component of an entity, e.g., the screen of a cell phone, the service of a restaurant, or the picture quality of a camera. The advantage of feature-based sentiment analysis is that it can capture nuances about objects of interest. Different features can generate different sentiment responses; for example, a hotel can have a convenient location but mediocre food.<ref name=":14">{{Cite journal|title = Good location, terrible food: detecting feature sentiment in user-generated reviews|journal = Social Network Analysis and Mining|date = 2013-06-22|issn = 1869-5450|pages = 1149–1163|volume = 3|issue = 4|doi = 10.1007/s13278-013-0119-7|first1 = Mario|last1 = Cataldi|first2 = Andrea|last2 = Ballatore|first3 = Ilaria|last3 = Tiddi|first4 = Marie-Aude|last4 = Aufaure|citeseerx = 10.1.1.396.9313|s2cid = 5025282}}</ref> This problem involves several sub-problems, e.g., identifying relevant entities, extracting their features/aspects, and determining whether the opinion expressed on each feature/aspect is positive, negative or neutral.<ref name="LiuHuCheng04">{{cite conference
  | first1 = Bing | last1 = Liu
  | first2 = Minqing | last2 = Hu | first3 = Junsheng | last3 = Cheng
  | url = http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html
}}
</ref> Features can be identified automatically with syntactic methods, with [[topic model]]ing,<ref name=":15">{{Cite book|title = Constrained LDA for Grouping Product Features in Opinion Mining|publisher = Springer Berlin Heidelberg|date = 2011-01-01|isbn = 978-3-642-20840-9|pages = 448–459|series = Lecture Notes in Computer Science|doi = 10.1007/978-3-642-20841-6_37|first1 = Zhongwu|last1 = Zhai|first2 = Bing|last2 = Liu|first3 = Hua|last3 = Xu|first4 = Peifa|last4 = Jia|editor-first = Joshua Zhexue|editor-last = Huang|editor-first2 = Longbing|editor-last2 = Cao|editor-first3 = Jaideep|editor-last3 = Srivastava|citeseerx = 10.1.1.221.5178}}</ref><ref name=":16">{{Cite book|title = Modeling Online Reviews with Multi-grain Topic Models|publisher = ACM|journal = Proceedings of the 17th International Conference on World Wide Web|date = 2008-01-01|location = New York, NY, USA|isbn = 978-1-60558-085-2|pages = 111–120|series = WWW '08|doi = 10.1145/1367497.1367513|first1 = Ivan|last1 = Titov|first2 = Ryan|last2 = McDonald|arxiv = 0801.1063|s2cid = 13609860}}</ref> or with [[deep learning]].<ref name="Poria">{{cite journal
  | first = Soujanya | last = Poria | display-authors=etal
  | title = Aspect extraction for opinion mining with a deep convolutional neural network
  | pages= 5876–5883
  }}
</ref> More detailed discussions of this level of sentiment analysis can be found in Liu's chapter "Sentiment Analysis and Subjectivity" in the ''Handbook of Natural Language Processing''.<ref name="Liu2010">{{cite conference
  | first = Bing | last = Liu
  | title = Sentiment Analysis and Subjectivity
}}
</ref>

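The sub-problems above (find the aspects, then decide the polarity of the opinion expressed on each) can be illustrated with a deliberately simple, lexicon-based Python sketch. It assumes a hand-made aspect list and opinion lexicon (both hypothetical) and attributes each opinion word to every aspect in the same clause; it is not the method of any of the works cited here, which identify aspects with syntactic methods, topic models or deep learning.

```python
import re

# Hypothetical hotel-domain vocabularies; real systems identify aspects automatically.
ASPECTS = {"location", "food", "service", "room"}
OPINIONS = {"convenient": 1, "great": 1, "clean": 1,
            "mediocre": -1, "terrible": -1, "dirty": -1}


def aspect_sentiments(review: str) -> dict[str, str]:
    """Label each aspect mentioned in a review as positive, negative or neutral."""
    results = {}
    # Crude clause segmentation: split on punctuation and on the word "but".
    for clause in re.split(r"[.,;!?]|\bbut\b", review.lower()):
        tokens = re.findall(r"[a-z]+", clause)
        # Sum the polarities of the opinion words found in the same clause.
        score = sum(OPINIONS.get(t, 0) for t in tokens)
        label = "positive" if score > 0 else "negative" if score < 0 else "neutral"
        for t in tokens:
            if t in ASPECTS:
                results[t] = label
    return results
```

For the hotel example, `aspect_sentiments("The hotel has a convenient location but the food is mediocre.")` labels the location positive and the food negative. Splitting clauses on "but" is the key design choice here: without it, the two opinion words would cancel out and both aspects would come out neutral.
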
== Methods and features ==
Existing approaches to sentiment analysis can be grouped into three main categories: knowledge-based techniques, statistical methods, and hybrid approaches.<ref name="“Cambria">
{{cite journal
  | first1 = E
  | s2cid = 12104996
  }}
</ref> Knowledge-based techniques classify text by affect categories based on the presence of unambiguous affect words such as happy, sad, afraid, and bored.<ref name="Ortony">
{{cite book
  |first1      = Andrew
  |archive-date = 2015-11-23
}}
</ref> Some knowledge bases not only list obvious affect words, but also assign arbitrary words a probable "affinity" to particular emotions.<ref name="Stevenson">
{{cite journal
  | first1 = Ryan
  | doi-access = free
  }}
</ref> Statistical methods leverage elements from [[machine learning]] such as [[latent semantic analysis]], [[support vector machines]], "[[bag of words]]", "[[Pointwise Mutual Information]]" for Semantic Orientation,<ref name="Turney02">
{{cite conference
  | first = Peter | last = Turney
  | arxiv = cs.LG/0212032
}}
</ref> and [[deep learning]]. More sophisticated methods try to detect the holder of a sentiment (i.e., the person who maintains that affective state) and the target (i.e., the entity about which the affect is felt).<ref name="Kim+Hovy06">
{{cite conference
  |last1    = Kim
|archive-date = 2011-06-29
}}
</ref> To mine the opinion in [[Context (language use)|context]] and identify the feature on which the speaker has expressed an opinion, the grammatical relationships between words are used. These grammatical dependency relations are obtained by deep parsing of the text.<ref name="DeyHaque08">
{{cite conference
| first1 = Lipika | last1 = Dey | first2 = S. K. Mirajul | last2 = Haque
| url = http://portal.acm.org/citation.cfm?id=1390763&dl=GUIDE&coll=GUIDE&CFID=92244761&CFTOKEN=30578437
}}
</ref> Hybrid approaches leverage both machine learning and elements from [[knowledge representation]] such as [[ontologies]] and [[semantic network]]s in order to detect semantics that are expressed in a subtle manner, e.g., through the analysis of concepts that do not explicitly convey relevant information, but which are implicitly linked to other concepts that do so.<ref name="“Hussain">
{{cite book
  | first1 = E
| isbn = 9783319236544
  }}
</ref>

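As an example of the statistical family, pointwise mutual information (PMI) can be turned into a semantic-orientation score: a phrase's PMI with a positive anchor word minus its PMI with a negative anchor word (Turney used "excellent" and "poor"). The Python sketch below computes this difference from co-occurrence counts. How the counts are gathered is left open, and the smoothing constant of 0.01 is an assumption of this sketch, added so that zero counts do not break the logarithm.

```python
import math


def semantic_orientation(near_pos: int, near_neg: int,
                         hits_pos: int, hits_neg: int,
                         smoothing: float = 0.01) -> float:
    """SO(phrase) = PMI(phrase, pos) - PMI(phrase, neg), computed from counts.

    near_pos / near_neg: how often the phrase co-occurs with the positive /
        negative anchor word.
    hits_pos / hits_neg: how often each anchor word occurs overall.
    Expanding both PMI terms, the corpus size and the phrase's own count cancel,
    leaving log2((near_pos * hits_neg) / (near_neg * hits_pos)).
    """
    # Smooth the co-occurrence counts so a zero count cannot cause division by zero.
    numerator = (near_pos + smoothing) * hits_neg
    denominator = (near_neg + smoothing) * hits_pos
    return math.log2(numerator / denominator)
```

With invented counts, a phrase that co-occurs four times as often with the positive anchor as with the negative one (anchors equally frequent) scores about +2, i.e. roughly log2(4); a negative score indicates negative orientation.
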
有很多开源软件工具以及一系列免费和付费的情感分析工具利用机器学习、统计学方法和自然语言处理的技术,对大型文本语料进行情感分析, 这些大型文本语料包括网页、网络新闻、互联网在线讨论群组、网络在线评论、网络博客和社交媒介。<ref name="AkcoraBayirDemirbasFerhatosmanoglu2010">
 
  −
Open source software tools as well as range of free and paid sentiment analysis tools deploy [[machine learning]], statistics, and natural language processing techniques to automate sentiment analysis on large collections of texts, including web pages, online news, internet discussion groups, online reviews, web blogs, and social media.<ref name="AkcoraBayirDemirbasFerhatosmanoglu2010">
   
{{cite conference
 
{{cite conference
 
| first1 = Cuneyt Gurcan | last1 = Akcora | first2 = Murat Ali | last2 = Bayir | first3 = Murat | last3 = Demirbas | first4 = Hakan | last4 = Ferhatosmanoglu
 
| first1 = Cuneyt Gurcan | last1 = Akcora | first2 = Murat Ali | last2 = Bayir | first3 = Murat | last3 = Demirbas | first4 = Hakan | last4 = Ferhatosmanoglu
第429行: 第396行:  
| url = http://portal.acm.org/citation.cfm?id=1964867
 
| url = http://portal.acm.org/citation.cfm?id=1964867
 
}}
 
}}
</ref> Knowledge-based systems, on the other hand, make use of publicly available resources, to extract the semantic and affective information associated with natural language concepts. The system can help perform affective [[commonsense reasoning]].<ref name=":24">{{Cite journal|last1=Sasikala|first1=P.|last2=Mary Immaculate Sheela|first2=L.|date=December 2020|title=Sentiment analysis of online product reviews using DLMNN and future prediction of online product using IANFIS|journal=Journal of Big Data|language=en|volume=7|issue=1|pages=33|doi=10.1186/s40537-020-00308-7|issn=2196-1115|doi-access=free}}</ref> Sentiment analysis can also be performed on visual content, i.e., images and videos (see [[Multimodal sentiment analysis]]). One of the first approaches in this direction is SentiBank<ref name="Borth13">
+
</ref> 另一方面,基于知识的系统利用公开可用的资源,提取与自然语言概念相关的语义和情感信息。该系统可以帮助执行情感常识推理。<ref name=":24">{{Cite journal|last1=Sasikala|first1=P.|last2=Mary Immaculate Sheela|first2=L.|date=December 2020|title=Sentiment analysis of online product reviews using DLMNN and future prediction of online product using IANFIS|journal=Journal of Big Data|language=en|volume=7|issue=1|pages=33|doi=10.1186/s40537-020-00308-7|issn=2196-1115|doi-access=free}}</ref>此外,情感分析也可以在视觉内容层面上进行,例如多模态情感分析(multimodal sentiment analysis)中对图像和视频进行分析。这方面的第一种方法是SentiBank。<ref name="Borth13">
 
{{cite conference
 
{{cite conference
 
  | first1 = Damian | last1 = Borth
 
  | first1 = Damian | last1 = Borth
第439行: 第406行:  
  | url = https://visual-sentiment-ontology.appspot.com
 
  | url = https://visual-sentiment-ontology.appspot.com
 
}}  
 
}}  
</ref> SentiBank represents visual content with adjective-noun pairs. In addition, the vast majority of sentiment classification approaches rely on the [[bag-of-words model]], which disregards context, [[grammar]] and even [[word order]]. Approaches that analyze sentiment based on how words compose the meaning of longer phrases have shown better results,<ref name=":25">{{Cite journal|last1=Socher|first1=Richard|last2=Perelygin|first2=Alex|last3=Wu|first3=Jean Y.|last4=Chuang|first4=Jason|last5=Manning|first5=Christopher D.|last6=Ng|first6=Andrew Y.|last7=Potts|first7=Christopher|date=2013|title=Recursive deep models for semantic compositionality over a sentiment treebank|journal=In Proceedings of EMNLP|pages=1631–1642|citeseerx=10.1.1.593.7427}}</ref> but they incur an additional annotation overhead.
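The following minimal sketch shows why a bag-of-words representation loses information. The two hypothetical sentences below carry opposite sentiment but produce identical token counts. The whitespace tokenizer and the example sentences are illustrative assumptions. Counting bigrams recovers some local word order, but it is not the compositional approach cited above.

```python
from collections import Counter


def bag_of_words(text):
    """Lower-case, split on whitespace and count tokens; word order is discarded."""
    return Counter(text.lower().split())


def bigram_counts(text):
    """Count adjacent token pairs, which keeps some local word order."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical example sentences with opposite sentiment.
a = "the film was good not bad"
b = "the film was bad not good"

print(bag_of_words(a) == bag_of_words(b))    # True: identical bags of words
print(bigram_counts(a) == bigram_counts(b))  # False: the bigrams differ
```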
 
A human analysis component is required in sentiment analysis. Automated systems cannot analyze the historical tendencies of the individual commenter or of the platform, so they often misclassify the sentiment a comment expresses. In one case study, automated classification disagreed with human judgment on approximately 23% of the comments that humans classified correctly.<ref>{{cite web|title=Case Study: Advanced Sentiment Analysis|url=http://paragonpoll.com/sentiment-analysis-systems-case-study/|access-date=18 October 2013}}</ref> However, humans often disagree with each other, and it has been argued that inter-human agreement provides an upper bound that automated sentiment classifiers can eventually reach.<ref>{{Cite journal|last1=Mozetič|first1=Igor|last2=Grčar|first2=Miha|last3=Smailović|first3=Jasmina|date=2016-05-05|title=Multilingual Twitter Sentiment Classification: The Role of Human Annotators|journal=PLOS ONE|volume=11|issue=5|pages=e0155036|doi=10.1371/journal.pone.0155036|issn=1932-6203|pmc=4858191|pmid=27149621|arxiv=1602.07563|bibcode=2016PLoSO..1155036M}}</ref>

== Evaluation ==

The accuracy of a sentiment analysis system is, in principle, how well it agrees with human judgments. This is usually measured with variants of [[precision and recall]] computed over the two target categories of negative and positive texts. However, according to research, human raters typically agree only about 80%<ref name=":26">
{{cite news
  | last = Ogneva | first = M.
  | url=http://mashable.com/2010/04/19/sentiment-analysis/ | publisher = Mashable
  |access-date=2012-12-13}}
</ref> of the time (see [[Inter-rater reliability]]). A program that classifies sentiment with 70% accuracy is therefore doing nearly as well as humans, even though that figure may not sound impressive. If a program were "right" 100% of the time, humans would still disagree with it about 20% of the time, since they disagree that much about ''any'' answer.<ref name=":27">
{{cite book
  | last = Roebuck | first = K.
}}
</ref>
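Below is a minimal sketch of these measures. It assumes labels are plain strings and computes per-category precision, recall and F1 for the positive and negative classes. It also computes simple percent agreement between two label sequences. Applied to two human annotators, the same `agreement` function gives the kind of inter-rater figure that serves as a practical ceiling for a classifier. The label data and function names are hypothetical.

```python
def per_class_scores(gold, predicted, labels=("positive", "negative")):
    """Precision, recall and F1 for each target category, treating it as the positive class."""
    pairs = list(zip(gold, predicted))
    scores = {}
    for label in labels:
        tp = sum(g == label and p == label for g, p in pairs)
        fp = sum(g != label and p == label for g, p in pairs)
        fn = sum(g == label and p != label for g, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = {"precision": precision, "recall": recall, "f1": f1}
    return scores


def agreement(a, b):
    """Share of items on which two label sequences agree (accuracy if one is the gold standard)."""
    if len(a) != len(b) or not a:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


# Hypothetical labels from a human annotator (gold) and from a classifier.
gold = ["positive", "positive", "negative", "negative", "positive"]
predicted = ["positive", "negative", "negative", "positive", "positive"]

print(per_class_scores(gold, predicted))
print(agreement(gold, predicted))  # 0.6
```

Percent agreement is not corrected for chance agreement. Chance-corrected measures such as Cohen's kappa are often reported alongside it.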

On the other hand, computer systems make very different errors from human assessors, so the figures are not entirely comparable. For instance, a computer system will have trouble with negations, exaggerations, [[joke]]s, or sarcasm, which are typically easy for a human reader to handle; some errors a computer system makes will seem overly naive to a human. More generally, the usefulness of sentiment analysis, as defined in academic research, for practical commercial tasks has been called into question. This is mostly because the simple one-dimensional model of sentiment, running from negative to positive, yields little actionable information for a client concerned about the effect of public discourse on, e.g., brand or corporate reputation.<ref name=":28">
[[Jussi Karlgren|Karlgren, Jussi]], [[Magnus Sahlgren]], Fredrik Olsson, Fredrik Espinoza, and Ola Hamfors. "Usefulness of sentiment analysis." In European Conference on Information Retrieval, pp. 426-435. Springer Berlin Heidelberg, 2012.
</ref><ref name=":29">
[[Jussi Karlgren|Karlgren, Jussi]]. "[http://www.diva-portal.org/smash/get/diva2:1042636/FULLTEXT01.pdf Affect, appeal, and sentiment as factors influencing interaction with multimedia information]." In Proceedings of Theseus/ImageCLEF workshop on visual information retrieval evaluation, pp. 8-11. 2009.
</ref>
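A toy lexicon-based scorer makes the contrast with human readers concrete. The sketch below assumes a hypothetical six-word lexicon and whitespace tokenization. It shows a naive polarity sum getting a negated sentence wrong, a simple negation-flipping heuristic fixing that case, and both still missing sarcasm. Neither function is taken from the cited works.

```python
# Hypothetical toy lexicon; real systems rely on much larger resources.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "terrible": -1, "hate": -1}
NEGATORS = {"not", "never", "no"}


def naive_score(text):
    """Sum lexicon polarities, ignoring negation."""
    return sum(LEXICON.get(tok, 0) for tok in text.lower().split())


def negation_aware_score(text):
    """Flip the polarity of the first sentiment word that follows a negator."""
    score, negate = 0, False
    for tok in text.lower().split():
        if tok in NEGATORS:
            negate = True
            continue
        polarity = LEXICON.get(tok, 0)
        if polarity:
            score += -polarity if negate else polarity
            negate = False  # the negator is consumed by this sentiment word
    return score


print(naive_score("the service was not good"))           # 1 (wrong sign)
print(negation_aware_score("the service was not good"))  # -1
print(negation_aware_score("oh great another delay"))    # 1 (sarcasm missed)
```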

To better fit market needs, evaluation of sentiment analysis has moved toward more task-based measures, formulated together with representatives from PR agencies and market research professionals. The RepLab evaluation data set, for example, focuses less on the content of the text under consideration and more on the effect of that text on [[brand image|brand reputation]].<ref name=":31">
Amigó, Enrique, Adolfo Corujo, Julio Gonzalo, Edgar Meij, and [[Maarten de Rijke]]. "Overview of RepLab 2012: Evaluating Online Reputation Management Systems." In CLEF (Online Working Notes/Labs/Workshop). 2012.
</ref><ref name=":32">
</ref>

Because evaluation of sentiment analysis is becoming more and more task-based, each implementation needs its own trained model to represent sentiment accurately for a given data set.
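The sketch below illustrates why a model trained on one data set may not transfer to another. It is a minimal multinomial Naive Bayes classifier with add-one smoothing, trained separately on two hypothetical toy corpora (movie and car reviews). In these corpora the word "unpredictable" carries opposite sentiment. The class name, the corpora and the whitespace tokenizer are illustrative assumptions, not a method from the cited sources.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesSentiment:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(doc.lower().split())
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        return self

    def predict(self, doc):
        total_docs = sum(self.class_counts.values())
        best_label, best_logp = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            logp = math.log(n_docs / total_docs)  # class prior
            for word in doc.lower().split():
                if word in self.vocab:  # words never seen in training are ignored
                    logp += math.log((counts[word] + 1) / denom)
            if logp > best_logp:
                best_label, best_logp = label, logp
        return best_label


# Hypothetical domain-specific training sets.
movie = NaiveBayesSentiment().fit(
    ["unpredictable plot", "gripping unpredictable story", "boring plot", "dull story"],
    ["pos", "pos", "neg", "neg"],
)
car = NaiveBayesSentiment().fit(
    ["unpredictable steering", "unpredictable brakes", "smooth steering", "reliable brakes"],
    ["neg", "neg", "pos", "pos"],
)
print(movie.predict("unpredictable"), car.predict("unpredictable"))  # pos neg
```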

See also: Reputation management, Web 2.0 and web mining

The rise of [[social media]] such as [[blogs]] and [[social network]]s has fueled interest in sentiment analysis. With the proliferation of reviews, ratings, recommendations and other forms of online expression, online opinion has turned into a kind of virtual currency for businesses looking to market their products, identify new opportunities and manage their reputations. As businesses look to automate the process of filtering out the noise, understanding the conversations, identifying the relevant content and acting on it appropriately, many are now turning to the field of sentiment analysis.<ref name="Mining the Web for Feelings, Not Facts">Wright, Alex. [https://www.nytimes.com/2009/08/24/technology/internet/24emotion.html?_r=1 "Mining the Web for Feelings, Not Facts"], ''[[New York Times]]'', 2009-08-23. Retrieved on 2009-10-01.</ref> Further complicating the matter is the rise of anonymous social media platforms such as [[4chan]] and [[Reddit]].<ref name=":33">{{cite web|title=Sentiment Analysis on Reddit|url=http://news.humanele.com/sentiment-analysis-reddit/|access-date=10 October 2014|date=2014-09-30}}</ref> If [[web 2.0]] was all about democratizing publishing, then the next stage of the web may well be based on democratizing [[data mining]] of all the content that is getting published.<ref name="The Future of Social Media Monitoring">Kirkpatrick, Marshall. [https://readwrite.com/2009/04/15/whats_next_in_social_media_monitoring/ "], ''[[ReadWriteWeb]]'', 2009-04-15. Retrieved on 2009-10-01.</ref>

Research has taken one step toward this aim. Several research teams at universities around the world currently focus on understanding the dynamics of sentiment in [[Virtual community|e-communities]] through sentiment analysis.<ref name="Collective emotions in cyberspace">CORDIS. [http://cordis.europa.eu/fetch?CALLER=FP7_PROJ_EN&ACTION=D&DOC=1&CAT=PROJ&QUERY=011e4ea33ef2:358b:41dc0328&RCN=89032 "Collective emotions in cyberspace (CYBEREMOTIONS)"], ''[[European Commission]]'', 2009-02-03. Retrieved on 2010-12-13.</ref> The [[CyberEmotions|CyberEmotions project]], for instance, identified the role of negative [[emotion]]s in driving social network discussions.<ref name="NewSci_flaming">Condliffe, Jamie. [https://www.newscientist.com/article/dn19821-flaming-drives-online-social-networks.html "Flaming drives online social networks "], ''[[New Scientist]]'', 2010-12-07. Retrieved on 2010-12-13.</ref>

The problem is that most sentiment analysis algorithms use simple terms to express sentiment about a product or service. However, cultural factors, linguistic nuances and differing contexts make it extremely difficult to turn a string of written text into a simple pro or con sentiment.<ref name="Mining the Web for Feelings, Not Facts" /> The fact that humans often disagree on the sentiment of a text illustrates how big a task it is for computers to get this right. The shorter the string of text, the harder the task becomes.

Even though short text strings might be a problem, sentiment analysis within [[microblogging]] has shown that [[Twitter]] can be seen as a valid online indicator of political sentiment. The political sentiment of tweets corresponds closely to the political positions of parties and politicians, indicating that the content of Twitter messages plausibly reflects the offline political landscape.<ref name=":34">Tumasjan, Andranik; O.Sprenger, Timm; G.Sandner, Philipp; M.Welpe, Isabell (2010). [http://www.aaai.org/ocs/index.php/ICWSM/ICWSM10/paper/viewFile/1441/1852 "Predicting Elections with Twitter: What 140 Characters Reveal about Political Sentiment"]. "Proceedings of the Fourth International AAAI Conference on Weblogs and Social Media"</ref> Furthermore, sentiment analysis on Twitter has been shown to capture the public mood behind human reproduction cycles on a planetary scale,<ref name="r25">{{cite journal|doi=10.1038/s41598-017-18262-5|pmid=29269945|pmc=5740080|title=Human Sexual Cycles are Driven by Culture and Match Collective Moods|journal=Scientific Reports|volume=7|issue=1|pages=17973|year=2017|last1=Wood|first1=Ian B.|last2=Varela|first2=Pedro L.|last3=Bollen|first3=Johan|last4=Rocha|first4=Luis M.|last5=Gonçalves-Sá|first5=Joana|bibcode=2017NatSR...717973W|arxiv=1707.03959}}</ref> as well as other problems of public-health relevance such as adverse drug reactions.<ref name="r27">{{cite journal|doi=10.1016/j.jbi.2016.06.007|pmid=27363901|pmc=4981644|title=Analysis of the effect of sentiment analysis on extracting adverse drug reactions from tweets and forum posts|journal=Journal of Biomedical Informatics|volume=62|pages=148–158|year=2016|last1=Korkontzelos|first1=Ioannis|last2=Nikfarjam|first2=Azadeh|last3=Shardlow|first3=Matthew|last4=Sarker|first4=Abeed|last5=Ananiadou|first5=Sophia|last6=Gonzalez|first6=Graciela H.}}</ref>

== Application in recommender systems ==
   
{{See also|Recommender system}}

Sentiment analysis has proven to be a valuable technique for a [[recommender system]]. A recommender system aims to predict a target user's preference for an item. Mainstream recommender systems work on explicit data sets: for example, [[collaborative filtering]] works on the rating matrix, and [[content-based filtering]] works on the [[Metadata|meta-data]] of the items.
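The sketch below illustrates working on an explicit rating matrix with user-based collaborative filtering. It predicts a missing rating as the similarity-weighted average of other users' ratings, using cosine similarity over co-rated items. The rating matrix and the function names are hypothetical.

```python
import math

# Hypothetical explicit rating matrix: user -> {item: rating}; absent keys are unrated.
RATINGS = {
    "alice": {"item1": 5, "item2": 3, "item3": 4},
    "bob": {"item1": 4, "item2": 3, "item3": 5, "item4": 4},
    "carol": {"item1": 1, "item2": 5, "item4": 2},
}


def cosine(u, v):
    """Cosine similarity over the items both users have rated."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in common))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item):
    """User-based CF: similarity-weighted average of other users' ratings for the item."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        sim = cosine(ratings[user], their)
        num += sim * their[item]
        den += abs(sim)
    return num / den if den else None  # None: nobody else rated the item


print(round(predict(RATINGS, "alice", "item4"), 2))  # 3.19
```

Raw ratings here are all positive, so cosine similarity stays fairly high even for a user with opposite tastes (carol in this example). Subtracting each user's mean rating first (Pearson correlation) is a common refinement.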

In many [[social networking service]]s and [[e-commerce]] websites, users can provide text reviews, comments or feedback on items. This user-generated text is a rich source of users' sentiment about numerous products and items. For a given item, such text can reveal both the item's relevant features (aspects) and the users' sentiment toward each feature.<ref name=":35">{{cite journal|url=https://pdfs.semanticscholar.org/8f1b/9b97183b8aa2caa0fb6c9563b14daabe8316.pdf|archive-url=https://web.archive.org/web/20180524004208/https://pdfs.semanticscholar.org/8f1b/9b97183b8aa2caa0fb6c9563b14daabe8316.pdf|url-status=dead|archive-date=2018-05-24|first1=Huifeng|last1=Tang|first2=Songbo|last2=Tan|first3=Xueqi|last3=Cheng|title=A survey on sentiment detection of reviews|journal=Expert Systems with Applications|volume=36|issue=7|year=2009|pages=10760–10773|doi=10.1016/j.eswa.2009.02.063|s2cid=2178380}}</ref> The features described in the text play the same role as the meta-data in [[content-based filtering]], but they are more valuable to the recommender system. Because users mention these features broadly in their reviews, they can be seen as the features that most strongly influence a user's experience of the item. The item's meta-data, by contrast, is usually provided by the producer rather than by consumers and may ignore features that users care about. A user may express different sentiments about different items that share common features, and a feature of the same item may receive different sentiments from different users. Users' sentiments on the features can therefore be regarded as a multi-dimensional rating score that reflects their preferences for the items.

Based on the features and the sentiments extracted from user-generated text, a hybrid recommender system can be constructed.<ref name=":0">Jakob, Niklas, et al. "Beyond the stars: exploiting free-text user reviews to improve the accuracy of movie recommendations." ''Proceedings of the 1st international CIKM workshop on Topic-sentiment analysis for mass opinion''. ACM, 2009.</ref> There are two motivations for recommending a candidate item to a user. The first is that the candidate item shares numerous features with the user's preferred items;<ref name=":36">{{cite journal|last1=Hu|first1=Minqing|first2=Bing|last2=Liu|title=Mining opinion features in customer reviews|journal=AAAI|volume=4|issue=4|year=2004|s2cid=5724860|url=https://pdfs.semanticscholar.org/ee6c/726b55c66d4c222556cfae62a4eb69aa86b7.pdf|archive-url=https://web.archive.org/web/20180524004041/https://pdfs.semanticscholar.org/ee6c/726b55c66d4c222556cfae62a4eb69aa86b7.pdf|url-status=dead|archive-date=2018-05-24}}</ref> the second is that the candidate item receives high sentiment on its features. Items with the same features as a preferred item can reasonably be expected to have a similar function or utility, so the user is also likely to prefer them. Conversely, for a feature shared by two candidate items, other users may express positive sentiment about one and negative sentiment about the other; the more highly rated item should then be recommended. Based on these two motivations, a combined ranking score of similarity and sentiment rating can be constructed for each candidate item.<ref name=":0" />
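The sketch below combines the two motivations into one ranking score. It assumes each item's reviews have already been reduced to per-aspect sentiment scores in [-1, 1]. Similarity to the user's preferred items is measured here as the Jaccard overlap of aspect sets, and the two terms are mixed with a weight `alpha`. The data, the names and these choices are hypothetical illustrations, not the scoring function of the cited work.

```python
# Hypothetical aspect-level sentiment per item, e.g. averaged over review sentences, in [-1, 1].
ITEM_ASPECTS = {
    "liked_phone": {"battery": 0.9, "camera": 0.5},
    "phone_a": {"battery": 0.8, "camera": 0.6, "screen": 0.2},
    "phone_b": {"battery": -0.5, "camera": 0.7, "screen": 0.4},
    "phone_c": {"price": 0.9, "speaker": 0.3},
}


def jaccard(a, b):
    """Overlap of two aspect sets (motivation 1: shared features)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def hybrid_score(candidate, preferred, alpha=0.5):
    """Weighted mix of feature similarity and sentiment; alpha is a hypothetical weight."""
    aspects = ITEM_ASPECTS[candidate]
    similarity = max(jaccard(aspects, ITEM_ASPECTS[p]) for p in preferred)
    sentiment = sum(aspects.values()) / len(aspects)  # motivation 2: mean aspect sentiment
    return alpha * similarity + (1 - alpha) * (sentiment + 1) / 2  # sentiment rescaled to [0, 1]


def rank(candidates, preferred, alpha=0.5):
    """Order candidates by descending hybrid score."""
    return sorted(candidates, key=lambda c: hybrid_score(c, preferred, alpha), reverse=True)


print(rank(["phone_c", "phone_b", "phone_a"], ["liked_phone"]))
# ['phone_a', 'phone_b', 'phone_c']
```

Here phone_a and phone_b share the same aspects with the liked item, so they tie on similarity. phone_a ranks first because its battery draws positive sentiment while phone_b's battery draws negative sentiment.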

Beyond the difficulty of sentiment analysis itself, applying sentiment analysis to reviews or feedback also faces the challenge of spam and biased reviews. One line of work focuses on evaluating the helpfulness of each review,<ref name=":37">{{cite book|first1=Yang|last1=Liu|first2=Xiangji|last2=Huang|first3=Aijun|last3=An|first4=Xiaohui|last4=Yu|chapter-url=http://www.yorku.ca/xhyu/papers/ICDM2008.pdf|chapter=Modeling and predicting the helpfulness of online reviews|year=2008|title=ICDM'08. Eighth IEEE international conference on Data mining|pages=443–452|publisher= IEEE|doi=10.1109/ICDM.2008.94|isbn=978-0-7695-3502-9|s2cid=18235238}}</ref> since poorly written reviews or feedback are of little help to a recommender system. Moreover, a review can be deliberately designed to hinder sales of a target product, and can therefore harm the recommender system even if it is well written.

Researchers have also found that long and short forms of user-generated text should be treated differently. Short-form reviews are sometimes more helpful than long-form ones,<ref name=":38">{{cite book|doi=10.1145/1871437.1871741|last1=Bermingham|first1=Adam|last2=Smeaton|first2=Alan F.|title=Classifying sentiment in microblogs: is brevity an advantage?|journal=Proceedings of the 19th ACM International Conference on Information and Knowledge Management|pages=1833|year=2010|isbn=9781450300995|s2cid=2084603|url=http://doras.dcu.ie/15663/1/cikm1079-bermingham.pdf}}</ref> because noise is easier to filter out of short-form text. In long-form text, greater length does not always bring a proportionate increase in the number of features or sentiments expressed.

Lamba and Madhusudhan<ref name=":39">{{cite journal |last1=Lamba |first1=Manika |last2=Madhusudhan |first2=Margam |title=Application of sentiment analysis in libraries to provide temporal information service: a case study on various facets of productivity |journal=Social Network Analysis and Mining |year=2018 |volume=8 |issue=1|pages=1–12|doi=10.1007/s13278-018-0541-y |s2cid=53047128 }}</ref> introduce a new way to meet the information needs of today's library users. They repackage the results of sentiment analysis on social media platforms such as Twitter and provide them as a consolidated, time-based service in different formats. They also propose a new way of conducting marketing in libraries using social media mining and sentiment analysis.

== See also ==
* [[Emotion recognition]]
* [[Market sentiment]]
* [[Stylometry]]