Line 237: |
Line 237: |
| 这种分析是一个分类的问题。<ref name=":1" /> | | 这种分析是一个分类的问题。<ref name=":1" /> |
| | | |
− | Each class's collection of word or phrase indicators is defined to locate desirable patterns in unannotated text. For subjective expressions, a separate word list has been created. As stated in Riloff et al. (2003), lists of subjective word and phrase indicators have been developed by multiple researchers in linguistics and natural language processing.<ref>{{Cite journal|last1=Riloff|first1=Ellen|last2=Wiebe|first2=Janyce|date=2003-07-11|title=Learning extraction patterns for subjective expressions|journal=Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing|series=EMNLP '03|volume=10|location=USA|publisher=Association for Computational Linguistics|pages=105–112|doi=10.3115/1119355.1119369|s2cid=6541910|doi-access=free}}</ref> A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction in subjectivity detection has progressed from curating features by hand in 1999 to automated feature learning in 2005.<ref>{{Cite journal|last1=Chaturvedi|first1=Iti|last2=Cambria|first2=Erik|last3=Welsch|first3=Roy E.|last4=Herrera|first4=Francisco|date=November 2018|title=Distinguishing between facts and opinions for sentiment analysis: Survey and challenges|url=https://sentic.net/subjectivity-detection.pdf|journal=Information Fusion|volume=44|pages=65–77|doi=10.1016/j.inffus.2017.12.006|via=Elsevier Science Direct|doi-access=free}}</ref> At present, automated learning methods can be further divided into supervised and [[Unsupervised learning|unsupervised machine learning]]. Pattern extraction with machine learning on annotated and unannotated text has been explored extensively by academic researchers. | + | Each class's collection of word or phrase indicators is defined to locate desirable patterns in unannotated text. For subjective expressions, a separate word list has been created. 
As stated in Riloff et al. (2003), lists of subjective word and phrase indicators have been developed by multiple researchers in linguistics and natural language processing.<ref name=":11">{{Cite journal|last1=Riloff|first1=Ellen|last2=Wiebe|first2=Janyce|date=2003-07-11|title=Learning extraction patterns for subjective expressions|journal=Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing|series=EMNLP '03|volume=10|location=USA|publisher=Association for Computational Linguistics|pages=105–112|doi=10.3115/1119355.1119369|s2cid=6541910|doi-access=free}}</ref> A dictionary of extraction rules has to be created for measuring given expressions. Over the years, feature extraction in subjectivity detection has progressed from curating features by hand in 1999 to automated feature learning in 2005.<ref name=":12">{{Cite journal|last1=Chaturvedi|first1=Iti|last2=Cambria|first2=Erik|last3=Welsch|first3=Roy E.|last4=Herrera|first4=Francisco|date=November 2018|title=Distinguishing between facts and opinions for sentiment analysis: Survey and challenges|url=https://sentic.net/subjectivity-detection.pdf|journal=Information Fusion|volume=44|pages=65–77|doi=10.1016/j.inffus.2017.12.006|via=Elsevier Science Direct|doi-access=free}}</ref> At present, automated learning methods can be further divided into supervised and [[Unsupervised learning|unsupervised machine learning]]. Pattern extraction with machine learning on annotated and unannotated text has been explored extensively by academic researchers. |
| | | |
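− | Each class's set of word or phrase indicators is defined for locating desirable patterns in unannotated text. For subjective expressions, a different word list has been created. Lists of subjective indicators in words or phrases were developed by Riloff, linguists, and multiple researchers in the natural language processing field. A dictionary of extraction rules must be created to measure given expressions. Over the years, subjectivity detection has gone from manual feature extraction in 1999 to automated feature learning in 2005. At present, automated learning methods can be further divided into supervised learning and unsupervised learning. Pattern extraction methods that apply machine learning to annotated and unannotated text have become a hot topic of academic research.
| + | Each class's collection of word or phrase indicators is defined in order to find desirable patterns in unannotated text. For subjective expressions, a separate word list has been built. Riloff et al. (2003) note that multiple researchers in linguistics and natural language processing have developed lists of subjective word and phrase indicators.<ref name=":11" /> A dictionary of extraction rules must be created to measure given expressions. Over the years, subjectivity detection has progressed from hand-crafted feature extraction in 1999 to automated feature learning in 2005.<ref name=":12" /> At present, automated learning methods can be further divided into supervised and unsupervised learning. Pattern extraction methods that apply machine learning to annotated and unannotated text have become a hot topic of academic research. |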
| | | |
| However, researchers recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) words with fewer usages, 5) time sensitivity, and 6) ever-growing volume. | | However, researchers recognized several challenges in developing fixed sets of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Six challenges have been recognized by several researchers: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) words with fewer usages, 5) time sensitivity, and 6) ever-growing volume. |
| | | |
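− | However, researchers recognized that there are some challenges in formulating a fixed set of rules for expressions. Many of the challenges in rule development stem from the nature of textual information. Some researchers have recognized six challenges: 1) metaphorical ways of expression, 2) differences in writing, 3) context sensitivity, 4) representing words with fewer usages, 5) time sensitivity, and 6) ever-growing quantity.
| + | However, researchers recognized that there are some challenges in formulating a fixed set of rules for expressions. Most of the challenges in rule development stem from the nature of textual information. Some researchers have recognized six challenges: 1) metaphorical expressions, 2) discrepancies in writing, 3) context sensitivity, 4) representative words with fewer usages, 5) time sensitivity, and 6) ever-growing volume. |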
| | | |
| # Metaphorical expressions. Text that contains metaphorical expressions may affect the performance of the extraction.<ref>{{Cite journal|last1=Wiebe|first1=Janyce|last2=Riloff|first2=Ellen|date=July 2011|title=Finding Mutual Benefit between Subjectivity Analysis and Information Extraction|url=https://ieeexplore.ieee.org/document/5959154|journal=IEEE Transactions on Affective Computing|volume=2|issue=4|pages=175–191|doi=10.1109/T-AFFC.2011.19|s2cid=16820846|issn=1949-3045}}</ref> In addition, metaphors take different forms, which may have contributed to the increase in detection. | | # Metaphorical expressions. Text that contains metaphorical expressions may affect the performance of the extraction.<ref>{{Cite journal|last1=Wiebe|first1=Janyce|last2=Riloff|first2=Ellen|date=July 2011|title=Finding Mutual Benefit between Subjectivity Analysis and Information Extraction|url=https://ieeexplore.ieee.org/document/5959154|journal=IEEE Transactions on Affective Computing|volume=2|issue=4|pages=175–191|doi=10.1109/T-AFFC.2011.19|s2cid=16820846|issn=1949-3045}}</ref> In addition, metaphors take different forms, which may have contributed to the increase in detection. |
Line 252: |
Line 252: |
| # Ever-growing volume. The task is also challenged by the sheer volume of textual data. The ever-growing nature of textual data makes it overwhelmingly difficult for researchers to complete the task on time. | | # Ever-growing volume. The task is also challenged by the sheer volume of textual data. The ever-growing nature of textual data makes it overwhelmingly difficult for researchers to complete the task on time. |
| | | |
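− | # Figurative expressions. Metaphorical expressions contained in the text may affect extraction performance. In addition, metaphors take different forms, which may have contributed to the increase in detection. # Discrepancies in writing. For text obtained from the Internet, discrepancies in the writing style of the target text data involve distinct writing genres and styles # Context-sensitive. Classification may vary based on the subjectivity or objectivity of the preceding and following sentences. # Time-sensitive attribute. The task is challenged by the time-sensitive attribute of some textual data. If a group of researchers wants to confirm a fact in the news, they need more time for cross-validation than it takes for the news to become outdated. # Cue words with fewer usages. # Ever-growing volume. The task is also challenged by the sheer volume of textual data. The ever-growing nature of textual data makes it difficult for researchers to complete the task on time. | + | # Metaphorical expressions: Metaphorical expressions contained in the text may affect extraction performance. In addition, metaphors take different forms, which may have contributed to the increase in detection. # Discrepancies in writing. For text obtained from the Internet, discrepancies in the writing style of the target text data involve distinct writing genres and styles # Context-sensitive. Classification may vary based on the subjectivity or objectivity of the preceding and following sentences. # Time-sensitive attribute. The task is challenged by the time-sensitive attribute of some textual data. If a group of researchers wants to confirm a fact in the news, they need more time for cross-validation than it takes for the news to become outdated. # Cue words with fewer usages. # Ever-growing volume. The task is also challenged by the sheer volume of textual data. The ever-growing nature of textual data makes it difficult for researchers to complete the task on time. |
| + | # Discrepancies in writing |
| + | # Context sensitivity |
| + | # Representative words with fewer usages |
| + | # Time sensitivity |
| + | # Ever-growing volume |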
| | | |
| Previously, research mainly focused on document-level classification. However, classification at the document level is less accurate, as an article may contain diverse types of expressions. Research evidence shows that a set of news articles expected to be dominated by objective expression in fact consisted of over 40% subjective expression.<ref name="Wiebe 2005 486–497"/> | | Previously, research mainly focused on document-level classification. However, classification at the document level is less accurate, as an article may contain diverse types of expressions. Research evidence shows that a set of news articles expected to be dominated by objective expression in fact consisted of over 40% subjective expression.<ref name="Wiebe 2005 486–497"/> |
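The indicator-list approach and the document-level problem described above can be illustrated with a minimal sketch. It is not the method of any cited paper: the word list `SUBJECTIVE_INDICATORS`, the helper names, and the sample article are all hypothetical, and real lexicons are much larger and usually curated or learned. The sketch labels each sentence as subjective if it contains an indicator word, then reports the share of subjective sentences, which shows how a single document-level label would hide a mix of expression types.

```python
import re

# Hypothetical indicator list, for illustration only.
SUBJECTIVE_INDICATORS = {"terrible", "wonderful", "believe", "awful", "love", "hate"}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def is_subjective(sentence, indicators=SUBJECTIVE_INDICATORS):
    """Label a sentence subjective if any of its tokens is an indicator word."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return any(tok in indicators for tok in tokens)


def subjective_share(text):
    """Fraction of sentences in the text that are labelled subjective."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    return sum(is_subjective(s) for s in sentences) / len(sentences)


# Made-up article mixing objective and subjective sentences.
article = (
    "The council approved the budget on Monday. "
    "Critics believe the plan is terrible. "
    "The vote was 7 to 2. "
    "Spending rises by 3 percent next year."
)
labels = [is_subjective(s) for s in split_sentences(article)]
share = subjective_share(article)
```

A fixed word list like this is exposed to the challenges listed earlier: a metaphor or a context-dependent sentence contains no indicator word and is silently labelled objective.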