Changes

Added 435 bytes, Monday, 9 August 2021, 17:17
Line 263: Line 263:
To overcome those challenges, researchers concluded that classifier efficacy depends on the precision of the pattern learner, and that learners fed with large volumes of annotated training data outperform those trained on less comprehensive subjective features. However, one of the main obstacles to this type of work is manually generating a large dataset of annotated sentences. Manual annotation has been less favored than automatic learning for three reasons:
   −
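To overcome these challenges, researchers concluded that classification efficacy depends on the precision of the pattern learner, and that learners trained on large volumes of annotated data perform better than those trained on less comprehensive subjective features. However, one of the main obstacles to this kind of work is the need to manually generate a large dataset of annotated sentences. Compared with automatic learning, manual annotation is less favored, for three main reasons:
+
To overcome these challenges, researchers concluded that classification efficacy depends on the precision of the pattern learner, and that learners trained on large volumes of labeled data perform better than those trained on less comprehensive subjective features. However, one of the main obstacles to this kind of work is the need for humans to manually generate a large dataset of labeled sentences. Compared with automatic learning, human labeling is less favored, for three main reasons: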
    
# Variations in comprehension. In manual annotation, annotators may disagree on whether an instance is subjective or objective because of the ambiguity of language.
 
# Human errors. Manual annotation is a meticulous task that requires intense concentration to finish.
# Time-consuming. Manual annotation is laborious work. Riloff (1996) showed that annotating 160 texts took one annotator 8 hours.<ref>{{Cite journal|last=Riloff|first=Ellen|date=1996-08-01|title=An empirical study of automated dictionary construction for information extraction in three domains|url=https://dx.doi.org/10.1016%2F0004-3702%2895%2900123-9|journal=Artificial Intelligence|language=en|volume=85|issue=1|pages=101–134|doi=10.1016/0004-3702(95)00123-9|issn=0004-3702|doi-access=free}}</ref>
+
# Time-consuming. Manual annotation is laborious work. Riloff (1996) showed that annotating 160 texts took one annotator 8 hours.<ref name=":17">{{Cite journal|last=Riloff|first=Ellen|date=1996-08-01|title=An empirical study of automated dictionary construction for information extraction in three domains|url=https://dx.doi.org/10.1016%2F0004-3702%2895%2900123-9|journal=Artificial Intelligence|language=en|volume=85|issue=1|pages=101–134|doi=10.1016/0004-3702(95)00123-9|issn=0004-3702|doi-access=free}}</ref>
   −
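# Changes in understanding. During manual annotation, disagreements over subjective or objective instances may arise among annotators because of the ambiguity of language.
+
# Differences in understanding. During human labeling, labelers are constrained by the ambiguity of language and may therefore disagree on whether an example is subjective or objective.
# Human error. Manual annotation is meticulous work that requires a high degree of concentration to complete.
+
# Human error. Human labeling is meticulous work that requires highly focused attention to complete.
# Time-consuming. Manual annotation is laborious work. Riloff (1996) showed that one annotator needs 8 hours to complete 160 texts.
+
# Time-consuming. Human annotation is laborious work. Riloff's (1996) study showed that one labeler needs 8 hours to label 160 texts.<ref name=":17" />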
 
All of these factors can affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a handful of seed words and unannotated textual data.
   −
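All of these reasons affect the efficiency and effectiveness of subjective and objective classification. Accordingly, two bootstrapping methods were designed to learn linguistic patterns from unannotated text data. Both methods start with a small number of seed words and unannotated text data.
+
All of the reasons mentioned above affect the efficiency and performance of subjective and objective classification. Researchers therefore designed two bootstrapping methods, whose purpose is to learn linguistic patterns from unlabeled text data. Both methods start with a small number of seed words and a large unlabeled text corpus.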
   −
# Meta-Bootstrapping by Riloff and Jones in 1999.<ref>{{Cite journal|last1=Riloff|first1=Ellen|last2=Jones|first2=Rosie|date=July 1999|title=Learning dictionaries for information extraction by multi-level bootstrapping|url=https://aaai.org/Papers/AAAI/1999/AAAI99-068.pdf|journal=AAAI '99/IAAI '99: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence|pages=474–479}}</ref> Level One: Generate extraction patterns based on pre-defined rules and rank the extracted patterns by the number of seed words each pattern holds. Level Two: The top 5 words are marked and added to the dictionary. Repeat.
+
# Meta-Bootstrapping by Riloff and Jones in 1999.<ref name=":18">{{Cite journal|last1=Riloff|first1=Ellen|last2=Jones|first2=Rosie|date=July 1999|title=Learning dictionaries for information extraction by multi-level bootstrapping|url=https://aaai.org/Papers/AAAI/1999/AAAI99-068.pdf|journal=AAAI '99/IAAI '99: Proceedings of the Sixteenth National Conference on Artificial Intelligence and the Eleventh Innovative Applications of Artificial Intelligence Conference Innovative Applications of Artificial Intelligence|pages=474–479}}</ref> Level One: Generate extraction patterns based on pre-defined rules and rank the extracted patterns by the number of seed words each pattern holds. Level Two: The top 5 words are marked and added to the dictionary. Repeat.
# Basilisk (<u>B</u>ootstrapping <u>A</u>pproach to <u>S</u>emantIc <u>L</u>exicon <u>I</u>nduction using <u>S</u>emantic <u>K</u>nowledge) by Thelen and Riloff.<ref>{{Cite journal|last1=Thelen|first1=Michael|last2=Riloff|first2=Ellen|date=2002-07-06|title=A bootstrapping method for learning semantic lexicons using extraction pattern contexts|journal=Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10|series=EMNLP '02|volume=10|location=USA|publisher=Association for Computational Linguistics|pages=214–221|doi=10.3115/1118693.1118721|s2cid=137155|doi-access=free}}</ref> Step One: Generate extraction patterns. Step Two: Move the best patterns from the Pattern Pool to the Candidate Word Pool. Step Three: The top 10 words are marked and added to the dictionary. Repeat.
+
# Basilisk (<u>B</u>ootstrapping <u>A</u>pproach to <u>S</u>emantIc <u>L</u>exicon <u>I</u>nduction using <u>S</u>emantic <u>K</u>nowledge) by Thelen and Riloff.<ref name=":19">{{Cite journal|last1=Thelen|first1=Michael|last2=Riloff|first2=Ellen|date=2002-07-06|title=A bootstrapping method for learning semantic lexicons using extraction pattern contexts|journal=Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10|series=EMNLP '02|volume=10|location=USA|publisher=Association for Computational Linguistics|pages=214–221|doi=10.3115/1118693.1118721|s2cid=137155|doi-access=free}}</ref> Step One: Generate extraction patterns. Step Two: Move the best patterns from the Pattern Pool to the Candidate Word Pool. Step Three: The top 10 words are marked and added to the dictionary. Repeat.
 
  −
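# Meta-Bootstrapping by Riloff and Jones, 1999. Level One: Generate extraction patterns according to predefined rules, and generate extraction patterns according to the number of seed words each pattern contains. Step Two: The top 5 words are marked and added to the dictionary. Repeat.
  −
# Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge), Thelen and Riloff. Step One: Generate extraction patterns. Step Two: Move the best patterns from the pattern pool to the candidate word pool. Step Three: Mark the top 10 words and add them to the dictionary. Repeat.
      +
# Meta-Bootstrapping (Riloff & Jones, 1999).<ref name=":18" /> Step One: Generate extraction patterns according to predefined rules, and generate extraction patterns according to the number of seed words each pattern contains. Step Two: Mark the 5 top-scoring words and add them to the semantic dictionary. Repeat the above procedure.
 +
# Basilisk (Bootstrapping Approach to SemantIc Lexicon Induction using Semantic Knowledge) (Thelen & Riloff, 2002).<ref name=":19" /> Step One: Generate extraction patterns. Step Two: Move the best patterns from the pattern pool to the candidate seed word pool. Step Three: Mark the 10 top-scoring words and add them to the semantic dictionary. Repeat the above procedure.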
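
The generate, score, and add loop shared by the two bootstrapping methods above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the procedure from either cited paper: the function name `bootstrap_lexicon`, the toy `toy_extractions` data, and both scoring formulas are hypothetical choices made for the example. Only the overall shape (start from seed words, score patterns by the seed words they hold, add the top 5 or top 10 words, repeat) comes from the text.

```python
import math
from collections import defaultdict


def bootstrap_lexicon(extractions, seeds, top_n=5, iterations=3):
    """Grow a lexicon from seed words (simplified, illustrative only).

    extractions: hypothetical mapping from an extraction pattern to the set
    of words that pattern extracts from an unannotated corpus.
    """
    lexicon = set(seeds)
    for _ in range(iterations):
        # Score each pattern by how many known lexicon words it extracts.
        # This scoring formula is an assumption made for the sketch.
        pattern_scores = {}
        for pattern, words in extractions.items():
            hits = len(words & lexicon)
            if hits > 0:
                pattern_scores[pattern] = (hits / len(words)) * math.log2(hits + 1)

        # Score unseen words by the total score of the patterns extracting them.
        candidate_scores = defaultdict(float)
        for pattern, score in pattern_scores.items():
            for word in extractions[pattern] - lexicon:
                candidate_scores[word] += score
        if not candidate_scores:
            break  # nothing new to add

        # Add the top_n candidates (ties broken alphabetically), then repeat.
        ranked = sorted(candidate_scores.items(), key=lambda kv: (-kv[1], kv[0]))
        lexicon.update(word for word, _ in ranked[:top_n])
    return lexicon


# Toy, made-up extraction data; a real system would derive it from a corpus.
toy_extractions = {
    "lives in <X>": {"paris", "london", "berlin"},
    "traveled to <X>": {"london", "tokyo", "madrid"},
    "ate <X>": {"pizza", "sushi"},
}
grown = bootstrap_lexicon(toy_extractions, seeds={"paris", "london"})
print(sorted(grown))
```

Setting `top_n=5` mirrors the "top 5 words" step described for Meta-Bootstrapping and `top_n=10` the "top 10 words" step described for Basilisk. The pattern "ate <X>" never contributes words here because it extracts no word already in the lexicon.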
Line 287: Line 286:
Overall, these algorithms highlight the need for automatic pattern recognition and extraction in subjective and objective classification tasks.
   −
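In short, these algorithms highlight the need for automatic pattern recognition and extraction in subjective and objective tasks.
+
Overall, these algorithms highlight the need for automatic pattern recognition and extraction in subjectivity and objectivity identification tasks.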
   −
Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifier's primary benefits is that it has popularized data-driven decision-making in various industries. According to Liu, subjective and objective identification has been applied in business, advertising, sports, and social science.<ref>{{Cite journal|last=Liu|first=Bing|date=2012-05-23|title=Sentiment Analysis and Opinion Mining|url=https://www.morganclaypool.com/doi/abs/10.2200/S00416ED1V01Y201204HLT016|journal=Synthesis Lectures on Human Language Technologies|volume=5|issue=1|pages=1–167|doi=10.2200/S00416ED1V01Y201204HLT016|issn=1947-4040}}</ref>
+
Subjective and objective classifiers can enhance several applications of natural language processing. One of the classifier's primary benefits is that it has popularized data-driven decision-making in various industries. According to Liu, subjective and objective identification has been applied in business, advertising, sports, and social science.<ref name=":20">{{Cite journal|last=Liu|first=Bing|date=2012-05-23|title=Sentiment Analysis and Opinion Mining|url=https://www.morganclaypool.com/doi/abs/10.2200/S00416ED1V01Y201204HLT016|journal=Synthesis Lectures on Human Language Technologies|volume=5|issue=1|pages=1–167|doi=10.2200/S00416ED1V01Y201204HLT016|issn=1947-4040}}</ref>
   −
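Subjective classifiers and object classifiers can enhance some applications of natural language processing. One of the main benefits of the classifier is that it has popularized the practice of data-driven decision-making in various industries. According to Liu, applications of subjective and objective identification have been implemented in business, advertising, sports, and social science.
+
Subjective and objective classifiers can enhance the service applications of natural language processing. One of the main benefits of this classifier is that it has popularized data-driven decision-making in various industries. According to Liu, applications of subjective and objective identification have been put into practice in business, advertising, sports, and social science.<ref name=":20" />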
    
* Online review classification: In the business industry, the classifier helps companies better understand product feedback and the reasoning behind the reviews.
* Stock price prediction: In the finance industry, the classifier aids the prediction model by processing auxiliary information from social media and other textual information from the Internet. A previous study on Japanese stock prices by Deng et al. indicates that models with a subjective and objective module may perform better than those without one.<ref>{{Cite journal|last1=Deng|first1=Shangkun|last2=Mitsubuchi|first2=Takashi|last3=Shioda|first3=Kei|last4=Shimada|first4=Tatsuro|last5=Sakurai|first5=Akito|date=December 2011|title=Combining Technical Analysis with Sentiment Analysis for Stock Price Prediction|url=http://dx.doi.org/10.1109/dasc.2011.138|journal=2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing|pages=800–807|publisher=IEEE|doi=10.1109/dasc.2011.138|isbn=978-1-4673-0006-3|s2cid=15262023}}</ref>
+
* Stock price prediction: In the finance industry, the classifier aids the prediction model by processing auxiliary information from social media and other textual information from the Internet. A previous study on Japanese stock prices by Deng et al. indicates that models with a subjective and objective module may perform better than those without one.<ref name=":21">{{Cite journal|last1=Deng|first1=Shangkun|last2=Mitsubuchi|first2=Takashi|last3=Shioda|first3=Kei|last4=Shimada|first4=Tatsuro|last5=Sakurai|first5=Akito|date=December 2011|title=Combining Technical Analysis with Sentiment Analysis for Stock Price Prediction|url=http://dx.doi.org/10.1109/dasc.2011.138|journal=2011 IEEE Ninth International Conference on Dependable, Autonomic and Secure Computing|pages=800–807|publisher=IEEE|doi=10.1109/dasc.2011.138|isbn=978-1-4673-0006-3|s2cid=15262023}}</ref>
 
* Social media analysis.
 
* Students' feedback classification.<ref>{{Cite journal|last1=Nguyen|first1=Kiet Van|last2=Nguyen|first2=Vu Duc|last3=Nguyen|first3=Phu X.V.|last4=Truong|first4=Tham T.H.|last5=Nguyen|first5=Ngan L-T.|date=2018-10-01|title=UIT-VSFC: Vietnamese Students' Feedback Corpus for Sentiment Analysis|url=https://ieeexplore.ieee.org/document/8573337|journal=2018 10th International Conference on Knowledge and Systems Engineering (KSE)|pages=19–24|location=Vietnam|publisher=IEEE|doi=10.1109/KSE.2018.8573337|isbn=978-1-5386-6113-0}}</ref>
+
* Students' feedback classification.<ref name=":22">{{Cite journal|last1=Nguyen|first1=Kiet Van|last2=Nguyen|first2=Vu Duc|last3=Nguyen|first3=Phu X.V.|last4=Truong|first4=Tham T.H.|last5=Nguyen|first5=Ngan L-T.|date=2018-10-01|title=UIT-VSFC: Vietnamese Students' Feedback Corpus for Sentiment Analysis|url=https://ieeexplore.ieee.org/document/8573337|journal=2018 10th International Conference on Knowledge and Systems Engineering (KSE)|pages=19–24|location=Vietnam|publisher=IEEE|doi=10.1109/KSE.2018.8573337|isbn=978-1-5386-6113-0}}</ref>
 
* Document summarising: The classifier can extract target-specific comments and gather opinions expressed by one particular entity.
* Complex question answering: The classifier can dissect complex questions by classifying their language as subjective or objective and identifying the focused target. In Yu et al. (2003), the researchers developed sentence- and document-level clustering that identifies opinion pieces.<ref>{{Cite journal|last1=Yu|first1=Hong|last2=Hatzivassiloglou|first2=Vasileios|date=2003-07-11|title=Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences|journal=Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing|series=EMNLP '03|location=USA|publisher=Association for Computational Linguistics|pages=129–136|doi=10.3115/1119355.1119372|doi-access=free}}</ref>
+
* Complex question answering: The classifier can dissect complex questions by classifying their language as subjective or objective and identifying the focused target. In Yu et al. (2003), the researchers developed sentence- and document-level clustering that identifies opinion pieces.<ref name=":23">{{Cite journal|last1=Yu|first1=Hong|last2=Hatzivassiloglou|first2=Vasileios|date=2003-07-11|title=Towards answering opinion questions: separating facts from opinions and identifying the polarity of opinion sentences|journal=Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing|series=EMNLP '03|location=USA|publisher=Association for Computational Linguistics|pages=129–136|doi=10.3115/1119355.1119372|doi-access=free}}</ref>
 
* Domain-specific applications.
 
 
* Email analysis:  The subjective and objective classifier detects spam by tracing language patterns with target words.
 
   −
 
+
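* Online review classification: In the business sector, the classifier helps companies better understand product feedback and the reasoning about the logic behind reviews.
* Online review classification: In the business sector, the classifier helps companies better understand product feedback and the reasons behind reviews.
+
* Stock price prediction: In the finance industry, the classifier aids the prediction model by processing auxiliary information obtained from social media and other textual information obtained from the Internet. A previous study of Japanese stock prices by Deng et al. showed that models with a subjective and objective module may perform better than models without one.<ref name=":21" />
* Stock price prediction: In the finance industry, the classifier aids the prediction model through auxiliary information obtained from social media and other textual information obtained from the Internet. Previous studies of Japanese stock prices were all conducted by Deng et al. They indicate that models with a subjective and objective module may perform better than models without one.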
   
* Social media analysis.
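* Student feedback classification.
+
* Student feedback classification.<ref name=":22" />
* Document summarization: The classifier can extract target-specific comments and gather the opinions of one particular entity.
+
* Discourse summarization: The classifier can extract target-specified comments and gather the opinions of one particular entity.
* Complex question answering. Quantifiers can parse complex questions by classifying the language topic or objective target. In the research of Yu et al. (2003), the researchers developed a sentence- and document-level cluster identity opinion piece.
+
* Complex question answering: The classifier can classify complex questions, including the language subject, the target, and the focused target. In the study by Yu et al. (2003), the researchers developed sentence- and discourse-level clustering to identify opinion pieces.<ref name=":23" />
* Domain-specific application programs.
+
* Domain-specific applications.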
 
* Email analysis: The subjective and objective classifier detects spam by tracing the language patterns of target words.
  