第61行: |
| Detecting emotional information usually begins with passive sensors that capture data about the user's physical state or behavior without interpreting the input. The data gathered is analogous to the cues humans use to perceive emotions in others. For example, a video camera might capture facial expressions, body posture, and gestures, while a microphone might capture speech. Other sensors detect emotional cues by directly measuring physiological data, such as skin temperature and galvanic resistance. |
| | | |
− | 检测情感信息通常从被动传感器开始,这些传感器捕捉关于用户身体状态或行为的数据,而不解释输入信息。收集的数据类似于人类用来感知他人情感的线索。例如,摄像机可以捕捉面部表情、身体姿势和手势,而麦克风可以捕捉语音。其他传感器通过直接测量生理数据(如皮肤温度和电流电阻)来探测情感信号。 | + | 检测情感信息通常从被动传感器开始,这些传感器捕捉关于用户身体状态或行为的数据,而不解释输入信息。收集的数据类似于人类用来感知他人情感的线索。例如,摄像机可以捕捉面部表情、身体姿势和手势,而麦克风可以捕捉语音。其他传感器通过直接测量生理数据(如皮肤温度和电流电阻)来探测情感信号【7】。 |
| | | |
| Recognizing emotional information requires the extraction of meaningful patterns from the gathered data. This is done using machine learning techniques that process different [[Modality (human–computer interaction)|modalities]], such as [[speech recognition]], [[natural language processing]], or [[face recognition|facial expression detection]]. The goal of most of these techniques is to produce labels that would match the labels a human perceiver would give in the same situation: For example, if a person makes a facial expression furrowing their brow, then the computer vision system might be taught to label their face as appearing "confused" or as "concentrating" or "slightly negative" (as opposed to positive, which it might say if they were smiling in a happy-appearing way). These labels may or may not correspond to what the person is actually feeling. |
第74行: |
| Another area within affective computing is the design of computational devices proposed to exhibit either innate emotional capabilities or that are capable of convincingly simulating emotions. A more practical approach, based on current technological capabilities, is the simulation of emotions in conversational agents in order to enrich and facilitate interactivity between human and machine. |
| | | |
− | 情感计算的另一个领域是计算设备的设计,旨在展示先天的情感能力或能够令人信服地模拟情感。基于当前的技术能力,一个更加实用的方法是模拟会话代理中的情绪,以丰富和促进人与机器之间的互动。 | + | 情感计算的另一个领域是计算设备的设计,旨在展示先天的情感能力或能够令人信服地模拟情感。基于当前的技术能力,一个更加实用的方法是模拟会话代理中的情绪,以丰富和促进人与机器之间的互动【8】。 |
| | | |
| [[Marvin Minsky]], one of the pioneering computer scientists in [[artificial intelligence]], relates emotions to the broader issues of machine intelligence stating in ''[[The Emotion Machine]]'' that emotion is "not especially different from the processes that we call 'thinking.'"<ref>{{cite news|url=https://www.washingtonpost.com/wp-dyn/content/article/2006/12/14/AR2006121401554.html|title=Mind Over Matter|last=Restak|first=Richard|date=2006-12-17|work=The Washington Post|access-date=2008-05-13}}</ref> |
第80行: |
| Marvin Minsky, one of the pioneering computer scientists in artificial intelligence, relates emotions to the broader issues of machine intelligence stating in The Emotion Machine that emotion is "not especially different from the processes that we call 'thinking.'" |
| | | |
− | 人工智能领域的计算机科学先驱之一马文•明斯基(Marvin Minsky)在《情绪机器》(The Emotion Machine)一书中将情绪与更广泛的机器智能问题联系起来。他在书中表示,情绪“与我们所谓的‘思考’过程并没有特别的不同。'" | + | 人工智能领域的计算机科学先驱之一马文•明斯基(Marvin Minsky)在《情绪机器》(The Emotion Machine)一书中将情绪与更广泛的机器智能问题联系起来。他在书中表示,情绪“与我们所谓的‘思考’过程并没有特别的不同。'"【9】 |
| | | |
| == Technologies == |
第87行: |
| In psychology, cognitive science, and in neuroscience, there have been two main approaches for describing how humans perceive and classify emotion: continuous or categorical. The continuous approach tends to use dimensions such as negative vs. positive, calm vs. aroused. |
| | | |
− | 在心理学、认知科学和神经科学中,描述人类如何感知和分类情绪的方法主要有两种: 连续的和分类的。持续的方法倾向于使用诸如消极与积极、平静与被唤醒之类的维度。 | + | 在心理学、认知科学和神经科学中,描述人类如何感知和分类情绪的方法主要有两种: 连续的和分类的。连续的方法倾向于使用诸如消极与积极、平静与激动之类的维度。 |
| | | |
| The categorical approach tends to use discrete classes such as happy, sad, angry, fearful, surprise, disgust. Different kinds of machine learning regression and classification models can be used for having machines produce continuous or discrete labels. Sometimes models are also built that allow combinations across the categories, e.g. a happy-surprised face or a fearful-surprised face.<ref>{{Cite journal|title = A model of the perception of facial expressions of emotion by humans: Research overview and perspectives.|last = Aleix, and Shichuan Du|first = Martinez|date = 2012|journal = The Journal of Machine Learning Research |volume=13 |issue=1 |pages=1589–1608|url=https://www.jmlr.org/papers/volume13/martinez12a/martinez12a.pdf}}</ref> |
第93行: |
| The categorical approach tends to use discrete classes such as happy, sad, angry, fearful, surprise, disgust. Different kinds of machine learning regression and classification models can be used for having machines produce continuous or discrete labels. Sometimes models are also built that allow combinations across the categories, e.g. a happy-surprised face or a fearful-surprised face. |
| | | |
− | 绝对类方法倾向于使用分离的类,比如高兴、悲伤、愤怒、恐惧、惊讶、厌恶。不同类型的机器学习回归和分类模型可以用于让机器产生连续或离散的标签。有时建立的模型也允许跨类别的组合,例如。一张惊喜的脸,还是一张惊恐的脸。 | + | 分类方法倾向于使用离散的类别,如快乐,悲伤,愤怒,恐惧,惊讶,厌恶。不同类型的机器学习回归和分类模型可以用于让机器产生连续或离散的标签。有时还会构建允许跨类别组合的模型,例如 一张高兴而惊讶的脸或一张害怕而惊讶的脸【10】。 |
| | | |
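The continuous/categorical distinction described above maps directly onto regression versus classification. A minimal sketch of that mapping, using scikit-learn with purely synthetic features and invented label sets (nothing here comes from a cited system):

```python
# Hypothetical sketch: the same feature vectors can drive either a
# categorical model (discrete emotion labels) or a continuous one
# (e.g. a valence dimension). All data here is synthetic.
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))                              # stand-in acoustic/visual features
labels = rng.choice(["happy", "sad", "angry"], size=200)   # categorical view
valence = rng.uniform(-1.0, 1.0, size=200)                 # continuous view

clf = LogisticRegression(max_iter=1000).fit(X, labels)     # discrete classes
reg = LinearRegression().fit(X, valence)                   # continuous dimension

print(clf.predict(X[:1])[0])          # one of the discrete emotion labels
print(float(reg.predict(X[:1])[0]))   # a real-valued valence estimate
```

The same idea extends to mixed labels (e.g. "happy-surprised") by treating the combinations as extra classes or by predicting several dimensions at once.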
| The following sections consider many of the kinds of input data used for the task of [[emotion recognition]]. |
第106行: |
| Various changes in the autonomic nervous system can indirectly alter a person's speech, and affective technologies can leverage this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes fast, loud, and precisely enunciated, with a higher and wider range in pitch, whereas emotions such as tiredness, boredom, or sadness tend to generate slow, low-pitched, and slurred speech.Breazeal, C. and Aryananda, L. Recognition of affective communicative intent in robot-directed speech. Autonomous Robots 12 1, 2002. pp. 83–104. Some emotions have been found to be more easily computationally identified, such as anger or approval. |
| | | |
− | 自主神经系统的各种变化可以间接地改变一个人的语言,情感技术可以利用这些信息来识别情绪。例如,在恐惧、愤怒或高兴的状态下发言变得快速、响亮、清晰,音调变得越来越高、越来越宽,而诸如疲倦、厌倦或悲伤等情绪往往会产生缓慢、低沉、含糊不清的发言。机器人导向语音中情感交流意图的识别。Autonomous Robots 12 1, 2002. pp.83–104.一些情绪被发现更容易被计算识别,比如愤怒或认可。 | + | 自主神经系统的各种变化可以间接地改变一个人的语言,情感技术可以利用这些信息来识别情绪。例如,在恐惧、愤怒或高兴的状态下发言变得快速、响亮、清晰,音调变得越来越高、越来越宽,而诸如疲倦、厌倦或悲伤等情绪往往会产生缓慢、低沉、含糊不清的发言【11】。有些情绪更容易被计算识别,比如愤怒【12】或赞同【13】。 |
| | | |
| Emotional speech processing technologies recognize the user's emotional state using computational analysis of speech features. Vocal parameters and [[prosody (linguistics)|prosodic]] features such as pitch variables and speech rate can be analyzed through pattern recognition techniques.<ref name="Dellaert">Dellaert, F., Polizin, t., and Waibel, A., Recognizing Emotion in Speech", In Proc. Of ICSLP 1996, Philadelphia, PA, pp.1970–1973, 1996</ref><ref name="Lee">Lee, C.M.; Narayanan, S.; Pieraccini, R., Recognition of Negative Emotion in the Human Speech Signals, Workshop on Auto. Speech Recognition and Understanding, Dec 2001</ref> |
第112行: |
| Emotional speech processing technologies recognize the user's emotional state using computational analysis of speech features. Vocal parameters and prosodic features such as pitch variables and speech rate can be analyzed through pattern recognition techniques.Dellaert, F., Polizin, t., and Waibel, A., Recognizing Emotion in Speech", In Proc. Of ICSLP 1996, Philadelphia, PA, pp.1970–1973, 1996Lee, C.M.; Narayanan, S.; Pieraccini, R., Recognition of Negative Emotion in the Human Speech Signals, Workshop on Auto. Speech Recognition and Understanding, Dec 2001 |
| | | |
− | 情感语音处理技术通过对语音特征的计算分析来识别用户的情感状态。通过模式识别技术可以分析声音参数和韵律特征,如音高变量和语速等。和 Waibel,a,Recognizing Emotion In Speech”,In Proc。1996,Philadelphia,PA,pp. 1970-1973,1996 lee,c.m.; Narayanan,s. ; Pieraccini,r. ,《人类语音信号中负面情绪的识别》 ,《汽车工作室》。语音识别与理解,2001年12月 | + | 情感语音处理技术通过对语音特征的计算分析来识别用户的情感状态。通过模式识别技术【12】【14】可以分析声音参数和韵律特征,如音高变量和语速等。 |
| | | |
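As a concrete illustration of the prosodic cues mentioned above, pitch and loudness can be estimated with elementary signal processing. This is a simplified NumPy sketch on a synthetic tone, not the method of any system cited here; real emotion recognizers use far richer feature sets:

```python
# Illustrative only: two simple prosodic features -- a crude autocorrelation
# pitch estimate (Hz) and short-time energy -- computed on a synthetic tone.
import numpy as np

def estimate_pitch(frame, sr, fmin=50.0, fmax=400.0):
    """Crude autocorrelation-based fundamental-frequency estimate in Hz."""
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lo, hi = int(sr / fmax), int(sr / fmin)   # search plausible voice lags
    lag = lo + int(np.argmax(ac[lo:hi]))
    return sr / lag

sr = 16000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 220.0 * t)          # a 220 Hz "voiced" signal

pitch = estimate_pitch(tone[:1024], sr)       # close to 220 Hz
energy = float(np.mean(tone[:1024] ** 2))     # close to 0.5 for a unit sine
```

Tracking how such features vary over time (rising pitch range, faster frames) is what lets a pattern recognizer separate, say, angry from bored speech.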
| Speech analysis is an effective method of identifying affective state, having an average reported accuracy of 70 to 80% in recent research.<ref>{{Cite journal|last1=Neiberg|first1=D|last2=Elenius|first2=K|last3=Laskowski|first3=K|date=2006|title=Emotion recognition in spontaneous speech using GMMs|url=http://www.speech.kth.se/prod/publications/files/1192.pdf|journal=Proceedings of Interspeech}}</ref><ref>{{Cite journal|last1=Yacoub|first1=Sherif|last2=Simske|first2=Steve|last3=Lin|first3=Xiaofan|last4=Burns|first4=John|date=2003|title=Recognition of Emotions in Interactive Voice Response Systems|journal=Proceedings of Eurospeech|pages=729–732|citeseerx=10.1.1.420.8158}}</ref> These systems tend to outperform average human accuracy (approximately 60%<ref name="Dellaert" />) but are less accurate than systems which employ other modalities for emotion detection, such as physiological states or facial expressions.<ref name="Hudlicka-2003-p24">{{harvnb|Hudlicka|2003|p=24}}</ref> However, since many speech characteristics are independent of semantics or culture, this technique is considered to be a promising route for further research.<ref name="Hudlicka-2003-p25">{{harvnb|Hudlicka|2003|p=25}}</ref> |
第118行: |
| Speech analysis is an effective method of identifying affective state, having an average reported accuracy of 70 to 80% in recent research. These systems tend to outperform average human accuracy (approximately 60%) but are less accurate than systems which employ other modalities for emotion detection, such as physiological states or facial expressions. However, since many speech characteristics are independent of semantics or culture, this technique is considered to be a promising route for further research. |
| | | |
− | 语音分析是一种有效的情感状态识别方法,在最近的研究中,语音分析的平均报告准确率为70%-80% 。这些系统往往比人类的平均准确率(大约60%)更高,但是不如使用其他情绪检测方式的系统准确,比如生理状态或面部表情。然而,由于许多言语特征是独立于语义或文化的,这种技术被认为是一个很有前途的进一步研究路线。 | + | 语音分析是一种有效的情感状态识别方法,在最近的研究中,语音分析的平均报告准确率为70%-80%【15】【16】 。这些系统往往比人类的平均准确率(大约60%【12】)更高,但是不如使用其他情绪检测方式的系统准确,比如生理状态或面部表情【17】。然而,由于许多言语特征是独立于语义或文化的,这种技术被认为是一个很有前途的研究路线【18】。 |
| | | |
| ====Algorithms==== |
第143行: |
| broad enough to fit every need for its application, as well as the selection of a successful classifier which will allow for quick and accurate emotion identification. |
| | | |
− | 语音/文本影响检测的过程需要创建一个可靠的数据库、知识库或者向量空间模型数据库,这些数据库的范围足以满足其应用的所有需要,同时还需要选择一个成功的分类器,这样才能快速准确地识别情感。 | + | 语音/文本影响检测的过程需要创建一个可靠的数据库、知识库或者向量空间模型【19】,这些数据库的范围足以满足其应用的所有需要,同时还需要选择一个成功的分类器,这样才能快速准确地识别情感。 |
| | | |
| Currently, the most frequently used classifiers are linear discriminant classifiers (LDC), k-nearest neighbor (k-NN), Gaussian mixture model (GMM), support vector machines (SVM), artificial neural networks (ANN), decision tree algorithms and hidden Markov models (HMMs).<ref name="Scherer-2010-p241">{{harvnb|Scherer|Bänziger|Roesch|2010|p=241}}</ref> Various studies showed that choosing the appropriate classifier can significantly enhance the overall performance of the system.<ref name="Hudlicka-2003-p24"/> The list below gives a brief description of each algorithm: |
第149行: |
| Currently, the most frequently used classifiers are linear discriminant classifiers (LDC), k-nearest neighbor (k-NN), Gaussian mixture model (GMM), support vector machines (SVM), artificial neural networks (ANN), decision tree algorithms and hidden Markov models (HMMs). Various studies showed that choosing the appropriate classifier can significantly enhance the overall performance of the system. The list below gives a brief description of each algorithm: |
| | | |
− | 目前常用的分类器有线性判别分类器(LDC)、 k- 近邻分类器(k-NN)、高斯混合模型(GMM)、支持向量机(SVM)、人工神经网络(ANN)、决策树算法和隐马尔可夫模型(HMMs)。各种研究表明,选择合适的分类器可以显著提高系统的整体性能。下面的列表给出了每个算法的简要描述: | + | 目前常用的分类器有线性判别分类器(LDC)、 k- 近邻分类器(k-NN)、高斯混合模型(GMM)、支持向量机(SVM)、人工神经网络(ANN)、决策树算法和隐马尔可夫模型(HMMs)【20】。各种研究表明,选择合适的分类器可以显著提高系统的整体性能。下面的列表给出了每个算法的简要描述: |
| | | |
| * [[Linear classifier|LDC]] – Classification happens based on the value obtained from the linear combination of the feature values, which are usually provided in the form of vector features. |
第170行: |
| * LDC-根据特征值的线性组合值进行分类,特征值通常以矢量特征的形式提供。 |
| * k-NN-分类是通过在特征空间中定位目标,并与 k 个最近邻(训练样本)进行比较来实现的。多数票决定分类。 |
− | * GMM-是一个概率模型,用于表示总体种群中存在的子种群。每个子种群使用混合分布来描述,这允许将观测分类为子种群。“高斯混合模型”。知识共享与社区建设。10 March 2011. | + | * GMM-是一种概率模型,用于表示总体中是否存在亚群。 每个子群都使用混合分布来描述,这允许将观察结果分类到子群中【21】。 |
− | * SVM-是一种(通常为二进制)线性分类器,它决定每个输入可能属于两个(或多个)可能类别中的哪一个。人工神经网络是一种受生物神经网络启发的数学模型,能够更好地把握特征空间可能存在的非线性。 | + | * SVM-是一种(通常为二进制)线性分类器,它决定每个输入可能属于两个(或多个)可能类别中的哪一个。 |
− | * 决策树算法——基于下面的决策树,其中的叶子代表分类结果,分支代表导致分类的后续特征之间的关联。 | + | * ANN-是一种受生物神经网络启发的数学模型,能够更好地把握特征空间可能存在的非线性。 |
| + | * 决策树算法——基于遵循决策树的工作,其中叶子代表分类结果,而分支代表导致分类的后续特征的结合 |
| * HMMs-一个统计马尔可夫模型,其中的状态和状态转变不能直接用于观测。相反,依赖于状态的一系列输出是可见的。在情感识别的情况下,输出表示语音特征向量的序列,这样可以推导出模型所经过的状态序列。这些状态可以由表达情绪的各种中间步骤组成,每个概率分布都有一个可能的输出向量。状态序列允许我们预测我们试图分类的情感状态,这是语音情感检测领域最常用的技术之一。 |
| | | |
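Most of the classifier families in the list above have standard open-source implementations. The sketch below compares a few of them on synthetic stand-in features with scikit-learn; it is illustrative only (GMM- and HMM-based classifiers need extra modeling code and are omitted), not a benchmark of emotion recognition:

```python
# Minimal comparison sketch of several classifier families named in the list.
# Synthetic data stands in for speech feature vectors; scores are illustrative.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# 300 samples, 10 features, 3 "emotion" classes -- all synthetic.
X, y = make_classification(n_samples=300, n_features=10, n_informative=6,
                           n_classes=3, random_state=0)

models = {
    "LDC": LinearDiscriminantAnalysis(),
    "k-NN": KNeighborsClassifier(n_neighbors=5),
    "SVM": SVC(kernel="rbf"),
    "Decision tree": DecisionTreeClassifier(random_state=0),
}
for name, model in models.items():
    score = cross_val_score(model, X, y, cv=5).mean()  # 5-fold accuracy
    print(f"{name}: {score:.2f}")
```

Running such a comparison on the actual feature set is one practical way to do the classifier selection the text describes.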
第179行: |
第180行: |
| It is proved that having enough acoustic evidence available the emotional state of a person can be classified by a set of majority voting classifiers. The proposed set of classifiers is based on three main classifiers: kNN, C4.5 and SVM-RBF Kernel. This set achieves better performance than each basic classifier taken separately. It is compared with two other sets of classifiers: one-against-all (OAA) multiclass SVM with Hybrid kernels and the set of classifiers which consists of the following two basic classifiers: C5.0 and Neural Network. The proposed variant achieves better performance than the other two sets of classifiers. |
| | | |
− | 证明了一组多数投票分类器可以用足够的声学证据对人的情绪状态进行分类。该分类器集合基于三个主要分类器: kNN、 C4.5和 SVM-RBF 核。该分类器比单独采集的基本分类器具有更好的分类性能。本文提出了一种基于混合核函数的多类支持向量机,并将其与另外两类支持向量机进行了比较。与其他两组分类器相比,该方法具有更好的分类性能。 | + | 事实证明,如果有足够的声学证据,可以通过一组多数投票分类器对一个人的情绪状态进行分类。该分类器集合基于三个主要分类器: kNN、 C4.5和 SVM-RBF 核。该分类器比单独采集的基本分类器具有更好的分类性能。将其与其他两组分类器进行比较:具有混合内核的一对多 (OAA) 多类 SVM 和由以下两个基本分类器组成的分类器组:C5.0 和神经网络。所提出的变体比其他两组分类器获得了更好的性能【22】。 |
| | | |
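The majority-voting scheme described above (kNN + C4.5 + SVM-RBF) can be sketched with scikit-learn's `VotingClassifier`. Note the assumptions: a plain `DecisionTreeClassifier` stands in for C4.5 (which scikit-learn does not ship), and the data is synthetic, so the numbers say nothing about the cited study:

```python
# Sketch of the hard majority-voting ensemble described in the text.
# DecisionTreeClassifier is a stand-in for C4.5; all data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import VotingClassifier
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=400, n_features=12, n_informative=8,
                           n_classes=4, random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=1)

ensemble = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=5)),
                ("tree", DecisionTreeClassifier(random_state=1)),  # C4.5 stand-in
                ("svm", SVC(kernel="rbf"))],
    voting="hard")  # each classifier casts one vote per sample; majority wins
ensemble.fit(X_tr, y_tr)
print(f"ensemble accuracy: {ensemble.score(X_te, y_te):.2f}")
```

Hard voting only helps when the base classifiers make partly uncorrelated errors, which is why the study pairs dissimilar model families.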
| ====Databases==== |
第191行: |
第192行: |
| The vast majority of present systems are data-dependent. This creates one of the biggest challenges in detecting emotions based on speech, as it implicates choosing an appropriate database used to train the classifier. Most of the currently possessed data was obtained from actors and is thus a representation of archetypal emotions. Those so-called acted databases are usually based on the Basic Emotions theory (by Paul Ekman), which assumes the existence of six basic emotions (anger, fear, disgust, surprise, joy, sadness), the others simply being a mix of the former ones.Ekman, P. & Friesen, W. V (1969). The repertoire of nonverbal behavior: Categories, origins, usage, and coding. Semiotica, 1, 49–98. Nevertheless, these still offer high audio quality and balanced classes (although often too few), which contribute to high success rates in recognizing emotions. |
| | | |
− | 目前绝大多数系统都依赖于数据。这在基于语音的情绪检测中创造了一个最大的挑战,因为它牵涉到选择一个合适的数据库用于训练分类器。目前拥有的大多数数据是从演员,因此是一个代表的原型情绪。这些所谓的行为数据库通常是基于基本情绪理论(保罗 · 埃克曼) ,该理论假定存在六种基本情绪(愤怒、恐惧、厌恶、惊讶、喜悦、悲伤) ,其他情绪只是前者的混合体。埃克曼,p. & Friesen,w. v (1969)。非语言行为的全部: 分类、起源、用法和编码。Semiotica,1,49-98.然而,这些仍然提供高音质和平衡的课程(虽然通常太少) ,这有助于高成功率识别情绪。 | + | 绝大多数现有系统都依赖于数据。 这造成了基于语音检测情绪的最大挑战之一,因为它涉及选择用于训练分类器的合适数据库。 目前拥有的大部分数据都是从演员那里获得的,因此是原型情感的代表。这些所谓的行为数据库通常是基于基本情绪理论(保罗 · 埃克曼) ,该理论假定存在六种基本情绪(愤怒、恐惧、厌恶、惊讶、喜悦、悲伤) ,其他情绪只是前者的混合体【23】。尽管如此,这些仍然提供高音质和平衡的类别(尽管通常太少),有助于提高识别情绪的成功率。 |
| | | |
| However, for real life application, naturalistic data is preferred. A naturalistic database can be produced by observation and analysis of subjects in their natural context. Ultimately, such database should allow the system to recognize emotions based on their context as well as work out the goals and outcomes of the interaction. The nature of this type of data allows for authentic real life implementation, due to the fact it describes states naturally occurring during the [[human–computer interaction]] (HCI). |
第197行: |
第198行: |
| However, for real life application, naturalistic data is preferred. A naturalistic database can be produced by observation and analysis of subjects in their natural context. Ultimately, such database should allow the system to recognize emotions based on their context as well as work out the goals and outcomes of the interaction. The nature of this type of data allows for authentic real life implementation, due to the fact it describes states naturally occurring during the human–computer interaction (HCI). |
| | | |
− | 然而,对于现实生活应用,自然数据是首选的。一个自然主义的数据库可以通过观察和分析的主题在他们的自然环境。最终,这样的数据库应该允许系统根据情绪的背景识别情绪,并制定出交互的目标和结果。这种类型的数据的性质允许真实的实现,因为它描述了在人机交互(HCI)过程中自然发生的状态。 | + | 然而,对于现实生活应用,自然数据是首选的。自然数据库可以通过在自然环境中观察和分析对象来产生。最终,这样的数据库应该允许系统根据他们的上下文识别情绪,并制定交互的目标和结果。此类数据的性质允许真实的现实生活实施,因为它描述了人机交互 (HCI) 期间自然发生的状态。 |
| | | |
| Despite the numerous advantages which naturalistic data has over acted data, it is difficult to obtain and usually has low emotional intensity. Moreover, data obtained in a natural context has lower signal quality, due to surroundings noise and distance of the subjects from the microphone. The first attempt to produce such database was the FAU Aibo Emotion Corpus for CEICES (Combining Efforts for Improving Automatic Classification of Emotional User States), which was developed based on a realistic context of children (age 10–13) playing with Sony's Aibo robot pet.<ref name="Steidl-2011">{{cite web | last = Steidl | first = Stefan | title = FAU Aibo Emotion Corpus | publisher = Pattern Recognition Lab | date = 5 March 2011 | url = http://www5.cs.fau.de/de/mitarbeiter/steidl-stefan/fau-aibo-emotion-corpus/ }}</ref><ref name="Scherer-2010-p243">{{harvnb|Scherer|Bänziger|Roesch|2010|p=243}}</ref> Likewise, producing one standard database for all emotional research would provide a method of evaluating and comparing different affect recognition systems. |
第203行: |
第204行: |
| Despite the numerous advantages which naturalistic data has over acted data, it is difficult to obtain and usually has low emotional intensity. Moreover, data obtained in a natural context has lower signal quality, due to surroundings noise and distance of the subjects from the microphone. The first attempt to produce such database was the FAU Aibo Emotion Corpus for CEICES (Combining Efforts for Improving Automatic Classification of Emotional User States), which was developed based on a realistic context of children (age 10–13) playing with Sony's Aibo robot pet. Likewise, producing one standard database for all emotional research would provide a method of evaluating and comparing different affect recognition systems. |
| | | |
− | 尽管自然主义数据比实际数据有许多优点,但它很难获得,而且通常情绪强度较低。此外,在自然环境下获得的数据信号质量较低,这是由于周围环境的噪声和被试者距离麦克风的距离。第一个尝试建立这样的数据库的是 FAU Aibo 情感语料库,该语料库是基于一个真实的儿童(10-13岁)与索尼的 Aibo 机器人宠物玩耍的环境而开发的。同样地,为所有情感研究建立一个标准的数据库将提供一种评估和比较不同情感识别系统的方法。 | + | 尽管自然数据比行为数据具有许多优势,但很难获得并且通常情绪强度较低。此外,由于环境噪声和对象与麦克风的距离,在自然环境中获得的数据具有较低的信号质量。第一次尝试创建这样的数据库是 FAU Aibo Emotion Corpus for CEICES(Combining Efforts for Improving Automatic Classification of Emotional User States),它是基于儿童(10-13 岁)与索尼 Aibo 机器人宠物玩耍的真实情境开发的。同样,为所有情感研究生成一个标准数据库将提供一种评估和比较不同情感识别系统的方法。 |
| | | |
| ====Speech descriptors==== |