== Technologies ==
 
In psychology, cognitive science, and in neuroscience, there have been two main approaches for describing how humans perceive and classify emotion: continuous or categorical. The continuous approach tends to use dimensions such as negative vs. positive, calm vs. aroused.
 
The categorical approach tends to use discrete classes such as happy, sad, angry, fearful, surprise, disgust.  Different kinds of machine learning regression and classification models can be used for having machines produce continuous or discrete labels.  Sometimes models are also built that allow combinations across the categories, e.g. a happy-surprised face or a fearful-surprised face.
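
The two label types map onto the two standard families of supervised models. The sketch below is an illustration only (not part of the source text): the same made-up feature vectors are given a continuous valence label for a regression model and a discrete category label for a classification model, using scikit-learn; all feature values and labels are placeholders.

<syntaxhighlight lang="python">
# Illustrative sketch: continuous vs. discrete emotion labels (placeholder data).
import numpy as np
from sklearn.linear_model import LinearRegression, LogisticRegression

X = np.array([[0.9, 0.8], [0.8, 0.9], [0.2, 0.1], [0.1, 0.2]])  # made-up features

# Continuous approach: predict a position on a dimension such as valence.
valence = np.array([0.7, 0.9, -0.6, -0.8])
regressor = LinearRegression().fit(X, valence)

# Categorical approach: predict a discrete emotion class.
category = np.array(["happy", "happy", "sad", "sad"])
classifier = LogisticRegression().fit(X, category)

print(regressor.predict([[0.85, 0.85]]), classifier.predict([[0.85, 0.85]]))
</syntaxhighlight>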
 
The following sections consider many of the kinds of input data used for the task of [[emotion recognition]].
 
===Emotional speech===
 
Various changes in the autonomic nervous system can indirectly alter a person's speech, and affective technologies can leverage this information to recognize emotion. For example, speech produced in a state of fear, anger, or joy becomes fast, loud, and precisely enunciated, with a higher and wider range in pitch, whereas emotions such as tiredness, boredom, or sadness tend to generate slow, low-pitched, and slurred speech.<ref>Breazeal, C. and Aryananda, L. [http://web.media.mit.edu/~cynthiab/Papers/breazeal-aryananda-AutoRo02.pdf Recognition of affective communicative intent in robot-directed speech]. Autonomous Robots 12 1, 2002. pp. 83–104.</ref> Some emotions have been found to be more easily computationally identified, such as anger<ref name="Dellaert" /> or approval.<ref>{{Cite book|last1=Roy|first1=D.|last2=Pentland|first2=A.|date=1996-10-01|title=Automatic spoken affect classification and analysis|journal=Proceedings of the Second International Conference on Automatic Face and Gesture Recognition|pages=363–367|doi=10.1109/AFGR.1996.557292|isbn=978-0-8186-7713-7|s2cid=23157273}}</ref>
 
Emotional speech processing technologies recognize the user's emotional state using computational analysis of speech features. Vocal parameters and [[prosody (linguistics)|prosodic]] features such as pitch variables and speech rate can be analyzed through pattern recognition techniques.<ref name="Dellaert">Dellaert, F., Polizin, T., and Waibel, A., Recognizing Emotion in Speech, In Proc. of ICSLP 1996, Philadelphia, PA, pp. 1970–1973, 1996</ref><ref name="Lee">Lee, C.M.; Narayanan, S.; Pieraccini, R., Recognition of Negative Emotion in the Human Speech Signals, Workshop on Auto. Speech Recognition and Understanding, Dec 2001</ref>
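
As a concrete illustration (not from the source text), the sketch below extracts a few such vocal parameters from a single recording with the open-source librosa library; the file name is a placeholder and the feature set is only a small sample of what affect detectors typically use.

<syntaxhighlight lang="python">
# Illustrative sketch: utterance-level prosodic descriptors with librosa.
import numpy as np
import librosa

y, sr = librosa.load("utterance.wav", sr=None)   # placeholder audio file

# Fundamental-frequency (pitch) track via the pYIN estimator
f0, voiced, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                             fmax=librosa.note_to_hz("C7"), sr=sr)
f0 = f0[voiced]                                   # keep voiced frames only

features = {
    "mean_pitch_hz": float(np.mean(f0)),                     # average pitch
    "pitch_range_hz": float(np.max(f0) - np.min(f0)),        # pitch range
    "mean_energy": float(np.mean(librosa.feature.rms(y=y))), # loudness proxy
    "duration_s": len(y) / sr,                                # basis for speech rate
}
print(features)
</syntaxhighlight>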
 
Speech analysis is an effective method of identifying affective state, having an average reported accuracy of 70 to 80% in recent research.<ref>{{Cite journal|last1=Neiberg|first1=D|last2=Elenius|first2=K|last3=Laskowski|first3=K|date=2006|title=Emotion recognition in spontaneous speech using GMMs|url=http://www.speech.kth.se/prod/publications/files/1192.pdf|journal=Proceedings of Interspeech}}</ref><ref>{{Cite journal|last1=Yacoub|first1=Sherif|last2=Simske|first2=Steve|last3=Lin|first3=Xiaofan|last4=Burns|first4=John|date=2003|title=Recognition of Emotions in Interactive Voice Response Systems|journal=Proceedings of Eurospeech|pages=729–732|citeseerx=10.1.1.420.8158}}</ref> These systems tend to outperform average human accuracy (approximately 60%<ref name="Dellaert" />) but are less accurate than systems which employ other modalities for emotion detection, such as physiological states or facial expressions.<ref name="Hudlicka-2003-p24">{{harvnb|Hudlicka|2003|p=24}}</ref> However, since many speech characteristics are independent of semantics or culture, this technique is considered to be a promising route for further research.<ref name="Hudlicka-2003-p25">{{harvnb|Hudlicka|2003|p=25}}</ref>
 
====Algorithms====
 
The process of speech/text affect detection requires the creation of a reliable [[database]], [[knowledge base]], or [[vector space model]],<ref name="Osgood75" /> broad enough to fit every need for its application, as well as the selection of a successful classifier which will allow for quick and accurate emotion identification.
    
Currently, the most frequently used classifiers are linear discriminant classifiers (LDC), k-nearest neighbor (k-NN), Gaussian mixture model (GMM), support vector machines (SVM), artificial neural networks (ANN), decision tree algorithms and hidden Markov models (HMMs).<ref name="Scherer-2010-p241">{{harvnb|Scherer|Bänziger|Roesch|2010|p=241}}</ref> Various studies showed that choosing the appropriate classifier can significantly enhance the overall performance of the system.<ref name="Hudlicka-2003-p24"/> The list below gives a brief description of each algorithm:
 
* LDC – Features are represented as vectors, and classification is based on the value obtained from a linear combination of the feature values.
* k-NN – Classification happens by locating the object in the feature space and comparing it with the k nearest data points (training examples); the class with the largest count among the neighbours decides the result.
* GMM – A probabilistic model used to represent the existence of sub-populations within the overall population; classification uses a mixture of Gaussian probability density functions over the features.【21】
* SVM – A (usually binary) linear classifier which decides in which of the two (or more) possible classes each input may fall.
* ANN – A mathematical model, inspired by biological neural networks, that can better handle possible non-linearities of the feature space.
* Decision tree algorithms – Work on a tree in which the leaves represent the classification outcome and the branches (paths) represent the conjunction of subsequent features that lead to that leaf and hence to the classification.
* HMMs – A statistical Markov model in which the states and state transitions are not directly observable; instead, a series of outputs dependent on the states is visible. In emotion recognition, the outputs represent the sequence of speech feature vectors, from which the sequence of states the model passed through can be inferred. The states can consist of various intermediate steps in the expression of an emotion, and each state has a probability distribution over the possible output vectors. The state sequence allows prediction of the affective state being classified, and this is one of the most commonly used techniques within speech affect detection.
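
As a hedged illustration of how two of these classifiers are used in practice (not part of the source text), the sketch below trains a k-NN and a linear SVM with scikit-learn on made-up utterance-level feature vectors; the feature values and emotion labels are placeholders.

<syntaxhighlight lang="python">
# Illustrative sketch: k-NN and SVM on placeholder speech-feature vectors
# of the form [mean pitch, pitch range, mean energy].
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

X = np.array([[220.0, 180.0, 0.80],   # made-up "angry" utterances
              [230.0, 200.0, 0.90],
              [110.0,  40.0, 0.20],   # made-up "sad" utterances
              [105.0,  35.0, 0.15]])
y = np.array(["angry", "angry", "sad", "sad"])

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)   # majority vote of neighbours
svm = SVC(kernel="linear").fit(X, y)                  # linear decision boundary

sample = np.array([[200.0, 150.0, 0.70]])             # an unseen utterance
print(knn.predict(sample), svm.predict(sample))
</syntaxhighlight>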
    
It has been shown that, given enough acoustic evidence, the emotional state of a person can be classified by a set of majority-voting classifiers. One proposed set is based on three main classifiers, kNN, C4.5 and an SVM with RBF kernel, and achieves better performance than each basic classifier taken separately. It has been compared with two other sets of classifiers: a one-against-all (OAA) multiclass SVM with hybrid kernels, and a set consisting of the two basic classifiers C5.0 and a neural network; the proposed variant achieves better performance than both.<ref>{{cite journal|url=http://ntv.ifmo.ru/en/article/11200/raspoznavanie_i_prognozirovanie_dlitelnyh__emociy_v_rechi_(na_angl._yazyke).htm|title=Extended speech emotion recognition and prediction|author=S.E. Khoruzhnikov|journal=Scientific and Technical Journal of Information Technologies, Mechanics and Optics|volume=14|issue=6|page=137|year=2014|display-authors=etal}}</ref>
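
A minimal sketch of such a majority-voting set (an assumption-laden illustration, not the cited authors' code): scikit-learn's VotingClassifier combines a k-NN, a decision tree standing in for C4.5 (which scikit-learn does not ship), and an RBF-kernel SVM by hard voting over placeholder data.

<syntaxhighlight lang="python">
# Illustrative sketch: hard-voting ensemble of k-NN, decision tree and RBF SVM.
import numpy as np
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC

X = np.array([[220.0, 180.0, 0.80], [230.0, 200.0, 0.90],
              [110.0,  40.0, 0.20], [105.0,  35.0, 0.15]])   # placeholder features
y = np.array(["angry", "angry", "sad", "sad"])               # placeholder labels

ensemble = VotingClassifier(
    estimators=[("knn", KNeighborsClassifier(n_neighbors=3)),
                ("tree", DecisionTreeClassifier(random_state=0)),
                ("svm_rbf", SVC(kernel="rbf"))],
    voting="hard",                       # each classifier casts one vote
).fit(X, y)

print(ensemble.predict([[200.0, 150.0, 0.70]]))
</syntaxhighlight>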
 
====Databases====
 
The vast majority of present systems are data-dependent. This creates one of the biggest challenges in detecting emotions based on speech, as it implicates choosing an appropriate database used to train the classifier. Most of the currently possessed data was obtained from actors and is thus a representation of archetypal emotions. Those so-called acted databases are usually based on the Basic Emotions theory (by [[Paul Ekman]]), which assumes the existence of six basic emotions (anger, fear, disgust, surprise, joy, sadness), the others simply being a mix of the former ones.<ref name="Ekman, P. 1969">Ekman, P. & Friesen, W. V (1969). [http://www.communicationcache.com/uploads/1/0/8/8/10887248/the-repertoire-of-nonverbal-behavior-categories-origins-usage-and-coding.pdf The repertoire of nonverbal behavior: Categories, origins, usage, and coding]. Semiotica, 1, 49–98.</ref> Nevertheless, these still offer high audio quality and balanced classes (although often too few), which contribute to high success rates in recognizing emotions.
 
However, for real life application, naturalistic data is preferred. A naturalistic database can be produced by observation and analysis of subjects in their natural context. Ultimately, such database should allow the system to recognize emotions based on their context as well as work out the goals and outcomes of the interaction. The nature of this type of data allows for authentic real life implementation, due to the fact it describes states naturally occurring during the [[human–computer interaction]] (HCI).
 
Despite the numerous advantages which naturalistic data has over acted data, it is difficult to obtain and usually has low emotional intensity. Moreover, data obtained in a natural context has lower signal quality, due to surroundings noise and distance of the subjects from the microphone. The first attempt to produce such database was the FAU Aibo Emotion Corpus for CEICES (Combining Efforts for Improving Automatic Classification of Emotional User States), which was developed based on a realistic context of children (age 10–13) playing with Sony's Aibo robot pet.<ref name="Steidl-2011">{{cite web | last = Steidl | first = Stefan | title = FAU Aibo Emotion Corpus | publisher = Pattern Recognition Lab | date = 5 March 2011 | url = http://www5.cs.fau.de/de/mitarbeiter/steidl-stefan/fau-aibo-emotion-corpus/ }}</ref><ref name="Scherer-2010-p243">{{harvnb|Scherer|Bänziger|Roesch|2010|p=243}}</ref> Likewise, producing one standard database for all emotional research would provide a method of evaluating and comparing different affect recognition systems.
 
====Speech descriptors====
 
The complexity of the affect recognition process increases with the number of classes (affects) and speech descriptors used within the classifier. It is, therefore, crucial to select only the most relevant features in order to assure the ability of the model to successfully identify emotions, as well as increasing the performance, which is particularly significant to real-time detection. The range of possible choices is vast, with some studies mentioning the use of over 200 distinct features.<ref name="Scherer-2010-p241"/> It is crucial to identify those that are redundant and undesirable in order to optimize the system and increase the success rate of correct emotion detection. The most common speech characteristics are categorized into the following groups.<ref name="Steidl-2011"/><ref name="Scherer-2010-p243"/>
 
# Frequency characteristics<ref>{{Cite book |doi=10.1109/ICCCI50826.2021.9402569|isbn=978-1-7281-5875-4|chapter=Non-linear frequency warping using constant-Q transformation for speech emotion recognition|title=2021 International Conference on Computer Communication and Informatics (ICCCI)|pages=1–4|year=2021|last1=Singh|first1=Premjeet|last2=Saha|first2=Goutam|last3=Sahidullah|first3=Md|arxiv=2102.04029}}</ref>
#* Accent shape – affected by the rate of change of the fundamental frequency.
#* Average pitch – description of how high or low the speaker speaks relative to normal speech.
#* Contour slope – describes the tendency of the frequency change over time; it can be rising, falling or level.
#* Final lowering – the amount by which the frequency falls at the end of an utterance.
#* Pitch range – measures the spread between the maximum and minimum frequency of an utterance.
# Time-related features:
#* Speech rate – describes the rate of words or syllables uttered over a unit of time.
#* Stress frequency – measures the rate of occurrence of pitch-accented utterances.
# Voice quality parameters and energy descriptors:
#* Breathiness – measures the aspiration noise in speech.
#* Brilliance – describes the dominance of high or low frequencies in the speech.
#* Loudness – measures the amplitude of the speech waveform, translating to the energy of an utterance.
#* Pause discontinuity – describes the transitions between sound and silence.
#* Pitch discontinuity – describes the transitions of the fundamental frequency.
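
As a hedged illustration of the feature-selection step described above (not from the source text), the sketch below scores a placeholder matrix of 200 descriptors per utterance with a one-way ANOVA F-test and keeps only the highest-scoring ones, using scikit-learn; the data are random stand-ins.

<syntaxhighlight lang="python">
# Illustrative sketch: keeping only the most relevant speech descriptors.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

rng = np.random.default_rng(0)
X_full = rng.normal(size=(60, 200))     # 60 utterances x 200 placeholder descriptors
y = rng.integers(0, 4, size=60)         # 4 placeholder emotion classes

selector = SelectKBest(score_func=f_classif, k=20).fit(X_full, y)
X_reduced = selector.transform(X_full)  # the 20 highest-scoring descriptors remain
print(X_reduced.shape)                  # (60, 20)
</syntaxhighlight>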
    
===Facial affect detection===
 
The detection and processing of facial expression are achieved through various methods such as [[optical flow]], [[hidden Markov model]]s, [[neural network|neural network processing]] or active appearance models. More than one modalities can be combined or fused (multimodal recognition, e.g. facial expressions and speech prosody,<ref name="face-prosody">{{cite conference | url = http://www.image.ece.ntua.gr/php/savepaper.php?id=447 | first1 = G. | last1 = Caridakis | first2 = L. | last2 = Malatesta | first3 = L. | last3 = Kessous | first4 = N. | last4 = Amir | first5 = A. | last5 = Raouzaiou | first6 = K. | last6 = Karpouzis | title = Modeling naturalistic affective states via facial and vocal expressions recognition | conference = International Conference on Multimodal Interfaces (ICMI'06) | location = Banff, Alberta, Canada | date = November 2–4, 2006 }}</ref> facial expressions and hand gestures,<ref name="face-gesture">{{cite book | chapter-url = http://www.image.ece.ntua.gr/php/savepaper.php?id=334 | first1 = T. | last1 = Balomenos | first2 = A. | last2 = Raouzaiou | first3 = S. | last3 = Ioannou | first4 = A. | last4 = Drosopoulos | first5 = K. | last5 = Karpouzis | first6 = S. | last6 = Kollias | chapter = Emotion Analysis in Man-Machine Interaction Systems | editor1-first = Samy | editor1-last = Bengio | editor2-first = Herve | editor2-last = Bourlard | title = Machine Learning for Multimodal Interaction | series = [[Lecture Notes in Computer Science]] | volume = 3361| year = 2004 | pages = 318–328 | publisher = [[Springer-Verlag]] }}</ref> or facial expressions with speech and text for multimodal data and metadata analysis) to provide a more robust estimation of the subject's emotional state. [[Affectiva]] is a company (co-founded by [[Rosalind Picard]] and [[Rana el Kaliouby|Rana El Kaliouby]]) directly related to affective computing and aims at investigating solutions and software for facial affect detection.
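
As an illustration of decision-level (late) multimodal fusion only (not a description of any particular system), the sketch below averages the class-probability outputs of a hypothetical face model and a hypothetical speech model with assumed weights; all numbers are placeholders.

<syntaxhighlight lang="python">
# Illustrative sketch: late fusion of face and speech emotion probabilities.
import numpy as np

classes = ["anger", "fear", "disgust", "surprise", "joy", "sadness"]
p_face   = np.array([0.10, 0.05, 0.05, 0.15, 0.60, 0.05])   # hypothetical face model
p_speech = np.array([0.20, 0.05, 0.05, 0.15, 0.45, 0.10])   # hypothetical speech model

w_face, w_speech = 0.6, 0.4                       # assumed modality weights
p_fused = w_face * p_face + w_speech * p_speech   # weighted decision-level fusion

print(classes[int(np.argmax(p_fused))])           # fused estimate: "joy"
</syntaxhighlight>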
 
==== Facial expression databases ====
 
{{Main|Facial expression databases}}
 
Creation of an emotion database is a difficult and time-consuming task. However, database creation is an essential step in the creation of a system that will recognize human emotions. Most of the publicly available emotion databases include posed facial expressions only. In posed expression databases, the participants are asked to display different basic emotional expressions, while in spontaneous expression database, the expressions are natural. Spontaneous emotion elicitation requires significant effort in the selection of proper stimuli which can lead to a rich display of intended emotions. Secondly, the process involves tagging of emotions by trained individuals manually which makes the databases highly reliable. Since perception of expressions and their intensity is subjective in nature, the annotation by experts is essential for the purpose of validation.
 
Researchers work with three types of databases: a database of peak expression images only, a database of image sequences portraying an emotion from neutral to its peak, and video clips with emotional annotations. Many facial expression databases have been created and made public for expression recognition purposes. Two of the widely used databases are CK+ and JAFFE.
 
====Emotion classification====
 
{{Main|Emotion classification}}
 
By doing cross-cultural research in Papua New Guinea, on the Fore Tribesmen, at the end of the 1960s, [[Paul Ekman]] proposed the idea that facial expressions of emotion are not culturally determined, but universal. Thus, he suggested that they are biological in origin and can, therefore, be safely and correctly categorized.<ref name="Ekman, P. 1969"/>
 
He therefore officially put forth six basic emotions, in 1972:
 
* [[Anger]]
 