Computational models of emotion
By Jenny Oh
Summary
Artificial neural networks (ANNs) have been used for emotion recognition and expression, an area known as affective computing. Affective computing has historically sought to understand and explore emotion recognition and expression in human-computer (including brain-computer) interfaces [1], as well as to process different forms of data (video [2], audio [3], text [4], and physiological [5]) to infer or predict emotional states. Diverse ANN architectures, including deep, convolutional, and recurrent neural networks, have been harnessed for emotion modeling. Aside from modeling human emotion, an area of interest in the literature has been modeling artificial emotion; both lines of work seek to use or validate different theories of emotion to inform a computational approach. This article focuses on ANN-based emotion modeling.
Background
Emotions are physical and mental states brought on by neurophysiological changes, variously associated with thoughts, feelings, behavioral responses, and a degree of pleasure or displeasure. There is no scientific consensus on a definition.
Major emotional models in psychology and physiology include the James-Lange, Cannon-Bard, and Schachter-Singer theories, which relate physiological responses to emotional affect in different ways. Neuroscientific approaches have largely centered on neural circuits that appear active in emotional processing, such as the Papez and Yakovlev circuits. In addition, brain systems implicated in emotion, such as the limbic system (basal ganglia, amygdala, and insular and cingulate cortices), provide insight into affective processing.
Research on emotion has involved efforts from diverse fields, including psychology, neuroscience, and computer science [6]. Models and theories of emotion are a central concern in these fields. The complexity and multidisciplinary relevance of emotion make it an apt point of investigation in convergent areas of study such as computational neuroscience.
With the attention given to artificial intelligence (AI) in contemporary computer science and computational neuroscience, understanding emotion and emotion recognition has also become a major line of research that complements innovation in these fields.
Constructing Computational Models of Emotion
Computational neuroscience studies have both informed and taken cues from neuroscientific perspectives implicating specific brain areas, including the basal ganglia [7] and striatum, in constructing human emotion, as well as linking processes involved in sensorimotor, attentional, and emotional brain function [8]. The challenge for a computational model of emotion is to process and interpret emotional states in a manner similar to the way humans do, acknowledging the lack of temporal boundaries and the diverse expression and perception of human emotion [9].
Various computational models of emotion have been proposed, often informed by neurological, physiological, or psychological evidence. Combining contemporary computational methods with these insights, models often harness artificial neural networks (ANNs) and other machine learning techniques to analyze inputs such as facial expressions, voice tone, text or language data, physiological data, and behavioral cues in order to generate appropriate emotional responses or to detect emotional states in users [10].
Following Picard’s (1997) idea of affective computing, which defines and explores human-computer interaction that includes emotional states, both goals of emotion recognition and emotion construction have helped shape computational approaches to emotion. Diverse types of ANNs and other machine learning methods, including deep neural networks (DNNs), recurrent neural networks (RNNs), and convolutional neural networks (CNNs), have been employed to recognize and construct emotion.
Deep Neural Networks (DNNs)
Deep neural networks (DNNs), the architectures underlying deep learning, are ANNs with multiple layers between the input and output layers. These stacked layers of interconnected nodes loosely mirror networks of neurons and are effective for learning hierarchical representations of data, which makes them useful for processing high-dimensional inputs such as images and audio, as well as large datasets. In emotion recognition, these features of DNNs are used for emotion and affect sensing based on facial expression processing [11], video analysis [12], or text sentiment analysis [13]. DNNs allow pattern recognition in these areas and inform combined problem-solving approaches across fields, for example in audiovisual recognition or paralinguistics [14]. Using a multi-layered approach to model combined features is a major advantage of DNN-based models of emotion; aside from combining input-related features, DNNs can be used to combine different components of emotional theory (appraisal, memory, learning) [15]. Their flexibility makes DNNs well suited to such complex, high-dimensional tasks, drawing from multiple input types or implementing multiple layers of processing to generate meaningful representations of emotional cues. However, capturing such complex relationships through feedforward mechanisms can lead to common training issues in DNNs. Additional layers of abstraction may contribute to overfitting on smaller or less diverse datasets, while the complexity and size of larger datasets may require substantial resources and time for training. Efforts in harnessing this methodology have therefore often focused on combining multiple neural networks and employing multiple data processing techniques, resulting in adjustments to both the models and the data cleaning methods used in emotion recognition [16], [17].
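As a concrete illustration of the kind of feedforward classifier described above, the following is a minimal PyTorch sketch, not a reproduction of any cited model: it maps a fixed-length feature vector (e.g., pre-extracted facial or acoustic features) to one of several discrete emotion classes. The feature dimension, layer sizes, and class labels are illustrative assumptions.

import torch
import torch.nn as nn

# Hypothetical discrete emotion classes; any label set could be substituted.
EMOTIONS = ["anger", "disgust", "fear", "happiness", "sadness", "surprise", "neutral"]

class EmotionDNN(nn.Module):
    def __init__(self, n_features: int = 128, n_classes: int = len(EMOTIONS)):
        super().__init__()
        # Stacked fully connected layers learn increasingly abstract
        # representations of the input features; dropout limits overfitting.
        self.net = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(256, 64), nn.ReLU(), nn.Dropout(0.3),
            nn.Linear(64, n_classes),   # raw logits, one per emotion class
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

if __name__ == "__main__":
    model = EmotionDNN()
    features = torch.randn(4, 128)          # a batch of 4 feature vectors
    probs = torch.softmax(model(features), dim=1)
    print([EMOTIONS[int(i)] for i in probs.argmax(dim=1)])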
Recurrent Neural Networks (RNNs)
RNNs are a class of ANN designed to handle sequential data by processing inputs across multiple time steps, unlike feedforward neural networks. They incorporate feedback loops that allow the network to retain information from previous time steps, making them adept at modeling and processing text, speech, and time series. In modeling emotion, RNNs are helpful for capturing temporal features of emotion, which is useful for processing emotion in speech patterns, behavioral changes, or physiological cues over time [18]. Capturing temporal information is valuable in emotion recognition because human behavior is often contextualized by time cues, making this approach a valuable way to harness long-range or time-dependent data. RNNs have often been combined with Long Short-Term Memory (LSTM), a type of RNN architecture useful for the classification, regression, encoding, or decoding of long sequence or time-series data. The LSTM-RNN is a state-of-the-art modeling technique in emotion recognition and has been harnessed for automatic audiovisual recognition through its modeling of audio and visual features [19]. This approach allows the modeling of long-range time dependencies while taking different forms of data into account. Because of these temporal features, RNNs can model emotional shifts and predict future emotional states based on past inputs, allowing for a more dynamic model of emotion. Though the vanishing gradient problem initially limited RNNs in learning long-term dependencies, combining them with an LSTM architecture allows them to handle longer sequences of data as well as high-dimensional inputs. RNNs are often combined with DNN or CNN techniques to add a temporal dimension to emotion modeling.
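The sketch below shows, under assumed dimensions and in PyTorch, how an LSTM of the general kind discussed above might consume a sequence of per-frame features (e.g., acoustic descriptors over time) and classify the overall emotion from its final hidden state; it is not a reproduction of the cited systems.

import torch
import torch.nn as nn

class EmotionLSTM(nn.Module):
    def __init__(self, n_features: int = 40, hidden_size: int = 64, n_classes: int = 7):
        super().__init__()
        # batch_first=True -> input shape (batch, time, features)
        self.lstm = nn.LSTM(n_features, hidden_size, num_layers=2, batch_first=True)
        self.classifier = nn.Linear(hidden_size, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        _, (h_n, _) = self.lstm(x)        # h_n: (num_layers, batch, hidden_size)
        return self.classifier(h_n[-1])   # classify from the last layer's final state

if __name__ == "__main__":
    model = EmotionLSTM()
    clip = torch.randn(2, 100, 40)        # 2 clips, 100 time steps, 40 features each
    print(model(clip).shape)              # -> torch.Size([2, 7])

Classifying from the final hidden state is only one design choice; attention over all time steps or per-step outputs could be used instead when emotion labels vary within a sequence.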
Convolutional Neural Networks (CNNs)
A regularized type of feedforward ANN, CNNs learn features through filter optimization. CNNs can process grid-like data such as images or videos; in emotion modeling, this makes them advantageous for recognition tasks that involve visual data (such as facial expression recognition). Though image and video recognition is the main area of CNN research, they have also been used in emotion classification tasks involving other types of data, including EEG signals [20], audio and speech [21], [22], and physiological signals [23] (heart rate, pulse, temperature, etc.). Often built with deep architectures (deep CNNs), CNN-based emotion detection has been achieved through classification of the emotional dimensions of arousal and valence. Their automatic feature learning makes CNNs an efficient method for capturing spatial relationships within data, which is advantageous not only for image-based tasks but also for other domains once the architecture is adapted to the specific data. Though CNNs tend to be less capable of handling sequential or time-dependent data such as physiological or behavioral signals, integrating RNN components with a CNN architecture can mitigate this limitation. Additionally, their historical applications in medical image analysis and natural language processing [24], as well as in brain-computer interfaces [25], set a precedent for using CNNs to model artificial emotion [26].
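A minimal PyTorch sketch of the image-based case follows, assuming 48x48 grayscale face crops of the sort used in common facial-expression datasets; the filter counts and class count are illustrative, not taken from the cited works. The convolutional filters learn local spatial patterns before a linear head predicts the emotion.

import torch
import torch.nn as nn

class EmotionCNN(nn.Module):
    def __init__(self, n_classes: int = 7):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                               # 48x48 -> 24x24
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),                               # 24x24 -> 12x12
        )
        self.classifier = nn.Linear(64 * 12 * 12, n_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

if __name__ == "__main__":
    model = EmotionCNN()
    faces = torch.randn(8, 1, 48, 48)   # batch of 8 grayscale face crops
    print(model(faces).shape)           # -> torch.Size([8, 7])

For arousal/valence prediction rather than discrete classes, the final layer could output two continuous values and be trained with a regression loss instead of cross-entropy.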
Hybrid Models and Multimodal Approaches
Most research into modeling emotion tends to integrate different model architectures to create multimodal neural networks, for example combining CNNs for facial recognition with RNNs for speech or physiological data, allowing audiovisual emotion recognition or time-dependent, arousal/valence-based physiological emotion recognition [27]. These hybrid models can improve accuracy in emotion detection by integrating complementary sources of information. Though such architectures may require more synchronization or alignment of data sources, as well as larger or more diverse training datasets to reduce error, their integration can yield more accurate and more nuanced representations of human emotion.
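To make the hybrid idea concrete, the following PyTorch sketch is one assumed arrangement rather than a specific published model: a per-frame CNN embeds video frames, the embeddings are concatenated with synchronized per-frame audio features, and an LSTM models the fused sequence before a final emotion classification. Frame size, feature dimensions, and class count are illustrative.

import torch
import torch.nn as nn

class MultimodalEmotionNet(nn.Module):
    def __init__(self, n_audio_features: int = 40, n_classes: int = 7):
        super().__init__()
        # CNN branch: embeds one 48x48 grayscale frame into a 64-d vector.
        self.frame_cnn = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(4),   # -> 16x12x12
            nn.Flatten(), nn.Linear(16 * 12 * 12, 64), nn.ReLU(),
        )
        # Temporal branch: LSTM over fused (visual + audio) per-frame features.
        self.lstm = nn.LSTM(64 + n_audio_features, 128, batch_first=True)
        self.classifier = nn.Linear(128, n_classes)

    def forward(self, frames: torch.Tensor, audio: torch.Tensor) -> torch.Tensor:
        # frames: (batch, time, 1, 48, 48); audio: (batch, time, n_audio_features)
        b, t = frames.shape[:2]
        frame_emb = self.frame_cnn(frames.reshape(b * t, 1, 48, 48)).reshape(b, t, 64)
        fused = torch.cat([frame_emb, audio], dim=2)    # simple feature-level fusion
        _, (h_n, _) = self.lstm(fused)
        return self.classifier(h_n[-1])

if __name__ == "__main__":
    model = MultimodalEmotionNet()
    frames = torch.randn(2, 30, 1, 48, 48)   # 2 clips, 30 frames each
    audio = torch.randn(2, 30, 40)           # matching per-frame audio features
    print(model(frames, audio).shape)        # -> torch.Size([2, 7])

This sketch fuses modalities at the feature level, which presumes the audio features are already aligned frame-by-frame with the video; decision-level fusion of separately trained branches is a common alternative when alignment is difficult.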
Models of Artificial Emotion
Emotions have been used to inform the construction of AI systems, agents, and robots. The challenge of modeling self-awareness in AI [28] is one that the “understanding,” “feeling,” or “expressing” of emotion in AI may help address. Though some projects have endeavored to realize emotions in robots, the difficulty appears to lie in the feeling and functioning of emotions rather than in acting them out [29]. This issue calls for computational models of emotion, yet most deep learning or ANN-based models focus on bottom-up emotion processing rather than a top-down emotional system.
Applications of Computational Models of Emotion
Affective computing has informed a range of applications of these computational emotion classification methods. Because most ANN-based models of emotion focus on similar classification or recognition tasks, generating emotional predictions from different kinds of input data, their applications often involve monitoring or predicting human emotional states.
For example, emotion recognition in video data can be used to monitor and predict driver fatigue in real time from video images [30]. Similar video-based emotional monitoring can be used to assess and analyze emotion in learning [31], patient pain [32], and seizures [33], to name a few. Similar efforts have harnessed speech-based emotion recognition to detect the emotional states of callers at call centers, allowing the detection of anger in automated voice dialogues [34]. Aside from assessing user satisfaction or customer opinion, speech emotion recognition has also been used to assess stress and affect in users of wearable devices [35]. Likewise, text emotion recognition techniques have been used to customize and understand text-based data such as social media posts or messaging (SMS) data [36]. Aside from informing emotion- and affect-related research, such technology can inform individuals’ understanding of their own affect or self-expression.
Emotion recognition in speech, video, and image data, among others, has been used to develop human-computer interaction (HCI) applications [37], [38]. These include interactive computer interfaces suited to tasks like automated transcription, eye tracking, or speech recognition, which contribute to everyday tools such as digital assistants, AR/VR technology, and digital sensors, as well as to relevant research in the field.
Additionally, emotion models have been employed in mental health monitoring tools to automatically detect signs of emotional distress, temporal downward trends in affect, or anxiety/depression symptoms from speech [39], text [40], or facial expression data [41], offering potential for early intervention and personalized treatment [42].
As automated and personalized data collection and generation become more ubiquitous, computational models of emotion may become more relevant and widely used, continuing to aid in the development of cross-disciplinary research and technology.
References
[1] N. Fragopanagos and J. G. Taylor, “Emotion recognition in human–computer interaction,” Neural Netw., vol. 18, no. 4, pp. 389–405, May 2005, doi: 10.1016/j.neunet.2005.03.006.
[2] K. S. Rao, V. K. Saroj, S. Maity, and S. G. Koolagudi, “Recognition of emotions from video using neural network models,” Expert Syst. Appl., vol. 38, no. 10, pp. 13181–13185, Sep. 2011, doi: 10.1016/j.eswa.2011.04.129.
[3] G. Caridakis, K. Karpouzis, and S. Kollias, “User and context adaptive neural networks for emotion recognition,” Neurocomputing, vol. 71, no. 13–15, pp. 2553–2562, Aug. 2008, doi: 10.1016/j.neucom.2007.11.043.
[4] E. Batbaatar, M. Li, and K. H. Ryu, “Semantic-Emotion Neural Network for Emotion Recognition From Text,” IEEE Access, vol. 7, pp. 111866–111878, 2019, doi: 10.1109/ACCESS.2019.2934529.
[5] T. Song, W. Zheng, P. Song, and Z. Cui, “EEG Emotion Recognition Using Dynamical Graph Convolutional Neural Networks,” IEEE Trans. Affect. Comput., vol. 11, no. 3, pp. 532–541, Jul. 2020, doi: 10.1109/TAFFC.2018.2817622.
[6] A. D. Lawrence, “Error correction and the basal ganglia: similar computations for action, cognition and emotion?,” Trends Cogn. Sci., vol. 4, no. 10, pp. 365–367, Oct. 2000, doi: 10.1016/S1364-6613(00)01535-7.
[7] S. Koelsch et al., “The quartet theory of human emotions: An integrative and neurofunctional model,” Phys. Life Rev., vol. 13, pp. 1–27, Jun. 2015, doi: 10.1016/j.plrev.2015.03.001.
[8] C.-N. Anagnostopoulos, T. Iliou, and I. Giannoukos, “Features and classifiers for emotion recognition from speech: a survey from 2000 to 2011,” Artif. Intell. Rev., vol. 43, no. 2, pp. 155–177, Feb. 2015, doi: 10.1007/s10462-012-9368-5.
[9] Y. Wang et al., “A systematic review on affective computing: emotion models, databases, and recent advances,” Inf. Fusion, vol. 83–84, pp. 19–52, Jul. 2022, doi: 10.1016/j.inffus.2022.03.009.
[10] N. Jain, S. Kumar, A. Kumar, P. Shamsolmoali, and M. Zareapoor, “Hybrid deep neural networks for face emotion recognition,” Pattern Recognit. Lett., vol. 115, pp. 101–106, Nov. 2018, doi: 10.1016/j.patrec.2018.04.010.
[11] P. Khorrami, T. Le Paine, K. Brady, C. Dagli, and T. S. Huang, “How deep neural networks can improve emotion recognition on video data,” in 2016 IEEE International Conference on Image Processing (ICIP), Phoenix, AZ, USA: IEEE, Sep. 2016, pp. 619–623, doi: 10.1109/ICIP.2016.7532431.
[12] P. Tzirakis, G. Trigeorgis, M. A. Nicolaou, B. W. Schuller, and S. Zafeiriou, “End-to-End Multimodal Emotion Recognition Using Deep Neural Networks,” IEEE J. Sel. Top. Signal Process., vol. 11, no. 8, pp. 1301–1309, Dec. 2017, doi: 10.1109/JSTSP.2017.2764438.
[13] C. Hieida, T. Horii, and T. Nagai, “Deep Emotion: A Computational Model of Emotion Using Deep Neural Networks,” arXiv, 2018, doi: 10.48550/ARXIV.1808.08447.
[14] S. E. Kahou et al., “Combining modality specific deep neural networks for emotion recognition in video,” in Proceedings of the 15th ACM International Conference on Multimodal Interaction, Sydney, Australia: ACM, Dec. 2013, pp. 543–550, doi: 10.1145/2522848.2531745.
[15] M. Liu, R. Wang, S. Li, S. Shan, Z. Huang, and X. Chen, “Combining Multiple Kernel Methods on Riemannian Manifold for Emotion Recognition in the Wild,” in Proceedings of the 16th International Conference on Multimodal Interaction, Istanbul, Turkey: ACM, Nov. 2014, pp. 494–501, doi: 10.1145/2663204.2666274.
[16] R. J. Williams and D. Zipser, “A Learning Algorithm for Continually Running Fully Recurrent Neural Networks,” Neural Comput., vol. 1, no. 2, pp. 270–280, Jun. 1989, doi: 10.1162/neco.1989.1.2.270.
[17] M. Wöllmer, M. Kaiser, F. Eyben, B. Schuller, and G. Rigoll, “LSTM-Modeling of continuous emotions in an audiovisual affect recognition framework,” Image Vis. Comput., vol. 31, no. 2, pp. 153–163, Feb. 2013, doi: 10.1016/j.imavis.2012.03.001.
[18] D. Bertero and P. Fung, “A first look into a Convolutional Neural Network for speech emotion detection,” in 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA: IEEE, Mar. 2017, pp. 5115–5119, doi: 10.1109/ICASSP.2017.7953131.
[19] D. Issa, M. Fatih Demirci, and A. Yazici, “Speech emotion recognition with deep convolutional neural networks,” Biomed. Signal Process. Control, vol. 59, p. 101894, May 2020, doi: 10.1016/j.bspc.2020.101894.
[20] L. Santamaria-Granados, M. Munoz-Organero, G. Ramirez-Gonzalez, E. Abdulhay, and N. Arunkumar, “Using Deep Convolutional Neural Network for Emotion Detection on a Physiological Signals Dataset (AMIGOS),” IEEE Access, vol. 7, pp. 57–67, 2019, doi: 10.1109/ACCESS.2018.2883213.
[21] R. Collobert and J. Weston, “A unified architecture for natural language processing: deep neural networks with multitask learning,” in Proceedings of the 25th International Conference on Machine Learning (ICML ’08), Helsinki, Finland: ACM, 2008, pp. 160–167, doi: 10.1145/1390156.1390177.
[22] O. Avilov, S. Rimbert, A. Popov, and L. Bougrain, “Deep Learning Techniques to Improve Intraoperative Awareness Detection from Electroencephalographic Signals,” in 2020 42nd Annual International Conference of the IEEE Engineering in Medicine & Biology Society (EMBC), Montreal, QC, Canada: IEEE, Jul. 2020, pp. 142–145, doi: 10.1109/EMBC44109.2020.9176228.
[23] Z. Kowalczuk and M. Czubenko, “Computational Approaches to Modeling Artificial Emotion – An Overview of the Proposed Solutions,” Front. Robot. AI, vol. 3, Apr. 2016, doi: 10.3389/frobt.2016.00021.
[24] H. Hodson, “I know it’s me talking,” New Sci., vol. 227, no. 3030, p. 18, Jul. 2015, doi: 10.1016/S0262-4079(15)30777-6.
[25] Q. Ji, Z. Zhu, and P. Lan, “Real-Time Nonintrusive Monitoring and Prediction of Driver Fatigue,” IEEE Trans. Veh. Technol., vol. 53, no. 4, pp. 1052–1068, Jul. 2004, doi: 10.1109/TVT.2004.830974.
[26] A. V. Savchenko and I. A. Makarov, “Neural Network Model for Video-Based Analysis of Student’s Emotions in E-Learning,” Opt. Mem. Neural Netw., vol. 31, no. 3, pp. 237–244, Sep. 2022, doi: 10.3103/S1060992X22030055.
[27] V. Pandit, M. Schmitt, N. Cummins, and B. Schuller, “I see it in your eyes: Training the shallowest-possible CNN to recognise emotions and pain from muted web-assisted in-the-wild video-chats in real-time,” Inf. Process. Manag., vol. 57, no. 6, p. 102347, Nov. 2020, doi: 10.1016/j.ipm.2020.102347.
[28] J.-C. Hou, M. Thonnat, F. Bartolomei, and A. McGonigal, “Automated video analysis of emotion and dystonia in epileptic seizures,” Epilepsy Res., vol. 184, p. 106953, Aug. 2022, doi: 10.1016/j.eplepsyres.2022.106953.
[29] P. Schmidt, A. Reiss, R. Duerichen, C. Marberger, and K. Van Laerhoven, “Introducing WESAD, a Multimodal Dataset for Wearable Stress and Affect Detection,” in Proceedings of the 20th ACM International Conference on Multimodal Interaction, Boulder, CO, USA: ACM, Oct. 2018, pp. 400–408, doi: 10.1145/3242969.3242985.
[30] J. Deng and F. Ren, “A Survey of Textual Emotion Recognition and Its Challenges,” IEEE Trans. Affect. Comput., vol. 14, no. 1, pp. 49–67, Jan. 2023, doi: 10.1109/TAFFC.2021.3053275.
[31] R. Cowie et al., “Emotion recognition in human-computer interaction,” IEEE Signal Process. Mag., vol. 18, no. 1, pp. 32–80, Jan. 2001, doi: 10.1109/79.911197.
[32] N. Elsayed, Z. ElSayed, N. Asadizanjani, M. Ozer, A. Abdelgawad, and M. Bayoumi, “Speech Emotion Recognition using Supervised Deep Recurrent System for Mental Health Monitoring,” in 2022 IEEE 8th World Forum on Internet of Things (WF-IoT), Yokohama, Japan: IEEE, Oct. 2022, pp. 1–6, doi: 10.1109/WF-IoT54382.2022.10152117.
[33] K. Dheeraj and T. Ramakrishnudu, “Negative emotions detection on online mental-health related patients texts using the deep learning with MHA-BCNN model,” Expert Syst. Appl., vol. 182, p. 115265, Nov. 2021, doi: 10.1016/j.eswa.2021.115265.
[34] Z. Fei et al., “Deep convolution network based emotion analysis towards mental health care,” Neurocomputing, vol. 388, pp. 212–227, May 2020, doi: 10.1016/j.neucom.2020.01.034.
[35] M. Chen, X. Liang, and Y. Xu, “Construction and Analysis of Emotion Recognition and Psychotherapy System of College Students under Convolutional Neural Network and Interactive Technology,” Comput. Intell. Neurosci., vol. 2022, pp. 1–11, Sep. 2022, doi: 10.1155/2022/5993839.