Yazar "Yildirim, Serdar" seçeneğine göre listele
Listeleniyor 1 - 15 / 15
Item: Anger recognition in Turkish speech using acoustic information (2012)
Oflazoglu, Çağlar; Yildirim, Serdar
An emerging trend in human-computer interaction technology is to design spoken interfaces that facilitate more natural interaction between a user and a computer. Being able to detect the user's affective state during interaction is one of the key steps toward implementing such interfaces. In this study, anger recognition from Turkish speech using acoustic information is explored. The relative importance of acoustic feature categories in anger recognition is examined. Results show that the logarithmic power of Mel-frequency bands, Mel-frequency cepstral coefficients, and perceptual linear predictive coefficients are relatively more important than other acoustic categories in the context of anger recognition. Results also show that an unweighted recall of 75.8% is obtained when a correlation-based feature selection method and a Naive Bayes classifier are used. © 2012 IEEE.

Item: Automatic Detection of Disfluency Boundaries in Spontaneous Speech of Children Using Audio-Visual Information (IEEE-Inst Electrical Electronics Engineers Inc, 2009)
Yildirim, Serdar; Narayanan, Shrikanth
The presence of disfluencies in spontaneous speech, while it poses a challenge for robust automatic recognition, also offers a means of gaining additional insight into a speaker's communicative and cognitive state. This paper analyzes disfluencies in children's spontaneous speech, in the context of spoken-dialog-based computer game play, and addresses the automatic detection of disfluency boundaries. Although several approaches have been proposed to detect disfluencies in speech, relatively little work has been done to utilize visual information to improve the performance and robustness of the disfluency detection system. This paper describes the use of visual information along with prosodic and language information to detect the presence of disfluencies in a child's computer-directed speech and shows how these information sources can be integrated to increase the overall information available for disfluency detection. The experimental results on our children's multimodal dialog corpus indicate that a disfluency detection accuracy of over 80% can be obtained by utilizing audio-visual information. Specifically, results showed that the addition of visual information to prosody and language features yields relative improvements in disfluency detection error rates of 3.6% and 6.3%, respectively, for information fusion at the feature level and decision level.

Item: Binary Classification Performances of Emotion Classes for Turkish Emotional Speech (IEEE, 2015)
Oflazoglu, Caglar; Yildirim, Serdar
Emotion recognition from speech plays an important role in natural human-computer interaction. This study investigates the binary classification performance of the four fundamental emotion classes in the Turkish Emotional Speech (TurES) Database using acoustic features and various classifiers. Results show that the Angry emotion class has a higher classification rate (70%-80%) than the others; the lowest classification rate, 64%, is obtained for the Happy-Neutral emotion pair. The best classification results are obtained with the J48 (C4.5) classifier for all emotion pairs.
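For readers unfamiliar with the pipeline described in the first and third entries above (acoustic features, correlation-based feature selection, and a Naive Bayes classifier scored by unweighted recall), the following is a minimal illustrative sketch. It is not the papers' implementation: the feature matrix and labels are synthetic placeholders, and scikit-learn's SelectKBest with an ANOVA F-score stands in for the CFS method the papers report.

```python
# Illustrative sketch only: SelectKBest + GaussianNB as a stand-in for CFS + Naive Bayes;
# X and y are placeholders, not the TurES data.
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, recall_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 384))     # placeholder acoustic feature vectors
y = rng.integers(0, 2, size=200)    # placeholder labels: 0 = neutral, 1 = angry

pipeline = Pipeline([
    ("select", SelectKBest(score_func=f_classif, k=50)),  # stand-in for CFS
    ("clf", GaussianNB()),
])

# Unweighted (macro-averaged) recall, the metric quoted in the abstracts.
uar = make_scorer(recall_score, average="macro")
scores = cross_val_score(pipeline, X, y, cv=5, scoring=uar)
print("unweighted average recall: %.3f +/- %.3f" % (scores.mean(), scores.std()))
```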
Item: Classification of Emotion Primitives from EEG Signals Using Visual and Audio Stimuli (IEEE, 2015)
Dasdemir, Yasar; Yildirim, Serdar; Yildirim, Esen
Emotion recognition from EEG signals has an important role in designing brain-computer interfaces. This paper compares the effects of audio and visual stimuli, used for collecting emotional EEG signals, on emotion classification performance. For this purpose, EEG data from 25 subjects are collected and binary classification (low/high) is performed for the valence and activation emotion dimensions. The wavelet transform is used for feature extraction and three classifiers are used for classification. True positive rates of 71.7% and 78.5% are obtained using audio and video stimuli for the valence dimension, and 71% and 82% using audio and video stimuli for the arousal dimension, respectively.

Item: Classification of Emotional Valence Dimension Using Artificial Neural Networks (IEEE, 2015)
Ozdemir, Merve Erkmay; Yildirim, Esen; Yildirim, Serdar
Emotions play an important role in human interaction, and emotion recognition should be considered when designing an effective brain-computer interface. In this work, binary classification (low/high) is performed for valence, one of the primitives used in expressing emotions. The Hilbert-Huang Transform is used for feature extraction, a multilayer feed-forward artificial neural network is used for subject-independent classification, and a true positive rate of 69% is obtained.

Item: Detecting emotional state of a child in a conversational computer game (Academic Press Ltd - Elsevier Science Ltd, 2011)
Yildirim, Serdar; Narayanan, Shrikanth; Potamianos, Alexandros
The automatic recognition of a user's communicative style within a spoken dialog system framework, including the affective aspects, has received increased attention in the past few years. For dialog systems, it is important to know not only what was said but also how something was communicated, so that the system can engage the user in a richer and more natural interaction. This paper addresses the problem of automatically detecting frustration, politeness, and neutral attitudes from a child's speech communication cues, elicited in spontaneous dialog interactions with computer characters. Several information sources such as acoustic, lexical, and contextual features, as well as their combinations, are used for this purpose. The study is based on a Wizard-of-Oz dialog corpus of 103 children, 7-14 years of age, playing a voice-activated computer game. Three-way classification experiments, as well as pairwise classification between polite vs. others and frustrated vs. others, were performed. Experimental results show that lexical information has more discriminative power than acoustic and contextual cues for the detection of politeness, whereas context and acoustic features perform best for frustration detection. Furthermore, the fusion of acoustic, lexical, and contextual information provided significantly better classification results. Results also showed that classification performance varies with age and gender. Specifically, for the politeness detection task, higher classification accuracy was achieved for females and 10-11 year-olds, compared to males and other age groups, respectively. (C) 2010 Elsevier Ltd. All rights reserved.
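The EEG entries above rely on wavelet-based feature extraction. The sketch below illustrates the general idea under stated assumptions: a db4 decomposition with PyWavelets and simple per-band statistics on a synthetic segment, not the studies' actual channel layout, sampling rate, or feature set.

```python
# Minimal sketch of wavelet-based EEG feature extraction; all parameters are placeholders.
import numpy as np
import pywt

def wavelet_band_features(signal, wavelet="db4", level=5):
    """Decompose one EEG channel and return simple statistics per sub-band."""
    coeffs = pywt.wavedec(signal, wavelet, level=level)  # [cA5, cD5, cD4, cD3, cD2, cD1]
    features = []
    for band in coeffs:
        features.extend([
            np.mean(band),
            np.std(band),
            np.sum(band ** 2),   # band energy
        ])
    return np.array(features)

# Placeholder 4-second EEG segment sampled at 128 Hz.
rng = np.random.default_rng(1)
segment = rng.normal(size=4 * 128)
print(wavelet_band_features(segment).shape)   # (18,) = 6 bands x 3 statistics
```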
Item: Emotion estimation from EEG signals using wavelet transform analysis (2012)
Uzun, Süheyla Sinem; Oflazoglu, Çağlar; Yildirim, Serdar; Yildirim, Esen
Emotion recognition is important for effective human-machine interaction. Information obtained from speech, gestures and facial expressions, heart rate, and temperature can be used in emotion estimation. In this study, emotion estimation from EEG signals using wavelet decomposition is performed. For this purpose, EEG signals were recorded from 20 subjects, and audio stimuli were used to evoke emotions. The Delta, Theta, Alpha, Beta, and Gamma sub-bands of the signals are computed using the wavelet transform, and statistical features and the energy of each band are extracted. A correlation-based feature selection algorithm is applied to the base feature set to obtain the most relevant subset, and the emotion primitives are estimated using support vector regression. The emotion estimation results in terms of mean absolute error using the db4, db8, and coif5 mother wavelets are 0.28, 0.26, and 0.29 for valence, 0.20, 0.20, and 0.19 for activation, and 0.11, 0.10, and 0.10 for dominance, respectively. © 2012 IEEE.

Item: Emotion primitives estimation from EEG signals using Hilbert Huang Transform (2012)
Uzun, S. Sinem; Yildirim, Serdar; Yildirim, Esen
This paper addresses the problem of emotion primitive estimation using information obtained from EEG signals. The EEG data were collected from 18 subjects, 9 male and 9 female, aged 19 to 26. We used audio clips from the International Affective Digital Sounds (IADS) as stimuli for emotion elicitation. The Hilbert-Huang Transform, a method well suited to non-linear and non-stationary signal processing, was used for feature extraction. The EEG signals were first decomposed into their Intrinsic Mode Functions (IMFs); 990 features were then computed from the first five IMFs. To identify the most salient features and eliminate the redundant and irrelevant ones, we performed correlation-based feature selection (CFS). This feature selection process reduced the number of features dramatically while increasing the performance remarkably. In this work, we used support vector regression for the estimation of each emotion primitive value. The regression mean absolute error values and their standard deviations over all subjects for valence, activation, and dominance were 1.11 (0.13), 0.65 (0.09), and 0.38 (0.06), respectively. © 2012 IEEE.

Item: Emotion Recognition From Speech Using Fisher's Discriminant Analysis and Bayesian Classifier (IEEE, 2015)
Atasoy, Huseyin; Yildirim, Serdar; Yildirim, Esen
In this study, a large set of features extracted for speech emotion classification was projected into different spaces by selecting different numbers of components in principal component analysis and Fisher's discriminant analysis. Classification was performed in those spaces using a Naive Bayes classifier and the results were compared. While the highest accuracy obtained in the Fisher space was 57.87%, the accuracy in the principal component space was 48.02%.
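Both EEG estimation entries above map selected features to continuous valence, activation, and dominance values with support vector regression and report mean absolute error. A minimal sketch of that evaluation loop follows; the features, ratings, and SVR hyperparameters are placeholders, not the studies' configuration.

```python
# Sketch of continuous emotion-primitive estimation with SVR, scored by mean absolute error.
# Feature matrix and valence/activation/dominance ratings are synthetic placeholders.
import numpy as np
from sklearn.svm import SVR
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import mean_absolute_error

rng = np.random.default_rng(2)
X = rng.normal(size=(150, 60))                 # placeholder EEG features after selection
targets = {
    "valence": rng.uniform(-1, 1, size=150),
    "activation": rng.uniform(-1, 1, size=150),
    "dominance": rng.uniform(-1, 1, size=150),
}

for name, y in targets.items():
    pred = cross_val_predict(SVR(kernel="rbf", C=1.0), X, y, cv=5)
    print(name, "MAE = %.3f" % mean_absolute_error(y, pred))
```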
Item: Recognizing child's emotional state in problem-solving child-machine interactions (2009)
Yildirim, Serdar; Narayanan, Shrikanth
The need for automatic recognition of a speaker's emotion within a spoken dialog system framework has received increased attention with the demand for computer interfaces that provide natural and user-adaptive spoken interaction. This paper addresses the problem of automatically recognizing a child's emotional state using information obtained from audio and video signals. The study is based on a multimodal data corpus consisting of spontaneous conversations between a child and a computer agent. Four different techniques - a k-nearest neighbor (k-NN) classifier, a decision tree, a linear discriminant classifier (LDC), and a support vector machine classifier (SVC) - were employed for classifying utterances into two emotion classes, negative and non-negative, for both acoustic and visual information. Experimental results show that, overall, combining visual information with acoustic information leads to performance improvements in emotion recognition. We obtained the best results when the information sources were combined at the feature level. Specifically, results showed that the addition of visual information to acoustic information yields relative improvements in emotion recognition of 3.8% with both the LDC and SVC classifiers for information fusion at the feature level over that of using only acoustic information. Copyright 2009 ACM.

Item: Recognizing emotion from Turkish speech using acoustic features (Springer, 2013)
Oflazoglu, Caglar; Yildirim, Serdar
Affective computing, especially from speech, is one of the key steps toward building more natural and effective human-machine interaction. In recent years, several emotional speech corpora in different languages have been collected; however, Turkish is not among the languages that have been investigated in the context of emotion recognition. For this purpose, a new Turkish emotional speech database, which includes 5,100 utterances extracted from 55 Turkish movies, was constructed. Each utterance in the database is labeled with emotion categories (happy, surprised, sad, angry, fearful, neutral, and others) and in a three-dimensional emotional space (valence, activation, and dominance). We performed classification of four basic emotion classes (neutral, sad, happy, and angry) and estimation of emotion primitives using acoustic features. The importance of acoustic features in estimating the emotion primitive values and in classifying emotions into categories was also investigated. An unweighted average recall of 45.5% was obtained for the classification. For emotion dimension estimation, we obtained promising results for the activation and dominance dimensions. For valence, however, the correlation between the averaged ratings of the evaluators and the estimates was low. Cross-corpus training and testing also showed good results for the activation and dominance dimensions.

Item: Sinter Machine Speed Control Based on Thermal Control (IEEE, 2015)
Beskardes, Ahmet; Ozdemir, Merve Erkinay; Yildirim, Serdar
In this study, using temperature data from Isdemir's sinter machine, the Burning Rising Point (BRP) is calculated in a way that differs from the methods in the sintering literature, and the speed of the sinter machine is controlled with artificial neural networks based on these calculations. The speed control of the sinter machine, previously dependent on a human operator, is thus performed without human intervention, with 87% accuracy.

Item: Turkish emotional speech database (2011)
Oflazoglu, Çağlar; Yildirim, Serdar
The success of an emotion recognition system based on speech signals depends directly on the database used in system modeling, as in any pattern recognition problem. In this work, we give a detailed description of the new Turkish emotional speech database we created. Our database consists of 5304 speech signals and their textual contents, extracted from 55 Turkish movies. The speech signals are labeled by numerous evaluators both categorically (happy, surprised, sad, angry, fear, neutral, and other) and in a three-dimensional emotional space (valence, activation, and dominance). We believe that our database will be very helpful in further studies on the acoustic analysis of Turkish emotional speech and emotion recognition. © 2011 IEEE.
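Two of the entries above (the disfluency-detection paper and the child emotion-recognition paper) compare feature-level and decision-level fusion of audio and visual information. The sketch below shows the difference between the two fusion schemes in the simplest possible form; the linear SVMs and synthetic feature blocks are assumptions for illustration, not the papers' models or data.

```python
# Illustrative comparison of feature-level vs. decision-level audio-visual fusion.
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(3)
X_audio = rng.normal(size=(300, 40))          # placeholder acoustic features
X_video = rng.normal(size=(300, 20))          # placeholder facial/visual features
y = rng.integers(0, 2, size=300)              # negative vs. non-negative

idx_train, idx_test = train_test_split(np.arange(300), test_size=0.3, random_state=0)

# Feature-level fusion: concatenate modalities, train one classifier.
X_fused = np.hstack([X_audio, X_video])
early = SVC(kernel="linear").fit(X_fused[idx_train], y[idx_train])
acc_early = accuracy_score(y[idx_test], early.predict(X_fused[idx_test]))

# Decision-level fusion: train per-modality classifiers, combine their scores.
clf_a = SVC(kernel="linear").fit(X_audio[idx_train], y[idx_train])
clf_v = SVC(kernel="linear").fit(X_video[idx_train], y[idx_train])
scores = clf_a.decision_function(X_audio[idx_test]) + clf_v.decision_function(X_video[idx_test])
acc_late = accuracy_score(y[idx_test], (scores > 0).astype(int))

print("feature-level fusion accuracy:", acc_early)
print("decision-level fusion accuracy:", acc_late)
```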
Item: Using interval type-2 fuzzy logic to analyze Turkish emotion words (2012)
Cakmak, Ozan; Kazemzadeh, Abe; Yildirim, Serdar; Narayanan, Shri
This paper describes a methodology that shows the feasibility of a fuzzy logic (FL) representation of Turkish emotion-related words. We analyzed a set of 197 Turkish emotion words through web-based surveys that prompted users with emotion words and asked them to enter interval values for the valence, activation, and dominance emotion attributes using a double slider. Our previous experimental results indicated that there was a strong correlation between the emotions attributed to Turkish word roots and to Turkish sentences. In this paper, we extend our previous work and analyze Turkish emotion words using interval type-2 fuzzy logic. © 2012 APSIPA.

Item: Using Interval Type-2 Fuzzy Logic to Analyze Turkish Emotion Words (IEEE, 2012)
Cakmak, Ozan; Kazemzadeh, Abe; Yildirim, Serdar; Narayanan, Shri
This paper describes a methodology that shows the feasibility of a fuzzy logic (FL) representation of Turkish emotion-related words. We analyzed a set of 197 Turkish emotion words through web-based surveys that prompted users with emotion words and asked them to enter interval values for the valence, activation, and dominance emotion attributes using a double slider. Our previous experimental results indicated that there was a strong correlation between the emotions attributed to Turkish word roots and to Turkish sentences. In this paper, we extend our previous work and analyze Turkish emotion words using interval type-2 fuzzy logic.
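The two fuzzy-logic entries above collect interval (double-slider) ratings per emotion word and model them with interval type-2 fuzzy sets. The sketch below shows one way such intervals could be turned into lower and upper membership functions whose gap forms a footprint of uncertainty; the triangular shape and the sample ratings are illustrative assumptions, not the method or data used in the papers.

```python
# Rough sketch: from per-participant valence intervals for one word to an interval
# type-2 style pair of membership functions. All numbers below are hypothetical.
import numpy as np

# Hypothetical valence intervals (1-9 scale) entered for one emotion word.
ratings = np.array([[6.0, 8.0], [5.5, 7.5], [6.5, 9.0], [5.0, 8.5]])
lo, hi = ratings[:, 0], ratings[:, 1]

def triangle(x, a, b, c):
    """Triangular membership function with feet a, c and peak b."""
    return np.clip(np.minimum((x - a) / (b - a + 1e-9), (c - x) / (c - b + 1e-9)), 0.0, 1.0)

x = np.linspace(1, 9, 200)
# Upper membership function: widest support observed across participants.
upper_mf = triangle(x, lo.min(), (lo.min() + hi.max()) / 2, hi.max())
# Lower membership function: the interval every participant agrees on.
lower_mf = triangle(x, lo.max(), (lo.max() + hi.min()) / 2, hi.min())

# Size of the footprint of uncertainty (area between the two membership functions).
fou_area = float(np.sum(upper_mf - lower_mf) * (x[1] - x[0]))
print("footprint-of-uncertainty area: %.2f" % fou_area)
```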