#1: Learning by Reading: An Experiment in Text Analysis
Eduard Hovy (ISI, University of Southern California, USA)
It has long been a dream to build computer systems that learn automatically by reading text. This dream is generally considered infeasible, but some surprising developments in the US over the past three years have led to the funding of several short-term investigations into whether, and to what extent, the best current practices in Natural Language Processing and Knowledge Representation and Reasoning, when combined, actually enable this dream. This paper very briefly describes one of these efforts, the Learning by Reading project at ISI, which has converted a high school chemistry textbook into a very shallow logical form and is investigating which semantic features can plausibly be added to support the kinds of inference required for answering standard high school textbook questions.
#2: Depth of Feelings: Alternatives for Modeling Affect in User Models
Eva Hudlicka (Psychometrix Associates, Blacksburg, USA)
Neuroscience and psychology research has demonstrated a close connection between cognition and affect, as well as a number of emotion-induced effects on perception, cognition, and behavior. The integration of emotions within user models would therefore enhance their realism and fidelity. Emotions can also provide disambiguating information for speech recognition and natural language understanding, and enhance the effectiveness of dialogue systems. This paper discusses the motivation and alternatives for incorporating emotions within user models. The paper first identifies key model characteristics that define an analytical framework. The framework provides a basis for identifying the functional and architectural requirements on the one hand, and alternative modeling approaches on the other, thereby laying the groundwork for a set of model development guidelines. The paper then describes examples of existing models for two core affective processes, cognitive appraisal and emotion-induced effects on cognition, within the context of the analytical framework.
#3: Discovering Anomalous Material
Louise Guthrie (University of Sheffield, UK)
This paper describes some of our work on a Ministry of Defence-funded project to detect language use that is 'unusual' in some way. We will describe our work on identifying an anomalous document in a collection of documents and on identifying an anomalous segment within a document. We have designed thousands of test collections so that our techniques can be evaluated along several dimensions of anomaly: an anomalous author, genre, topic, or emotional tone. We will describe both supervised techniques (where training data is available for the 'normal' population) and unsupervised techniques (where no training data is available). In the supervised methods, our interest is in whether or not to attempt to represent the 'non-normal' population, and if so, how. In the unsupervised methods, we use several hundred stylistic features to rank the segments of a document by their degree of anomaly. Experiments vary the segment size (100, 500, and 1,000 words) and the similarity measures used. Overall results will be compared and analyzed.
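The unsupervised setting can be pictured with a small sketch: split a document into fixed-size segments, describe each by a few stylistic features, and rank segments by how far they lie from the centroid of all the other segments. The three features below (average word length, type-token ratio, share of long words), the z-normalisation, and the Euclidean distance are illustrative assumptions, not the several hundred features or the similarity measures used in the project; the function names are hypothetical.

```python
import math
from statistics import mean, pstdev
from typing import List


def segment(words: List[str], size: int) -> List[List[str]]:
    """Split a token list into consecutive segments of `size` words (the last may be shorter)."""
    return [words[i:i + size] for i in range(0, len(words), size)]


def features(seg: List[str]) -> List[float]:
    # Three toy stylistic features (assumption, not the project's feature set)
    avg_len = mean(len(w) for w in seg)
    type_token = len({w.lower() for w in seg}) / len(seg)
    long_share = sum(1 for w in seg if len(w) > 6) / len(seg)
    return [avg_len, type_token, long_share]


def rank_segments(words: List[str], size: int) -> List[int]:
    """Return segment indices ordered from most to least anomalous (needs >= 2 segments)."""
    feats = [features(s) for s in segment(words, size)]
    cols = list(zip(*feats))
    mus = [mean(c) for c in cols]
    sds = [pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant features
    z = [[(f[k] - mus[k]) / sds[k] for k in range(len(f))] for f in feats]
    scored = []
    for idx, vec in enumerate(z):
        # Leave-one-out centroid: compare each segment with all the others
        others = [z[j] for j in range(len(z)) if j != idx]
        centroid = [mean(col) for col in zip(*others)]
        scored.append((math.dist(vec, centroid), idx))
    return [idx for _, idx in sorted(scored, reverse=True)]
```

Comparing each segment against the others, rather than against a fixed model, is what makes this usable without training data for the 'normal' population.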
#4: Unifying Semantic Annotations for Linguistic Description
James Pustejovsky (Brandeis University, USA)
Most recent annotation efforts for language have focused on small pieces of the larger problem of semantic annotation, rather than producing a single unified representation. In this talk, I investigate the issues involved in merging several of these efforts into a unified linguistic structure: specifically, PropBank, NomBank, TimeBank, Discourse Treebank, and Opinion Corpus. Each of these is focused on a specific aspect of the semantic representation task: semantic role labeling, discourse relations, temporal relations, etc., and has reached a level of maturity that warrants a concerted effort to merge them into a single, unified representation, what I will refer to as a Unified Linguistic Annotation (ULA). There are several technical and theoretical issues that must be resolved to bring these different layers together seamlessly. Most of these approaches have annotated the same type of data (Wall Street Journal), so it is also important to demonstrate that the annotation can be extended to other genres such as spoken language. The demonstration of success for the extensions is the training of accurate statistical semantic taggers.
#19: Environmental Adaptation with a Small Data Set of the Target Domain
Andreas Maier, Tino Haderlein, Elmar Nöth
In this work we present an approach to adapting speaker-independent recognizers to a new acoustic environment. The recognizers were trained with data recorded using a close-talking microphone, but are to be evaluated on distant-talking microphone data. The adaptation set was recorded with the same type of microphone as the test data. To preserve speaker independence, this set includes 33 speakers. The adaptation itself is performed using maximum a posteriori (MAP) adaptation and maximum likelihood linear regression (MLLR) in combination with the Baum-Welch algorithm. Furthermore, the close-talking training data were artificially reverberated to reduce the mismatch between training and test data. In this manner, performance increased from 9.9 % to 40.0 % word accuracy (WA) under speaker-open conditions. If further speaker-dependent adaptation is applied, the rate increases to 54.9 % WA.
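To make the MAP step concrete, the sketch below shows the textbook MAP re-estimation of a single Gaussian mean: the prior (speaker-independent) mean is interpolated with the adaptation data, weighted by the frames' occupation probabilities from Baum-Welch. The prior weight `tau` and the function name are illustrative assumptions; the paper's actual settings, and the MLLR transform it combines with MAP, are not shown here.

```python
from typing import List, Sequence


def map_adapt_mean(prior_mean: Sequence[float],
                   frames: Sequence[Sequence[float]],
                   posteriors: Sequence[float],
                   tau: float = 10.0) -> List[float]:
    """MAP update of one Gaussian mean.

    mu_new = (tau * mu_prior + sum_t gamma_t * x_t) / (tau + sum_t gamma_t)
    where gamma_t is the occupation probability of frame x_t for this Gaussian.
    """
    occupancy = sum(posteriors)
    dim = len(prior_mean)
    weighted = [sum(g * x[d] for g, x in zip(posteriors, frames)) for d in range(dim)]
    return [(tau * prior_mean[d] + weighted[d]) / (tau + occupancy) for d in range(dim)]
```

With little adaptation data the result stays close to the prior mean, and with much data it moves towards the data mean, which is why MAP suits a small target-domain set.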
#23: Visualization of Voice Disorders Using the Sammon Transform
Tino Haderlein, Dominik Zorn, Stefan Steidl, Elmar Nöth
The Sammon Transform performs data projections in a topology-preserving manner on the basis of an arbitrary distance measure. We use the weights of the observation probabilities of semi-continuous HMMs that were adapted to the current speaker as input. Experiments on laryngectomized speakers with tracheoesophageal substitute voice, hoarse, and normal speakers show encouraging results. Different speaker groups are separated in 2-D space, and the projection of a new speaker into the Sammon map allows prediction of his or her kind of voice pathology. The method can thus be used as an objective, automated support for the evaluation of voice disorders, and it visualizes them in a way that is convenient for speech therapists.
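The Sammon mapping can be sketched as gradient descent on Sammon's stress, which penalises mismatches between the original distances and the 2-D map distances, relative to the original distance. This minimal version assumes a precomputed matrix of positive pairwise distances (for example between speaker-adapted HMM weight vectors, with the distance measure left open as in the abstract) and uses plain gradient descent with backtracking instead of Sammon's original pseudo-Newton update. The names `sammon` and `sammon_stress` are hypothetical.

```python
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def sammon_stress(dstar: Matrix, y: Matrix) -> float:
    """Sammon's stress between input distances `dstar` and distances in the map `y`."""
    n = len(y)
    c = 0.0
    e = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            ds = dstar[i][j]
            d = math.dist(y[i], y[j])
            c += ds
            e += (ds - d) ** 2 / ds
    return e / c


def sammon(dstar: Matrix, dim: int = 2, iters: int = 200, seed: int = 0) -> Tuple[Matrix, float]:
    """Project points given only their pairwise distances (all off-diagonal entries > 0)."""
    n = len(dstar)
    rng = random.Random(seed)
    y = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(n)]
    stress = sammon_stress(dstar, y)
    c = sum(dstar[i][j] for i in range(n) for j in range(i + 1, n))
    step = 1.0
    for _ in range(iters):
        # Gradient of the stress with respect to every map coordinate
        grad = [[0.0] * dim for _ in range(n)]
        for p in range(n):
            for j in range(n):
                if j == p:
                    continue
                ds = dstar[p][j]
                d = max(math.dist(y[p], y[j]), 1e-12)
                coef = -2.0 / c * (ds - d) / (ds * d)
                for k in range(dim):
                    grad[p][k] += coef * (y[p][k] - y[j][k])
        # Backtracking: shrink the step until the stress actually decreases
        while step > 1e-12:
            cand = [[y[p][k] - step * grad[p][k] for k in range(dim)] for p in range(n)]
            cand_stress = sammon_stress(dstar, cand)
            if cand_stress < stress:
                y, stress = cand, cand_stress
                step *= 1.5
                break
            step *= 0.5
    return y, stress
```

A new speaker can then be placed by adding a row and column to the distance matrix and rerunning the projection.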
#24: Korean Numeral Classifier Constructions
Jong-Bok Kim (Kyung Hee University, Seoul, Korea), Jaehyung Yang (Kangnam University, Korea)
The syntactic and semantic complexity of the so-called numeral classifier (Num-Cl) constructions in Korean challenges theoretical as well as computational linguists. We provide a constraint-based analysis of these constructions within the framework of HPSG, with the semantic representations of MRS (Minimal Recursion Semantics), and report its implementation in the LKB (Linguistic Knowledge Building) system.
#25: A Hybrid Model for Extracting Transliteration Equivalents from Parallel Corpora
Jong-Hoon Oh (NICT, Republic of Korea), Key-Sun Choi (KAIST, Republic of Korea), Hitoshi Isahara (NICT, Republic of Korea)
Several models for transliteration pair acquisition have been proposed to overcome the out-of-vocabulary problem caused by transliterations. To date, however, there has been little work on a framework that can accommodate several models at the same time. Moreover, little attention has been paid to validating acquired transliteration pairs using up-to-date corpora, such as web documents. To address these problems, we propose a hybrid model for transliteration pair acquisition. In this paper, we concentrate on a framework for combining several models for transliteration pair acquisition. Experiments showed that our hybrid model was more effective than each individual transliteration pair acquisition model alone.
#26: Head Internal and External Relative Clause Constructions in Korean
Jong-Bok Kim (Kyung Hee University, Seoul, Korea)
Korean displays various types of relative clauses, including head-internal and head-external relative clauses (HIRC and HERC). The HIRC in particular has received less attention from a computational perspective, even though it is frequently found in both written and spoken language. This paper shows that a typed feature structure grammar of HPSG (together with the semantic representations of Minimal Recursion Semantics) offers a computationally feasible and applicable way of deep-parsing both the HIRC and the HERC in the language.
#28: Explicative Document Reading Controlled by Non-speech Audio Gestures
Adam J. Sporka (Czech Technical University in Prague, Czech Republic), Pavel Zikovsky (Academy of Performing Arts in Prague, Czech Republic), Pavel Slavik (Czech Technical University in Prague, Czech Republic)
There are many situations in which listening to a text produced by a text-to-speech system is easier or safer than reading it, for example when driving a car. Technical documents, such as conference articles and manuals, usually consist of relatively plain and unequivocal sentences, but they are full of domain-specific terminology and therefore often contain words and terms unknown to the listener. In this paper, we propose a system that allows users to interrupt the reading upon hearing an unknown or confusing term by making a non-speech acoustic gesture (e.g. "uhm?"). Upon this interruption, the system provides a definition of the term, retrieved from Wikipedia, the Free Encyclopedia. The non-speech gestures were selected with respect to cross-cultural applicability and language independence. In this paper we present a set of novel tools enabling this kind of interaction.
#35: Speech Coding Based on Spectral Dynamics
Petr Motlícek (IDIAP, Switzerland & Brno University of Technology, Czech Republic), Hynek Hermansky (IDIAP, Switzerland & Brno University of Technology, Czech Republic & EPFL, Switzerland), Harinath Garudadri, Naveen Srinivasamurthy (Qualcomm Inc., San Diego, USA)
In this paper we present the first experimental results with a novel audio coding technique based on approximating the Hilbert envelopes of relatively long segments of the audio signal in critical-band-sized sub-bands by an autoregressive model. We exploit the generalized autocorrelation linear predictive technique, which allows better control over how closely the peaks and troughs of the envelope in each sub-band are fitted. Despite introducing a longer algorithmic delay, the technique achieves improved coding efficiency. Since it does not directly model short-term spectral envelopes of the signal, it is suitable not only for coding speech but also for coding other audio signals.
#38: Speech Use in a Remote Monitoring System for Health Care
Michel Vacher, Jean-François Serignat, Stéphane Chaillol, Dan Istrate, Vladimir Popescu (CLIPS-IMAG, Grenoble, France)
Ageing affects the economic and social foundations of societies worldwide, and health care has to respond to the challenge that population ageing presents. In medical remote monitoring, human operators need to be assisted by smart information systems. Physiological and position sensors provide numerous data, but speech analysis and sound classification can give valuable additional information about the patient and may help in decision-making. The analysis system is composed of parallel tasks: signal detection and channel selection, sound/speech classification, life sound classification, and speech recognition. Multichannel sound processing allows us to localize the source of a sound in the apartment and to select appropriate signal segments for analysis. Recognized key words indicative of a distress situation are extracted from sentences. Key words and classification results are sent to the medical remote monitoring application over the network. An adapted speech corpus was recorded in French and used for evaluation purposes.
#44: Sentence Compression Using Statistical Information about Dependency Path Length
Kiwamu Yamagata, Satoshi Fukutomi, Kazuyuki Takagi, Kazuhiko Ozeki (The University of Electro-Communications, Tokyo, Japan)
This paper is concerned with the use of statistical information about dependency path length for sentence compression. The sentence compression method employed here requires a quantity called inter-phrase dependency strength. In the training process, original sentences are parsed, and for each pair of phrases connected by a dependency path of a given length, the number of tokens that survive as a modifier-modified phrase pair in the corresponding compressed sentence of the training corpus is counted. These statistics are exploited to estimate the inter-phrase dependency strength required in the sentence compression process. Results of a subjective evaluation show that the present method outperforms a conventional method of the same framework in which the distribution of dependency distance is used to estimate the inter-phrase dependency strength.
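One way to turn these counts into a strength estimate is a survival ratio per path length: of all training phrase pairs connected by a path of length l, what fraction survived as a modifier-modified pair in the compressed sentence? The abstract does not give the exact estimator, so the ratio and the additive smoothing parameter `alpha` below are assumptions, and `estimate_strength` is a hypothetical name.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def estimate_strength(pairs: Iterable[Tuple[int, bool]], alpha: float = 1.0) -> Dict[int, float]:
    """Estimate a dependency strength for each dependency path length.

    `pairs` holds (path_length, survived) for every phrase pair in the parsed
    original sentences; `survived` is True if the pair remains a
    modifier-modified pair in the compressed sentence.
    Returns the (smoothed) fraction of pairs that survived, per path length.
    """
    total: Counter = Counter()
    kept: Counter = Counter()
    for length, survived in pairs:
        total[length] += 1
        if survived:
            kept[length] += 1
    return {length: (kept[length] + alpha) / (total[length] + 2 * alpha) for length in total}
```

Smoothing keeps rarely observed long paths from receiving an extreme strength of exactly 0 or 1.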
#45: Combining Czech Dependency Parsers
Tomás Holan and Zdenek Zabokrtský (Charles University, Prague, Czech Republic)
In this paper we describe in detail two dependency parsing techniques developed and evaluated using the Prague Dependency Treebank 2.0. We then propose two approaches for combining various existing parsers in order to obtain better accuracy. The highest parsing accuracy reported in this paper is 85.84 %, an improvement of 1.86 % over the best single state-of-the-art parser. To our knowledge, no better result on the same data has yet been published.
#46: A Structure of Expert System for Speaker Verification
Ales Padrta, Jan Vanek (University of West Bohemia in Pilsen, Czech Republic)
This article introduces the structure of an expert system for speaker verification. Drawing on previous research, the origins of the essential ideas behind the expert system are outlined. First, the specifics of the speaker verification task are discussed. Then, the expert system, based on a combination of rules and an oriented graph, is introduced. Finally, the benefit of this approach is tested on a small knowledge base focused on signal processing. The results of the experiments show that the proposed expert system is capable of improving verification performance, even though the knowledge base is very small.
#47: First Steps towards New Czech Voice Conversion System
Zdenek Hanzlicek, Jindrich Matousek (University of West Bohemia, Plzen, Czech Republic)
In this paper we describe initial experiments on creating a new Czech voice conversion system. Voice conversion (VC) is a process that modifies the speech signal produced by one (source) speaker so that it sounds like another (target) speaker. Using VC, a new voice for a speech synthesizer can be prepared without the need to record a huge amount of new speech data. The transformation is determined using identical sentences uttered by both speakers; these sentences are time-aligned using a modified dynamic time warping algorithm. The conversion is divided into two stages corresponding to the source-filter model of speech production. In this work we employ a conversion function based on a Gaussian mixture model to transform the spectral envelope, described by line spectral frequencies. Residuals are converted using so-called residual prediction techniques. Unlike other similar research work, we predict residuals not from the transformed spectral envelope but directly from the source speech. Four versions of residual prediction are described and compared in this study. Objective evaluation of converted speech using performance metrics shows that our system is comparable with similar existing VC systems.
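The time alignment of parallel sentences can be illustrated with standard dynamic time warping, which finds the cheapest monotonic frame-to-frame alignment between two feature sequences. This is the plain textbook algorithm with Euclidean frame distances, not the modified variant used in the paper, and `dtw` is a hypothetical name; in practice the frames would be spectral feature vectors such as line spectral frequencies.

```python
import math
from typing import List, Sequence, Tuple

Frame = Sequence[float]


def dtw(source: Sequence[Frame], target: Sequence[Frame]) -> Tuple[float, List[Tuple[int, int]]]:
    """Align two non-empty frame sequences; return total cost and (source, target) index pairs."""
    n, m = len(source), len(target)
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(source[i - 1], target[j - 1])
            # Allowed moves: diagonal match, or repeat a source or target frame
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    # Backtrack from the end to recover the frame-to-frame alignment
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        _, i, j = min((cost[i - 1][j - 1], i - 1, j - 1),
                      (cost[i - 1][j], i - 1, j),
                      (cost[i][j - 1], i, j - 1))
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path
```

The aligned frame pairs are what a GMM-based conversion function would then be trained on.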
#49: A Knowledge Based Strategy for Recognising Textual Entailment
Oscar Ferrandez, Rafael M. Terol, Rafael Muñoz, Patricio Martínez-Barco, Manuel Palomar (University of Alicante, Spain)
This paper presents a knowledge based textual entailment approach comprising two stages. The first stage consists of inferring the logic forms for both the text and the hypothesis. The logic forms are obtained by analysing the dependency relations between words. The second stage carries out a comparison between the inferred logic forms by means of WordNet relations. This comparison aims at establishing the existence of an entailment relation. This approach has been evaluated within the PASCAL Second RTE Challenge and achieved 60% average precision.
#50: Post-Annotation Checking of Prague Dependency Treebank 2.0 Data
Jan Stepanek (ÚFAL UK Prague, Czech Republic)
This paper describes methods and tools used for the post-annotation checking of Prague Dependency Treebank 2.0 data. The annotation process was complicated by many factors: for example, the corpus is divided into several layers that must reflect each other; the annotation rules changed and evolved during the annotation process; and some parts of the data were annotated separately and in parallel and had to be merged with the rest of the data later. Besides ever-present human error, the conversion of the data from the old format to a new one was another source of possible problems. The checking procedures are classified according to several aspects, e.g. their linguistic relevance and their role in the checking process, and prominent examples are given. In the last part of the paper, the methods are compared and scored.
#53: Data-Driven Part-of-Speech Tagging of Kiswahili
Guy De Pauw (University of Antwerp, Belgium), Gilles-Maurice de Schryver (Ghent University, Belgium), Peter W. Wagacha (University of Nairobi, Kenya)
In this paper we present experiments with data-driven part-of-speech taggers trained and evaluated on the annotated Helsinki Corpus of Swahili. Of the four current state-of-the-art data-driven taggers used, TnT, MBT, SVMTool and MXPOST, we find MXPOST to be the most accurate tagger for the Kiswahili dataset. We further improve on the performance of the individual taggers by combining them into a committee of taggers. We observe that the more naive combination methods, like the novel plural voting approach, outperform more elaborate schemes like cascaded classifiers and weighted voting. This paper is the first publication to present experiments on data-driven part-of-speech tagging for Kiswahili and for Bantu languages in general.
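A committee of taggers can be combined token by token with a simple plurality vote, where each tagger proposes one tag and the most frequent tag wins. The abstract does not detail the plural voting scheme, so this sketch is plain plurality voting with an assumed tie-break (taggers are listed in priority order, e.g. the most accurate single tagger first), and it may differ from the authors' method. `plurality_vote` is a hypothetical name.

```python
from collections import Counter
from typing import List, Sequence


def plurality_vote(tagger_outputs: Sequence[Sequence[str]]) -> List[str]:
    """Combine per-token tags from several taggers by plurality vote.

    `tagger_outputs[k][i]` is tagger k's tag for token i. Taggers are listed in
    priority order; on a tie, the tied tag proposed by the highest-priority
    tagger wins.
    """
    combined = []
    for tags in zip(*tagger_outputs):
        counts = Counter(tags)
        top = max(counts.values())
        combined.append(next(t for t in tags if counts[t] == top))
    return combined
```

Its appeal, consistent with the finding above, is that it needs no extra training data beyond the individual taggers.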
#56: Multimodal Classification of the Focus of Attention
Christian Hacker (University of Erlangen-Nuremberg, Germany), Anton Batliner (University of Erlangen-Nuremberg, Germany), Elmar Nöth
Automatic dialogue systems are easily confused if they recognize speech that is not directed to the system. Besides noise or other people's conversations, even the user's own utterances can cause difficulties when he is talking to someone else or to himself ("Off-Talk"). In this paper, the automatic classification of the user's focus of attention is investigated. In the German SmartWeb project, a mobile device is used to access the semantic web. In this scenario, two modalities are available: speech and the video signal. This makes it possible to classify whether a spoken request is addressed to the system or not: the user's gaze direction is detected with the camera of the mobile device, and prosodic features are analyzed in the speech signal. Encouraging recognition rates of up to 93 % are achieved in the speech-only condition. Further improvement is expected from the fusion of the two information sources.
#57: Automatic Annotation of Dialogues Using n-grams
Carlos D. Martínez-Hinarejos (Universidad Politécnica de Valencia, Spain)
The development of a dialogue system for any task implies the acquisition of a dialogue corpus in order to study the structure of the dialogues used in that task. This structure is reflected in the dialogue system's behaviour, which can be rule-based or corpus-based. In the case of corpus-based dialogue systems, the behaviour is defined by statistical models inferred from an annotated corpus of dialogues. This annotation task is usually difficult and expensive, so automatic dialogue annotation tools are needed to reduce the annotation effort. An automatic dialogue labelling technique based on n-grams is presented in this work. Its different variants are evaluated against manual human annotations of a dialogue corpus of queries about trains.
#58: Hungarian-English Machine Translation using GenPar
András Hócza, András Kocsor (University of Szeged, Hungary)
We present an approach to machine translation that applies the GenPar toolkit to POS-tagged and syntactically parsed texts. Our experiment in Hungarian-English machine translation is an attempt to develop prototypes of a syntax-driven machine translation system and to examine the effects of various preprocessing steps (POS-tagging, lemmatization and syntactic parsing) on system performance. The annotated monolingual texts needed for different language-specific tasks were taken from the Szeged Treebank and the Penn Treebank. The parallel sentences were collected from the Hunglish Corpus. Each developed prototype runs fully automatically, and new Hungarian-specific functions have been built in. The results are evaluated using the BLEU score.
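For readers unfamiliar with the metric, here is a minimal single-reference, sentence-level BLEU: the geometric mean of clipped n-gram precisions (n = 1..4) multiplied by a brevity penalty. Evaluations normally compute BLEU at corpus level by pooling counts over all sentences, and standard tools add tokenisation and sometimes smoothing; none of that is shown, and `sentence_bleu` is a hypothetical name, not a particular toolkit's API.

```python
import math
from collections import Counter
from typing import Sequence


def ngram_counts(tokens: Sequence[str], n: int) -> Counter:
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate: Sequence[str], reference: Sequence[str], max_n: int = 4) -> float:
    """Unsmoothed BLEU of one tokenised candidate against one tokenised reference."""
    log_precision = 0.0
    for n in range(1, max_n + 1):
        cand = ngram_counts(candidate, n)
        ref = ngram_counts(reference, n)
        total = sum(cand.values())
        # Clip each candidate n-gram count by its count in the reference
        clipped = sum(min(count, ref[g]) for g, count in cand.items())
        if total == 0 or clipped == 0:
            return 0.0  # any zero precision makes the geometric mean zero
        log_precision += math.log(clipped / total) / max_n
    c, r = len(candidate), len(reference)
    brevity = 1.0 if c > r else math.exp(1.0 - r / c)
    return brevity * math.exp(log_precision)
```

For example, `sentence_bleu("the cat sat on the".split(), "the cat sat on the mat".split())` has perfect n-gram precision, so the score equals the brevity penalty exp(1 - 6/5).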
#59: Automatic Online Subtitling of the Czech Parliament Meetings
Ales Prazák, J. V. Psutka, Jan Hoidekr, Jakub Kanis, Ludek Müller, Josef Psutka (University of West Bohemia in Pilsen, Czech Republic)
This paper describes an LVCSR system for automatic online subtitling (closed captioning) of TV transmissions of the Czech Parliament meetings. The recognition system is based on Hidden Markov Models, lexical trees and a bigram language model. The acoustic model is trained on 40 hours of parliament speech, and the language model on more than 10M tokens of parliament speech transcriptions. The first part of the article focuses on text normalization and class-based language model preparation. The second part describes the recognition network and its decoding with respect to real-time operation demands, using a vocabulary of up to 100k words. The third part outlines the application framework, which allows subtitles to be generated and displayed for any audio/video source. Finally, experimental results obtained on parliament speeches, with recognition accuracy varying from 80 to 95 % depending on the topic discussed, are reported and discussed.
#60: Speech Driven Facial Animation using HMMs in Basque
Maider Lehr (VICOMTech Research Centre, Donostia - San Sebastian, Spain), Andoni Arruti (University of the Basque Country, Spain), Amalia Ortiz, David Oyarzun, Michael Obach (VICOMTech Research Centre, Donostia - San Sebastian, Spain)
Virtual characters are becoming an increasingly familiar presence in daily life. However, there is a lack of resources and tools in the area of visual speech technologies for minority languages. In this paper we present an application that animates virtual characters in real time from live speech in Basque. To obtain a realistic facial animation, the lips must be synchronized with the audio. To accomplish this, we have compared different methods for obtaining the final visemes through HMM-based speech recognition techniques. Finally, the implementation of a real prototype has shown that a fairly natural real-time animation can be obtained with a minimal amount of training data.
#61: A Pattern Learning Approach to Question Answering within the Ephyra Framework
Nico Schlaefer, Petra Gieselmann (University Karlsruhe, Germany), Thomas Schaaf, Alex Waibel (Carnegie Mellon University, Pittsburgh, USA)
This paper describes the Ephyra question answering engine, a modular and extensible framework that allows multiple approaches to question answering to be integrated in one system. Our framework can be adapted to languages other than English by replacing language-specific components. It supports the two major approaches to question answering, knowledge annotation and knowledge mining. Ephyra uses the web as a data resource, but could also work with smaller corpora. In addition, we propose a novel approach to question interpretation that abstracts from the original formulation of the question. Text patterns are used to interpret a question and to extract answers from text snippets. Our system automatically learns the patterns for answer extraction, using question-answer pairs as training data. Experimental results revealed the potential of this approach.
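The idea of answer extraction with text patterns can be sketched as follows: a surface pattern with a target slot `<T>` and an answer slot `<A>` is compiled into a regular expression, applied to every snippet, and candidate answers are ranked by how often they are extracted. The patterns here are hand-written rather than learned, the `<T>`/`<A>` notation and the "run of capitalised words" answer slot are assumptions, and the function names are hypothetical, not Ephyra's API.

```python
import re
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical answer slot: a run of capitalised words
ANSWER_SLOT = r"(?P<answer>[A-Z]\w*(?: [A-Z]\w*)*)"


def compile_pattern(pattern: str, target: str):
    """Turn a surface pattern such as '<T> was born in <A>' into a compiled regex."""
    pieces = []
    for part in re.split(r"(<T>|<A>)", pattern):
        if part == "<T>":
            pieces.append(re.escape(target))
        elif part == "<A>":
            pieces.append(ANSWER_SLOT)
        else:
            pieces.append(re.escape(part))  # literal text around the slots
    return re.compile("".join(pieces))


def extract_answers(patterns: Sequence[str], target: str,
                    snippets: Sequence[str]) -> List[Tuple[str, int]]:
    """Apply every pattern to every snippet and rank answers by frequency."""
    compiled = [compile_pattern(p, target) for p in patterns]
    found: Counter = Counter()
    for snippet in snippets:
        for regex in compiled:
            for match in regex.finditer(snippet):
                found[match.group("answer")] += 1
    return found.most_common()
```

For a question like "Where was Mozart born?", the patterns "<T> was born in <A>" and "<A> is the birthplace of <T>" would both fire on suitable web snippets, and agreement across snippets raises an answer's rank.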
#62: Some Methods of Describing Discontinuity in Polish and their Cost-Effectiveness
Filip Gralinski (Adam Mickiewicz University, Poznan, Poland)
The aim of this paper is to present some methods of handling discontinuity (and freer word order in general) within a medium-level grammatical framework. A context-free formalism and the "backbone" set of rules for verbal phrases are presented as the background for this paper. The main result shows how discontinuous infinitive phrases and discontinuous noun phrases (interrogative phrases included) can in theory be covered within the introduced formalism and similar grammatical frameworks. The second result reported in this paper is a cost-effectiveness analysis of introducing discontinuity rules into a medium-level grammatical framework: it turns out that attempting to cover some types of discontinuity may be unprofitable within a given grammatical framework. Although only examples from the Polish language are discussed, the described solutions are likely to be relevant for other languages with similar word order properties.
#67: The ACE Entity Detection and Recognition Task
Ying Chen, Kadri Hacioglu (University of Colorado at Boulder, USA)
In this paper, we consider the coreference resolution problem in the context of information extraction as envisioned by the DARPA Automatic Content Extraction (ACE) program. Given a set of entity mentions referring to real-world entities and a similarity matrix that characterizes how similar those mentions are, we seek a set of entities that are uniquely co-referred to by those entity mentions. The quality of the clustering of entity mentions into unique entities depends significantly on the quality of (1) the similarity matrix and (2) the clustering algorithm. We explore the coreference resolution problem along those two dimensions and clearly show the tradeoff among several ways of learning the similarity matrix and using it while performing clustering.
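As a concrete example of the second dimension, one very simple clustering strategy links every pair of mentions whose similarity reaches a threshold and takes the connected components as entities (single-link clustering via union-find). This is only one possible algorithm, chosen here for illustration rather than taken from the paper, and `cluster_mentions` and the threshold value are hypothetical.

```python
from typing import Dict, List, Sequence


def cluster_mentions(sim: Sequence[Sequence[float]], threshold: float) -> List[List[int]]:
    """Group mention indices whose pairwise similarity reaches `threshold`.

    Merging is transitive (single-link): if a~b and b~c, all three end up together.
    """
    n = len(sim)
    parent = list(range(n))

    def find(x: int) -> int:
        # Union-find root lookup with path halving
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] >= threshold:
                parent[find(i)] = find(j)

    clusters: Dict[int, List[int]] = {}
    for i in range(n):
        clusters.setdefault(find(i), []).append(i)
    return sorted(clusters.values())
```

The transitivity of single-link merging illustrates the tradeoff the abstract mentions: a slightly noisy similarity matrix can chain unrelated mentions together, so the best threshold depends on how the matrix was learned.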
#68: Annotation of Temporal Relations within a Discourse
Petr Nemec (Charles University, Prague, Czech Republic)
In this paper we present an annotation scheme that captures general temporal relations between events expressed in a discourse. The proposed scheme aims to naturally extend the existing tectogrammatical annotation of the Prague Dependency Treebank and represents a step towards capturing the cognitive (ontological) content of a discourse. Such an annotation will allow the training and testing of algorithms for the automatic extraction of temporal relations, which in turn can contribute to various NLP tasks such as information retrieval and machine translation. So far, 233 sentences of Czech translations of the Wall Street Journal (Penn Treebank) have been annotated. We also present statistics on the distribution of the respective temporal relations in this preliminary annotation data, as well as the performance of a grammar-based algorithm.
#69: Hybrid Neural Network Design and Implementation on FPGA
I. Suaste-Rivas, A. Diaz-Mendez, C.A. Reyes-Garcia (Instituto Nacional de Astrofisica Optica y Electronica, Mexico), O.F. Reyes-Galaviz (Instituto Tecnologico de Apizaco, Mexico)
An infant's cry has been found to carry much information in its sound wave. For small infants, crying is a form of communication, a very limited one, but similar to the way adults communicate. In this work we present the design of a hybrid Automatic Infant Cry Recognizer system that classifies different kinds of cries, with the objective of identifying some pathologies in newborn babies. The system is based on the implementation of a Fuzzy Relational Neural Network (FRNN) model on standard reconfigurable hardware such as Field Programmable Gate Arrays (FPGAs). To perform the experiments, a set of crying samples is divided into two parts: the first is used for training and the other for testing. The input features are represented by fuzzy membership functions, and the links between nodes are represented by fuzzy relations instead of regular weights. Training adjusts the relational weight matrix, and once its values have been adapted, the matrix is fixed into the FPGA. The goal of this research is to demonstrate the performance of the FRNN on a development board, in this case the RC100 from Celoxica. The implementation process and some results are shown.
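The two fuzzy ingredients named above can be sketched in software: a membership function that fuzzifies a crisp input feature, and a relational layer in which each output is computed by max-min composition of the fuzzy inputs with a relation matrix instead of a weighted sum. The triangular membership shape and the max-min composition are common textbook choices assumed here; the abstract does not specify which ones the FRNN uses, and this pure-Python version says nothing about the FPGA implementation. Function names are hypothetical.

```python
from typing import List, Sequence


def triangular(x: float, a: float, b: float, c: float) -> float:
    """Triangular membership function rising from a, peaking at b, falling to c (assumed shape)."""
    if x <= a or x >= c:
        return 0.0
    return (x - a) / (b - a) if x <= b else (c - x) / (c - b)


def max_min_compose(x: Sequence[float], relation: Sequence[Sequence[float]]) -> List[float]:
    """Fuzzy relational layer: y_j = max_i min(x_i, R[i][j])."""
    n_out = len(relation[0])
    return [max(min(xi, row[j]) for xi, row in zip(x, relation)) for j in range(n_out)]
```

Because the layer uses only comparisons (min and max) rather than multiplications, it maps naturally onto simple hardware logic, which is one plausible reason for choosing it on an FPGA.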
#70: Syllable-Based Recognition Unit to Reduce Error Rate for Korean...
Bong-Wan Kim, Yongnam Um, Yong-Ju Lee (Wonkwang University, Korea)
In this paper we propose a new type of syllable-based unit for recognition and language modeling to improve the recognition rate for Korean phones, syllables and characters. We propose 'combined' units that take into consideration both Korean characters and the syllable units realized in speech. Using the proposed units, character, syllable and phone sequences can be obtained directly from the recognition results. To test the performance of the proposed approach, we perform two types of experiments. First, we build language models for phones, characters, syllables and the proposed combined units from the same text corpus, and we test the performance of each unit. Second, we perform a vector-space-model-based retrieval experiment using the proposed combined units.
#71: A Dissonant Frequency Filtering for Enhanced Clarity of Husky Voice Signals
Sangki Kang, Yongserk Kim (Samsung Electronics Co., Korea)
In general, noise added to a clean signal reduces intelligibility and degrades the performance of speech processing algorithms used in applications such as speech compression and recognition. In this paper, a new voice clarity enhancement method is proposed that combines dissonant frequency filtering (DFF), in particular of C♯ and F♯ in each octave band when the reference frequency is C, with noise suppression (NS). The proposed method targets speakers whose intelligibility has become worse than normal, in both noisy and noiseless environments. The test results indicate that the proposed method provides a significant audible improvement for speakers whose intelligibility is impaired, especially for speech contaminated by colored noise. Therefore, when the filter is employed as a pre-filter for enhancing the clarity of husky voices in the presence of several types of noise, the output speech quality and clarity can be greatly enhanced.
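The frequencies such a filter would target follow from equal temperament: a note k semitones above C has frequency f_C · 2^(k/12), so C♯ is k = 1 and F♯ is k = 6 in every octave. The sketch below lists these frequencies up to a band limit. Equal temperament and the choice of the reference C (here C4 derived from A4 = 440 Hz) are assumptions, since the abstract does not state them, and the filter design itself is not shown; `dissonant_frequencies` is a hypothetical name.

```python
from typing import List


def dissonant_frequencies(c_low: float, f_max: float) -> List[float]:
    """C-sharp and F-sharp frequencies (Hz) in every octave from `c_low` up to `f_max`.

    Assumes equal temperament: a note k semitones above C is c * 2 ** (k / 12).
    """
    freqs = []
    c = c_low
    while c * 2 ** (1 / 12) <= f_max:
        for semitones in (1, 6):  # C-sharp is 1 semitone above C, F-sharp is 6
            f = c * 2 ** (semitones / 12)
            if f <= f_max:
                freqs.append(f)
        c *= 2  # next octave
    return freqs


# Reference C4 derived from A4 = 440 Hz (assumption): about 261.63 Hz
C4 = 440.0 * 2 ** (-9 / 12)
```

With `dissonant_frequencies(C4, 600.0)` this yields roughly 277.18 Hz (C♯4), 369.99 Hz (F♯4) and 554.37 Hz (C♯5), the centre frequencies a notch-type filter bank could attenuate.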
#73: Automatic Korean Phoneme Generation via Input-text Preprocessing and Disambiguation
Mi-young Kang, Sung-won Jung, Hyuk-Chul Kwon, Aesun Yoon (Pusan National University, Busan, South Korea)
This paper proposes an Automatic Korean Phoneme Generator (AKPG) that can be adapted to various natural language processing systems that handle raw input text from users, such as a Korean pronunciation education system. Resolving noise and ambiguity is a precondition for correct natural language processing. To satisfy this condition, the AKPG, as a module of an NLP system, combines linguistic and IR methods. Preprocessing modules are incorporated into the AKPG to handle spelling errors that would otherwise make correct phoneme generation impossible. The preprocessing modules also convert alphanumeric symbols into Korean characters. Finally, to resolve part-of-speech (POS) ambiguities, as well as ambiguities between homographs with the same POS, homograph collocations are collected from a large corpus using the IR method. These homographs are in addition integrated into dependency rules for partial parsing.
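One part of the alphanumeric-to-Korean conversion can be illustrated with numbers: the sketch reads an integer from 0 to 9999 with Sino-Korean numerals, omitting 일 ("one") before the units 십, 백 and 천 as is usual. It is a hypothetical helper, not the AKPG module: real input also needs native Korean numerals before many classifiers, digit-by-digit readings for phone numbers, larger units such as 만, and subsequent phonological rules, none of which are handled here.

```python
DIGITS = ["", "일", "이", "삼", "사", "오", "육", "칠", "팔", "구"]
UNITS = ["", "십", "백", "천"]  # 10, 100, 1000


def sino_korean(n: int) -> str:
    """Spell out 0..9999 with Sino-Korean numerals (one possible reading only)."""
    if not 0 <= n <= 9999:
        raise ValueError("only 0..9999 is supported in this sketch")
    if n == 0:
        return "영"
    out = []
    for pos in range(3, -1, -1):
        d = (n // 10 ** pos) % 10
        if d == 0:
            continue  # zero digits are not read
        # The digit 1 is normally dropped before 십, 백, 천 (십, not 일십)
        word = "" if (d == 1 and pos > 0) else DIGITS[d]
        out.append(word + UNITS[pos])
    return "".join(out)
```

For instance, `sino_korean(2006)` gives 이천육; choosing between this and a native reading is exactly the kind of ambiguity the later disambiguation stage has to resolve.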
#74: Robust Speech Detection Based on Phoneme Recognition Features
France Mihelic, Janez Zibert (University of Ljubljana, Slovenia)
We introduce a new method for discriminating speech and non-speech segments in audio signals based on the transcriptions produced by phoneme recognizers. Four measures based on consonant-vowel and voiced-unvoiced pairs obtained from different phoneme recognizers are proposed. They are designed to be recognizer- and language-independent and can be applied in different segmentation-classification frameworks. The segmentation systems were evaluated on different broadcast news (BN) datasets consisting of more than 60 hours of multilingual BN shows. The results of these evaluations illustrate the robustness of the proposed features in comparison to MFCC and posterior-probability-based features. The overall frame accuracies of the proposed approaches ranged from 95% to 98% and remained stable across different test conditions and different phoneme recognizers.
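To make the idea of transcription-based features concrete, here is a minimal sketch that turns a recognized phone sequence into per-window features: vowel ratio, voiced ratio, and the rate of consonant/vowel alternation between adjacent phones. These are illustrative stand-ins, not the paper's four measures, and the phone classes below (`VOWELS`, `VOICED_CONSONANTS`, the `"sil"` label) are hypothetical; a real system would use the recognizer's own inventory and feed these features to a classifier.

```python
# Hypothetical phone classes; a real system would use the recognizer's inventory.
VOWELS = set("aeiou")
VOICED_CONSONANTS = set("bdgvzmnlrj")
SILENCE = "sil"

def window_features(labels):
    """Return (vowel ratio, voiced ratio, C/V alternation rate) for one window
    of recognized phone labels (one label per recognized phone)."""
    phones = [p for p in labels if p != SILENCE]
    if not phones:
        return 0.0, 0.0, 0.0
    vowel_ratio = sum(p in VOWELS for p in phones) / len(phones)
    voiced_ratio = sum(p in VOWELS or p in VOICED_CONSONANTS for p in phones) / len(phones)
    # Speech tends to alternate between consonants and vowels.
    pairs = list(zip(phones, phones[1:]))
    alternations = sum((a in VOWELS) != (b in VOWELS) for a, b in pairs)
    alternation_rate = alternations / len(pairs) if pairs else 0.0
    return vowel_ratio, voiced_ratio, alternation_rate

def segment_features(labels, window=10):
    """Split a phone sequence into fixed-size windows and featurize each one."""
    return [window_features(labels[i:i + window]) for i in range(0, len(labels), window)]

print(window_features(["b", "a", "s", "sil", "a"]))  # (0.5, 0.75, 1.0)
```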
#75: Building Korean Classifier Ontology Based on Korean WordNet
Soonhee Hwang, Youngim Jung, Aesun Yoon, Hyuk-Chul Kwon (Pusan National University, Busan, South Korea)
Because classifiers are commonly used in most languages, they should be reexamined using semantic classes from an ontology. However, few studies have dealt with the semantic categorization of classifiers, or with their semantic relations to the nouns they quantify and characterize, in building an ontology. In this paper, we propose a semantic recategorization of Korean numeral classifiers and present the construction of a classifier ontology based on large corpora and KorLex 1.5 (Korean WordNet). As a result, a Korean classifier ontology containing semantic hierarchies and relations of classifiers was constructed. This is the first Korean classifier ontology, and its size is appropriate for natural language processing. In addition, each individual classifier is connected to the nouns or noun classes that it quantifies.
#77: Exploiting the Translation Context for Multilingual WSD
Lucia Specia, Maria das Graças Volpe Nunes (Universidade de São Paulo, Brazil)
We propose a strategy to support Word Sense Disambiguation (WSD) that is designed specifically for multilingual applications, such as Machine Translation. Co-occurrence information extracted from the translation context, i.e., the set of words that have already been translated, is used to define the order in which disambiguation rules produced by a machine learning algorithm are applied. Experiments on the English-Portuguese translation of seven verbs yielded a significant improvement in the accuracy of a rule-based model: from 0.75 to 0.79.
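The core idea, reordering learned rules by co-occurrence with the already-translated context, can be sketched as follows. This assumes each rule proposes one target-language translation and that co-occurrence counts between candidate translations and context words are available; the `Rule` type, the example rules for English "come" (Portuguese "vir" / "chegar"), and the counts are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

@dataclass
class Rule:
    translation: str                   # target-language sense chosen by the rule
    condition: Callable[[dict], bool]  # test on source-side features

def order_rules(rules: List[Rule], context: List[str],
                cooc: Dict[Tuple[str, str], int]) -> List[Rule]:
    """Sort rules by how often their translation co-occurs with the
    already-translated context words; ties keep the learned order."""
    def score(rule: Rule) -> int:
        return sum(cooc.get((rule.translation, word), 0) for word in context)
    return sorted(rules, key=score, reverse=True)  # sorted() is stable

def disambiguate(rules, features, context, cooc, default=None):
    """Apply the reordered rules and return the first matching translation."""
    for rule in order_rules(rules, context, cooc):
        if rule.condition(features):
            return rule.translation
    return default

rules = [
    Rule("vir", lambda f: f.get("subject_animate", False)),
    Rule("chegar", lambda f: f.get("subject_animate", False)),
]
cooc = {("chegar", "casa"): 5}  # hypothetical co-occurrence counts
print(disambiguate(rules, {"subject_animate": True}, [], cooc))        # vir
print(disambiguate(rules, {"subject_animate": True}, ["casa"], cooc))  # chegar
```

Without context the learned order wins; once "casa" has been translated, the rule whose translation co-occurs with it is tried first.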
#80: Character Identity Expression in Vocal Performance of Traditional Puppeteers
Milan Rusko, Juraj Hamar (Comenius University, Bratislava, Slovakia)
A traditional puppeteer generally uses up to a dozen different marionettes in one piece. Each of them impersonates a character with its own typical voice manifestation. It is therefore very interesting to study the techniques the puppeteer uses to change his voice, and their acoustic correlates. The study becomes even more interesting when there is a traditional style that puppeteers have respected for more than a century. We therefore made use of the fact that recordings are available of several pieces performed by Bohuslav Anderle (1913-1976), and we recorded parts of the same plays performed by his son, Anton Anderle (born 1944), supplemented by his verbal description of the personality features of the characters that the actor tries to express through their voices. A summary of the variety of characters a puppeteer has to master is given, and the psychological, aesthetic and acoustic-phonetic aspects of their personalities are discussed. The main goal of the paper is to present a classification of the characters' voice displays, the techniques of voice change, and their acoustic correlates.
#81: Analysis of HMM Temporal Evolution for Automatic Speech Recognition and Verification
Marta Casar, José A.R. Fonollosa (Universitat Politècnica de Catalunya, Barcelona, Spain)
This paper proposes a two-layer speech recognition and utterance verification system based on the analysis of the temporal evolution of HMM state scores. The lower layer uses standard HMM-based acoustic modeling, followed by a grammar-free Viterbi decoding step that provides the state scores of the acoustic models. In the second layer, these state scores are added to the regular set of acoustic parameters, building a new set of expanded HMMs. This new parameter models the temporal evolution of the acoustic HMMs. Using the expanded set of HMMs for speech recognition, a significant improvement in performance is achieved. Next, we use this new architecture for utterance verification in a "second opinion" framework, in which the second layer evaluates the reliability of the decoding obtained with the first-layer acoustic models. An outstanding improvement in performance over a baseline verification system has been achieved with this new approach.
#82: Cascaded Grammatical Relation-Driven Parsing Using Support Vector Machines
Songwook Lee (Dongseo University, Busan, Korea)
This study aims to identify the dependency structure of Korean sentences using a cascaded chunking strategy. In the first stages of the cascade, we find NP chunks and guess grammatical relations (GRs) using Support Vector Machine (SVM) classifiers for every possible modifier-head pair of chunks, in terms of GR categories such as subject, object, complement and adverbial. In the next stage, we filter out incorrect modifier-head relations in each cascade for its corresponding GR using the SVM classifiers and characteristics of the Korean language such as distance, the no-crossing constraint and case properties. In an experiment in which the proposed parser was trained on a corpus annotated with trees and GRs, we achieved an average overall accuracy of 85.7%.
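The no-crossing constraint used in the filtering stage can be illustrated with a minimal sketch. Here candidate arcs carry a score standing in for an SVM output, and arcs are accepted greedily by score subject to head-finality (Korean heads follow their modifiers), one head per modifier, and no crossing. The greedy selection and the `filter_arcs` helper are assumptions for illustration, not the paper's cascade.

```python
def crosses(arc_a, arc_b):
    """True if two modifier-head arcs cross when drawn above the sentence."""
    a1, a2 = sorted(arc_a)
    b1, b2 = sorted(arc_b)
    return a1 < b1 < a2 < b2 or b1 < a1 < b2 < a2

def filter_arcs(candidates):
    """candidates: (score, modifier, head) triples, e.g. classifier outputs.
    Greedily keep the best-scoring arcs that respect head-finality,
    one head per modifier, and the no-crossing constraint."""
    accepted, has_head = [], set()
    for score, modifier, head in sorted(candidates, reverse=True):
        if head <= modifier or modifier in has_head:
            continue  # head must follow the modifier; each modifier gets one head
        if any(crosses((modifier, head), arc) for arc in accepted):
            continue  # would cross an already accepted arc
        accepted.append((modifier, head))
        has_head.add(modifier)
    return sorted(accepted)

candidates = [(0.9, 0, 2), (0.8, 1, 3), (0.7, 1, 2), (0.6, 2, 3)]
print(filter_arcs(candidates))  # [(0, 2), (1, 2), (2, 3)]
```

The arc (1, 3) is rejected because it crosses the higher-scoring arc (0, 2); arcs that merely share an endpoint do not cross.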
#83: Enhanced Centroid-Based Classification Technique by Filtering Outliers
Kwangcheol Shin, Ajith Abraham, SangYong Han (Chung-Ang University, Seoul, Korea)
Document clustering, or unsupervised document classification, has been used to enhance information retrieval. Recently this has become an intense area of research due to its practical importance. Outliers are the elements whose similarity to the centroid of the corresponding category is below some threshold value. In this paper, we show that excluding outliers from noisy training data significantly improves the performance of the centroid-based classifier, which is the best known method. The proposed method performs about 10% better than the centroid-based classifier.
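A minimal sketch of centroid-based classification with outlier filtering follows. It assumes dense document vectors, cosine similarity, and a two-pass scheme (compute each class centroid, drop training documents below the threshold, recompute); the abstract specifies only the threshold rule, so the similarity measure and the toy data are assumptions.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def centroid(vectors):
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def train(docs_by_class, threshold):
    """Build one centroid per class after dropping training documents whose
    cosine similarity to the initial class centroid is below `threshold`."""
    centroids = {}
    for label, vectors in docs_by_class.items():
        initial = centroid(vectors)
        kept = [v for v in vectors if cosine(v, initial) >= threshold]
        centroids[label] = centroid(kept or vectors)  # keep all if everything is an outlier
    return centroids

def classify(vector, centroids):
    """Assign the class whose centroid is most similar to the document."""
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))

training = {
    "sports": [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]],  # last document is a noisy outlier
    "politics": [[0.0, 1.0], [0.1, 0.9]],
}
model = train(training, threshold=0.6)
print(model["sports"])              # close to [0.95, 0.05]
print(classify([0.3, 0.7], model))  # politics
```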
#84: A Study of the Influence of PoS Tagging on WSD
Lorenza Moreno-Monteagudo, Rubén Izquierdo-Beviá, Patricio Martinez-Barco, Armando Suárez (Universidad de Alicante, Spain)
In this paper we discuss to what extent the choice of a particular Part-of-Speech (PoS) tagger determines the results obtained by a word sense disambiguation (WSD) system. We chose several PoS taggers and two WSD methods. By combining them and using different kinds of information, we carried out several experiments. The WSD systems were evaluated on the corpora of the English lexical sample task of Senseval-3. The results show that some PoS taggers work better with one specific method; that is, selecting the right combination of these tools could improve the results obtained by a WSD system.
#86: Czech-Sign Speech Corpus for Semantic Based Machine Translation
Jakub Kanis (University of West Bohemia), Jiri Zahradil (SpeechTech), Filip Jurcicek (University of West Bohemia), Ludek Müller (University of West Bohemia)
This paper describes progress in the development of a human-human dialogue corpus for machine translation of spoken language. We chose a semantically annotated corpus of phone calls to a train timetable information center. The phone calls consist of inquiries about the callers' train travel plans. The corpus dialogue act tags incorporate abstract semantic meaning. We have enriched part of the corpus with Sign Speech translations, and we propose methods for automatic machine translation from Czech to Sign Speech using the semantic annotation contained in the corpus.
#88: Exploitation of the VerbaLex Verb Valency Lexicon in the Syntactic Analysis of Czech
Dana Hlavácková, Ales Horák, Vladimír Kadlec (Masaryk University, Brno, Czech Republic)
This paper presents an exploitation of VerbaLex, a lexicon of verb valencies for the Czech language. The VerbaLex lexicon format, called complex valency frames, comprises all the information found in three independent electronic dictionaries of verb valency frames, and it is extensively linked to the Czech WordNet semantic network. The NLP laboratory at FI MU Brno is developing synt, a deep syntactic analyzer of Czech sentences. The system is based on an efficient and fast head-driven chart parsing algorithm. We present the latest results of using the information contained in the VerbaLex lexicon as one of the language-specific features in the tree ranking algorithm of the Best Analysis Selection algorithm, which is a crucial part of syntactic analysers for free-word-order languages.
#89: Use of Negative Examples in Training the HVS Semantic Model
Filip Jurcícek, Jan Svec, Jirí Zahradil, Libor Jelínek (University of West Bohemia in Pilsen, Czech Republic)
This paper describes the use of negative examples in training the HVS semantic model. We present a novel initialization of the lexical model using negative examples extracted automatically from a semantic corpus, as well as an algorithm for extracting these examples. We evaluated the use of negative examples on a closed-domain human-human train timetable dialogue corpus. We significantly improved the standard PARSEVAL scores of the baseline system: the labeled F-measure (LF) increased from 45.4% to 49.1%.
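For readers unfamiliar with the labeled F-measure, here is a minimal sketch of how it is computed from labeled spans. It assumes each semantic constituent is represented as a `(label, start, end)` tuple and that matches are counted as a multiset intersection; the concept labels in the example are hypothetical.

```python
from collections import Counter

def labeled_prf(gold, predicted):
    """Labeled precision, recall and F-measure over (label, start, end) spans."""
    gold_counts, pred_counts = Counter(gold), Counter(predicted)
    matched = sum((gold_counts & pred_counts).values())  # multiset intersection
    precision = matched / len(predicted) if predicted else 0.0
    recall = matched / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)

gold = [("DEPARTURE", 0, 3), ("TIME", 2, 3), ("TO_STATION", 4, 6)]
predicted = [("DEPARTURE", 0, 3), ("TIME", 1, 3)]
print(labeled_prf(gold, predicted))  # P = 0.5, R = 1/3, F = 0.4 (up to rounding)
```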
#90: Language Modelling with Dynamic Syntax
David Tugwell (University of St Andrews, Scotland, UK)
In this paper we introduce a system for the robust analysis of English using the approach of Dynamic Syntax, in which the syntactic process is modelled as the word-by-word construction of a semantic representation. We argue that the inherent incrementality of the approach, in contrast with the essentially static assumptions of standard generative grammar, has clear advantages for the task of language modelling. To demonstrate its potential we show that this syntactic approach consistently outperforms a standard trigram model in word recovery tasks on parsable sentences. Furthermore, these results are achieved without recourse to hand-prepared training data.
#91: Current State of Czech Text-to-Speech System ARTIC
Jindrich Matousek, Daniel Tihelka, Jan Romportl (University of West Bohemia, Pilsen, Czech Republic)
This paper gives a survey of the current state of ARTIC, the modern Czech concatenative corpus-based text-to-speech system. All stages of the system design are described, including the acoustic unit inventory building process, text processing and speech production issues. Two versions of the system are presented: the single unit instance system, with moderate output speech quality, suitable for low-resource devices, and the multiple unit instance system, with a dynamic unit instance selection scheme, yielding high-quality output speech. Both versions make use of automatically designed acoustic unit inventories. To ensure the desired prosodic characteristics of the output speech, version-specific prosody generation issues are also discussed. Although the system was primarily designed for the synthesis of Czech speech, ARTIC can now speak three languages: Czech (both female and male voices are available), Slovak and German.
#92: Czech Verbs of Communication and the Extraction of their Frames
Vaclava Benesova, Ondrej Bojar (Charles University, Prague, Czech Republic)
We aim at a procedure for the automatic generation of valency frames for verbs not covered in VALLEX, a lexicon of Czech verbs. We exploit the classification of verbs into syntactico-semantic classes. This article describes our first step: automatically identifying verbs of communication and assigning the prototypical frame to them. The identification method is evaluated against two versions of VALLEX and FrameNet 1.2. For the purpose of frame generation, a new metric based on the notion of frame edit distance is outlined.
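The abstract only names the frame edit distance, so the following is a hedged sketch of one plausible reading: a Levenshtein distance over a frame's slots, where each slot is written as a functor with its morphemic form, e.g. `"ACT(1)"`. Sorting the slots first is a simplification to make slot order irrelevant; the prototype frame and the `frame_edit_distance` name are hypothetical.

```python
def frame_edit_distance(frame_a, frame_b):
    """Levenshtein distance between two valency frames, each given as a list
    of slots such as "ACT(1)"; slots are sorted first so order does not matter."""
    a, b = sorted(frame_a), sorted(frame_b)
    rows, cols = len(a) + 1, len(b) + 1
    dist = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        dist[i][0] = i  # delete all slots of a[:i]
    for j in range(cols):
        dist[0][j] = j  # insert all slots of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            dist[i][j] = min(dist[i - 1][j] + 1,         # delete a slot
                             dist[i][j - 1] + 1,         # insert a slot
                             dist[i - 1][j - 1] + cost)  # keep or substitute
    return dist[-1][-1]

prototype = ["ACT(1)", "ADDR(3)", "PAT(4)"]  # hypothetical communication frame
print(frame_edit_distance(prototype, ["ACT(1)", "PAT(4)"]))  # 1
```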
#93: A Pattern-based Methodology for Multimodal Interaction Design
Andreas Ratzka, Christian Wolff (University of Regensburg, Germany)
This paper describes a design methodology for multimodal interactive systems. The method is meant to serve as a foundation for applying robust software engineering techniques to multimodal systems. Starting from a short review of current design approaches, we present a high-level view of the design process for multimodal systems, highlighting design issues related to context-of-use factors. Our proposal is discussed in the context of a multimodal organizer, which serves as our showcase application. The design of multimodal systems brings together a broad variety of analysis methods (task, context, data, user). The combination of modalities, together with the variety of interaction devices, implies a high degree of freedom in design decisions. Therefore, a (simple) unification of existing approaches to interface design, such as GOMS (task analysis) or Buergy's interaction constraint model for context analysis, is not sufficient. We employ the design pattern approach as a means of guiding the analysis and design process. Design patterns are discussed both as a general modeling tool and as a possible approach to designing multimodal systems.
#94: PPChecker: Plagiarism Pattern Checker in Document Copy Detection
NamOh Kang (Chung-Ang University, South Korea), Alexander Gelbukh (National Polytechnic Institute, Mexico), SangYong Han (Chung-Ang University, South Korea)
Nowadays, most documents are produced in digital format, in which they can easily be accessed and copied. Document copy detection is a very important tool for protecting authors' copyright. We present PPChecker, a document copy detection system based on plagiarism pattern checking. PPChecker calculates the amount of data copied from the original document to the query document, based on linguistically motivated plagiarism patterns. Experiments performed on the CISI document collection show that PPChecker produces better decision information for document copy detection than existing systems.
#96: Composite Decision by Bayesian Inference in Distant-Talking Speech Recognition
Mikyong Ji, Sungtak Kim, Hoirin Kim (Information and Communications University, Daejeon, Korea)
This paper describes an integrated system that produces a composite recognition output for distant-talking speech when recognition results from multiple microphone inputs are available. In many cases, the composite recognition result has a lower error rate than any individual output. In this work, the composite recognition result is obtained by applying Bayesian inference. The log-likelihood score is assumed to follow a Gaussian distribution, at least approximately. First, the distribution of the likelihood score is estimated on the development set. Then, the confidence interval for the likelihood score is used to remove unreliable microphone channels. Finally, the area under the distribution between the likelihood score of a hypothesis and that of the (N+1)$^{st}$ hypothesis is obtained for every channel, and these areas are integrated across all channels by Bayesian inference. The proposed system shows a considerable performance improvement compared with both the conventional method of summing likelihoods and each individual channel's recognition result.
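The three steps can be sketched as follows, under several stated assumptions: each channel supplies an N-best dictionary of log-likelihoods, its (N+1)-th score, and the development-set mean and standard deviation of the score; a channel is discarded when its best score falls outside mean ± z·std; and per-channel areas are combined as a naive-Bayes product (sum of logs), with a small floor for hypotheses missing from a channel. The combination rule and all names (`composite_decision`, `z`, `eps`) are assumptions, not the authors' exact formulation.

```python
import math

def normal_cdf(x, mu, sigma):
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))

def channel_evidence(nbest, next_score, mu, sigma):
    """Area under N(mu, sigma) between each hypothesis score and the
    (N+1)-th hypothesis score of the same channel."""
    floor = normal_cdf(next_score, mu, sigma)
    return {hyp: normal_cdf(score, mu, sigma) - floor for hyp, score in nbest.items()}

def composite_decision(channels, z=2.0, eps=1e-6):
    """channels: list of (nbest dict, (N+1)-th score, dev-set mean, dev-set std)."""
    reliable = []
    for nbest, next_score, mu, sigma in channels:
        # Drop the channel if its best score lies outside mu +/- z*sigma.
        if abs(max(nbest.values()) - mu) <= z * sigma:
            reliable.append(channel_evidence(nbest, next_score, mu, sigma))
    if not reliable:
        return None
    hypotheses = set().union(*reliable)
    # Naive-Bayes style combination: sum of log-evidence across channels,
    # with a small floor for hypotheses missing from a channel's N-best list.
    def log_score(hyp):
        return sum(math.log(max(ev.get(hyp, 0.0), eps)) for ev in reliable)
    return max(hypotheses, key=log_score)

ch1 = ({"yes": -100.0, "no": -110.0}, -120.0, -105.0, 10.0)
ch2 = ({"yes": -108.0, "no": -104.0}, -115.0, -105.0, 10.0)
ch3 = ({"no": -300.0}, -310.0, -105.0, 10.0)  # far outside the dev-set range
print(composite_decision([ch1, ch2, ch3]))  # yes
```

Here channel 3 is discarded by the confidence-interval test, and the strong evidence for "yes" in channel 1 outweighs the weaker preference for "no" in channel 2.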
#99: Diphones vs. Triphones in Czech Unit Selection TTS
Daniel Tihelka, Jindrich Matousek (University of West Bohemia in Pilsen, Czech Republic)
When we started working with the unit selection technique in the ARTIC TTS system, we had to choose the unit type to be used within the system. Although the basic version of our TTS system is based on triphones, we decided to use diphones in unit selection, mainly because of our concerns about the susceptibility of the unit selection technique to segmentation inaccuracies and our limited experience with the overall system behaviour. However, we also planned to examine the possibilities of using triphones. As the first version of our unit selection system is currently being built, this paper examines whether the use of diphones brings a significant advantage over the use of triphones, and whether there is a clear reason why one type of unit behaves better than the other.
#100: Processing of Requests in Estonian Institutional Dialogues: Corpus Analysis
Mare Koit, Maret Valdisoo, Olga Gerassimenko (University of Tartu, Estonia), Tiit Hennoste (University of Helsinki, Finland and University of Tartu, Estonia), Riina Kasterpalu, Andriela Rääbis, Krista Strandson (University of Tartu, Estonia)
The paper analyses how an information operator processes customers' requests. The study is based on the Estonian dialogue corpus. Our further aim is to develop a dialogue system (DS) that interacts with a user in Estonian and recognises, interprets and grants the user's requests automatically. There are two main classes of computational models for interpreting dialogue acts: cue-based and inferential. In this paper, we try to combine these two approaches. The corpus analysis demonstrates that a number of linguistic cues can be found which a DS can use to recognise requests in Estonian. The DS will use linguistic cues to recognise a dialogue act type. After that, a frame of the act will be activated and filled in to interpret (understand) the act and to generate a responding act. A simple regular grammar is used for dialogue management.
#102: Fast Speaker Adaptation Using Multi-Stream Based Eigenvoice...
Hwa Jeon Song, Hyung Soon Kim (Pusan National University, Korea)
In this paper, a multi-stream based eigenvoice method is proposed to overcome the weak points of conventional eigenvoice and dimensional eigenvoice methods in fast speaker adaptation. In the proposed method, the streams are constructed automatically by statistical clustering analysis based on the correlation between dimensions. To obtain a reliable distance matrix from the covariance matrix, in order to divide the full set of dimensions into the optimal number of streams, MAP adaptation is applied to the covariance matrix of the training data and the sample covariance of the adaptation data. In vocabulary-independent word recognition experiments with several car noise levels and supervised adaptation, we obtained relative improvements of 29% and 31% over the conventional eigenvoice method with 5 and 50 adaptation words, respectively, at 20 dB SNR. We also obtained relative improvements of 26% and 53% with 5 and 50 adaptation words, respectively, at 10 dB SNR.
#104: Extensive Study on Automatic Verb Sense Disambiguation in Czech
Jiri Semecky, Petr Podvesky (UFAL MFF UK, Czech Republic)
In this paper we compare automatic methods for the disambiguation of verb senses; in particular, we investigate a Naive Bayes classifier, decision trees, and a rule-based method. Different types of features are proposed, including morphological, syntax-based, idiomatic, animacy, and WordNet-based features. We evaluate the methods together with the individual feature types on two essentially different Czech corpora, VALEVAL and the Prague Dependency Treebank. The best-performing methods and features are discussed.
#105: On the Behaviors of SVM ... in Binary Text Classification Tasks
Fabrice Colas (Leiden University, The Netherlands), Pavel Brazdil (University of Porto, Portugal)
Document classification has already been widely studied. Some studies have compared feature selection techniques or feature space transformations, whereas others have compared the performance of different algorithms. Recently, following the rising interest in the Support Vector Machine (SVM), various studies have shown that SVM outperforms other classification algorithms. So should we simply ignore other classification algorithms and always opt for SVM? We decided to investigate this issue and compared SVM to $k$NN and naive Bayes on binary classification tasks. An important point is to compare optimized versions of these algorithms, which is what we have done. Our results show that all the classifiers achieved comparable performance on most problems. One surprising result is that SVM was not a clear winner, despite quite good overall performance. If suitable preprocessing is used with $k$NN, this algorithm continues to achieve very good results and scales up well with the number of documents, which is not the case for SVM. Naive Bayes also achieved good performance.
#106: Are Morphosyntactic Taggers Suitable to Improve Automatic Transcription?
Stéphane Huet, Guillaume Gravier, Pascale Sébillot (IRISA, France)
The aim of our paper is to study the usefulness of part-of-speech (POS) tagging for improving speech recognition. We first evaluate the proportion of misrecognized words that can be corrected using POS information; the analysis of a short extract of French radio broadcast news shows that an absolute decrease in the word error rate of 1.1% can be expected. We also demonstrate quantitatively that traditional POS taggers are reliable when applied to spoken corpora, including automatic transcriptions. This new result enables us to use POS tag knowledge effectively to improve the quality of transcriptions in a post-processing stage, especially by correcting agreement errors.
#107: Synthesis of Czech Sentences from Tectogrammatical Trees
Jan Ptacek, Zdenek Zabokrtsky (Charles University, Prague, Czech Republic)
In this paper we deal with a new rule-based approach to the Natural Language Generation problem. The presented system synthesizes Czech sentences from Czech tectogrammatical trees supplied by the Prague Dependency Treebank 2.0 (PDT 2.0). Linguistically relevant phenomena including valency, diathesis, condensation, agreement, word order, punctuation and vocalization have been studied and implemented in Perl using software tools shipped with PDT 2.0. The BLEU metric is used to evaluate the generated sentences.
#108: A System for Information Retrieval from Large Records of Czech Spoken Data
Jan Nouza, Jindrich Zdánský, Petr Cerva, Jan Kolorenc (TU Liberec, Czech Republic)
In this paper we describe a complex multi-level system for automatic search in large records of Czech spoken data. It includes modules for audio signal segmentation, speaker identification and adaptation, speech recognition and full-text search. The search can focus on both keywords and key speakers. The transcription accuracy is about 79% (for broadcast programs) and the search accuracy about 90%. Thanks to its distributed platform, the system can operate in almost real time.
#109: Effective Architecture of the Polish Tagger
Maciej Piasecki, Grzegorz Godlewski (Wroclaw University of Technology, Poland)
The large tagset of the IPI PAN Corpus of Polish and the limited size of the learning corpus make the construction of a tagger especially demanding. The goal of this work is to decompose the overall process of tagging Polish into subproblems of partial disambiguation. Moreover, a tagger architecture facilitating this decomposition is proposed. The proposed architecture enables easy integration of hand-written tagging rules with the rest of the tagger and is open to different types of classifiers. A complete tagger for Polish called TaKIPI is also presented. Its configuration, the achieved results (92.55% accuracy for all tokens and 84.75% for ambiguous tokens in a ten-fold test), and the variants of the architecture considered are also discussed.
#110: Handmade and Automatic Rules for Polish Tagger
Maciej Piasecki (Wroclaw University of Technology, Poland)
Stochastic approaches to tagging Polish have produced results that are far from satisfactory. However, the successful combination of hand-written rules and a stochastic approach for Czech, as well as some initial experiments in acquiring tagging rules for Polish, revealed the potential of a rule-based approach. The goals are to define a language of tagging constraints, to construct a set of reduction rules for Polish, and to apply Machine Learning to the extraction of tagging rules. A language of functional tagging constraints called JOSKIPI is proposed. An extension to the C4.5 algorithm based on introducing complex JOSKIPI operators into decision trees is presented. The construction of a preliminary set of hand-written tagging rules for Polish is discussed. Finally, the results of comparing different versions of the tagger are given.
#111: Visualization of Prosodic Knowledge Using Corpus Driven MEMOInt Intonation Modelling
David Escudero Mancebo, Valentin Cardeñoso-Payo (University of Valladolid, Spain)
In this work we show how MEMOInt, our corpus-driven intonation modelling methodology, can help visualize graphically the complex relationships between the different prosodic features that shape the intonational aspects of natural speech. MEMOInt has already been used successfully to predict synthetic F0 contours despite the usual data scarcity problems. Here, we report on the possibilities of using the information gathered in the modelling phase to provide a graphical view of the relevance of the various prosodic features that affect typical F0 movements. The set of classes that group the intonation patterns found in the corpus can be structured as a tree in which the relation between the classes and the prosodic features of the input text is represented hierarchically. This visual output proves very useful for carrying out comparative linguistic studies of prosodic phenomena and for checking the correspondence between prior prosodic knowledge about a language and the actual utterances found in a given corpus.
#112: Segmentation of Complex Sentences
Vladislav Kubon, Marketa Lopatkova, Martin Platek (Charles University, Prague, Czech Republic), Patrice Pognan (CERTAL INALCO, Paris, France)
The paper describes a method of dividing complex sentences into segments, easily detectable and linguistically motivated units that may subsequently be combined into clauses and thus provide the structure of a complex sentence with regard to the mutual relationships of its individual clauses. The method has been developed for Czech, as a representative of languages with a relatively high degree of word-order freedom. The paper introduces the important terms and describes a segmentation chart, the data structure used to describe the mutual relationships between individual segments and separators. It also contains a simple set of rules applied to the segmentation of a small set of Czech sentences. The segmentation results are evaluated against a small hand-annotated corpus of Czech complex sentences.
#114: Featuring of Sex-Dependent Nouns in Databases Oriented to European Languages
Igor A. Bolshakov (National Polytechnic Institute, Mexico City, Mexico), Sofia N. Galicia-Haro (National Autonomous University of Mexico (UNAM), Mexico City, Mexico)
It is argued that human-denoting nouns in European languages forming pairs like English steward vs. stewardess, Spanish jefe vs. jefa `chief', German Student vs. Studentin `student', or Russian moskvic vs. moskvicka `Muscovite' may be featured conjointly in factographic databases as Sex-Dependent Nouns (SDNs), a special part of speech. Each SDN has two forms, possibly coinciding, one of which is selected when necessary according to the sex of the denoted person. The SDN notion ensures a kind of universality for translation between various languages and is especially convenient for languages in which the gender of nouns is implied by sex. We base our reasoning on Spanish, French, Russian, German, and English examples.
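The SDN notion maps naturally onto a small record with two forms. The following is a minimal sketch using the abstract's own examples; the data layout (concept, then language code), the `translate` helper, and the fallback to the male form when the sex is unknown are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass(frozen=True)
class SDN:
    """A sex-dependent noun: one lexeme with a male and a female form."""
    male: str
    female: str

    def form(self, sex: Optional[str] = None) -> str:
        # Assumption: fall back to the male form when the sex is unknown.
        return self.female if sex == "female" else self.male

# Hypothetical multilingual entries keyed by concept, then language code.
LEXICON: Dict[str, Dict[str, SDN]] = {
    "chief": {"es": SDN("jefe", "jefa"), "en": SDN("chief", "chief")},
    "student": {"de": SDN("Student", "Studentin"), "en": SDN("student", "student")},
    "flight attendant": {"en": SDN("steward", "stewardess")},
}

def translate(concept: str, language: str, sex: Optional[str] = None) -> str:
    """Pick the form of a concept's SDN in the target language by sex."""
    return LEXICON[concept][language].form(sex)

print(translate("student", "de", "female"))  # Studentin
print(translate("student", "en", "female"))  # student (forms coincide)
```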
#115: Simple Method of Determining the Voice Similarity
Konrad Lukaszewicz, Matti Karjalainen (Helsinki University of Technology, Finland)
This paper presents a simple method for determining voice similarity by analyzing a set of very short sounds. A large number of pitch-length sounds were extracted from natural voice signals, from different realizations of the open vowels 'a' and 'o'. Voice similarity was defined as the sum of the elementary similarities of pairs of short sounds. The method is aimed at microphonemic speech synthesis based on waveform concatenation, and it could help to limit the time needed for database collection. This simple speech synthesis method has a low computational load, so it can be applied in small portable devices and used for the rehabilitation of speech-disabled people.
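The definition "sum of elementary similarities of short sound pairs" leaves both the elementary measure and the pairing open. As a hedged sketch, assume the elementary similarity is the normalized correlation at zero lag between two pitch-length sounds (truncated to the shorter one), and that each sound of voice A is paired with its best match in voice B. Both choices, and the helper names, are hypothetical.

```python
import math

def elementary_similarity(x, y):
    """Normalized correlation at zero lag between two pitch-length sounds,
    truncated to the shorter length (hypothetical choice of measure)."""
    n = min(len(x), len(y))
    x, y = x[:n], y[:n]
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x) * sum(b * b for b in y))
    return dot / norm if norm else 0.0

def voice_similarity(sounds_a, sounds_b):
    """Sum, over the sounds of voice A, of the best elementary similarity
    to any sound of voice B (the pairing rule is an assumption)."""
    return sum(max(elementary_similarity(a, b) for b in sounds_b) for a in sounds_a)

# Toy "pitch periods": one cycle of a sine wave and its polarity-inverted copy.
period = [math.sin(2 * math.pi * k / 40) for k in range(40)]
inverted = [-s for s in period]
print(round(voice_similarity([period, period], [period, inverted]), 6))  # 2.0
```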
#119: The Lexico-Semantic Annotation of PDT
Eduard Bejcek, Petra Moellerova, Pavel Stranak (Charles University, Prague, Czech Republic)
This paper presents our experience with the lexico-semantic annotation of the Prague Dependency Treebank (PDT). We used the Czech WordNet (CWN) as an annotation lexicon (a repository of lexical meanings) and annotated each word included in the CWN. Based on an error analysis, we performed experiments in which we modified the annotation lexicon (CWN) and then re-annotated the occurrences of selected lemmas. We present the results of the annotations and the improvements achieved by our corrections.
#124: Automated Mark Up of Affective Information in English Texts
Virginia Francisco, Pablo Gervás (Universidad Complutense de Madrid, Spain)
This paper presents an approach to the automated mark-up of texts with emotional labels. The approach considers in parallel two possible representations of emotions: emotional categories and emotional dimensions. For each representation, a corpus of example texts previously annotated by human evaluators is mined for an initial assignment of emotional features to words. This results in a List of Emotional Words (LEW), which becomes a useful resource for later automated mark-up. The proposed algorithm for automated mark-up of text closely mirrors the steps taken during feature extraction, employing for the actual assignment of emotional features a combination of the LEW resource, the ANEW word list, and WordNet for knowledge-based expansion of words occurring in neither. The algorithm for automated mark-up is tested, and the results are discussed with respect to three main issues: the relative adequacy of each of the representations used, the correctness and coverage of the proposed algorithm, and additional techniques and solutions that may be employed to improve the results.
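The lookup cascade for the dimensional representation (LEW first, then ANEW, then synonym expansion for words found in neither) can be sketched as below. The sentence score is taken here as the mean of the word values over three dimensions (valence, arousal, dominance); the averaging rule is an assumption, and all lexicon values and the `SYNONYMS` table are hypothetical toy data standing in for the real LEW, ANEW and WordNet resources.

```python
from statistics import mean

# Hypothetical (valence, arousal, dominance) values; not real LEW/ANEW entries.
LEW = {"happy": (8.0, 6.5, 7.0), "dark": (3.0, 4.5, 4.0)}
ANEW_LIKE = {"storm": (3.5, 6.5, 3.5)}
SYNONYMS = {"joyful": ["happy"]}  # toy stand-in for WordNet expansion

def lookup(word):
    """Try the LEW first, then the ANEW-style list."""
    for lexicon in (LEW, ANEW_LIKE):
        if word in lexicon:
            return lexicon[word]
    return None

def word_dimensions(word):
    """Look a word up directly, or via its synonyms if it is in neither list."""
    value = lookup(word)
    if value is None:
        for synonym in SYNONYMS.get(word, []):
            value = lookup(synonym)
            if value is not None:
                break
    return value

def mark_up(sentence):
    """Average the dimensions of all words that received a value."""
    values = [v for v in (word_dimensions(w) for w in sentence.lower().split())
              if v is not None]
    if not values:
        return None
    return tuple(round(mean(dim), 2) for dim in zip(*values))

print(mark_up("A joyful storm"))  # (5.75, 6.5, 5.25)
```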
#126: The Effect of Semantic Knowledge Expansion to Textual Entailment Recognition
Zornitsa Kozareva, Sonia Vázquez, Andrés Montoyo (Alicante University, Spain)
This paper studies the effect of semantic knowledge expansion applied to the Textual Entailment Recognition task. In contrast to existing approaches, we introduce a new set of similarity measures that capture hidden semantic relations among different syntactic categories in a sentence. Our study also focuses on the synonym, antonym and verb entailment expansion of the initially generated word pairs. The main objective of the expansion is to find, confirm and enlarge the available knowledge. In addition, we applied Latent Semantic Analysis and the cosine measure to tune and improve the obtained relations. We conducted an exhaustive experimental study to evaluate the impact of the proposed similarity relations on Textual Entailment Recognition.
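How the cosine measure can score a text-hypothesis pair is sketched below. It assumes word vectors are already available (e.g. rows of an LSA-reduced term matrix), pairs each hypothesis word with its most similar text word, and averages those best scores. The two-dimensional vectors and the `entailment_score` aggregation are hypothetical illustrations, not the paper's similarity measures.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def entailment_score(text_words, hyp_words, vectors):
    """Average, over hypothesis words with a vector, of the best cosine
    similarity to any text word with a vector."""
    scores = []
    for h in hyp_words:
        if h not in vectors:
            continue
        best = max((cosine(vectors[h], vectors[t]) for t in text_words if t in vectors),
                   default=0.0)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0

# Hypothetical word vectors standing in for LSA output.
vectors = {"buy": [1.0, 0.2], "purchase": [0.9, 0.3], "sell": [-0.2, 1.0]}
print(round(entailment_score(["purchase"], ["buy"], vectors), 3))  # 0.992
```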
#130: Post-Processing of Automatic Segmentation of Speech using Dynamic Programming
Marcin Szymanski, Stefan Grocholewski (Poznan University of Technology, Poland)
Building unit-selection speech synthesisers requires precise annotation of large speech corpora. Manual segmentation of speech is a very laborious task, hence the need for automatic segmentation algorithms. Since the common HMM-based method was observed to be prone to systematic errors, boundary refinement approaches such as boundary-specific correction were introduced. Last year, a dynamic-programming fine-tuning approach was proposed that combined two sources of information: the boundary error distribution and boundary MFCC statistical models. In this paper we verify the usefulness of incorporating additional information: boundary energy-dynamics models and signal periodicity information.
#132: Two-Dimensional Visual Language Grammar
Siska Fitrianie, Leon J.M. Rothkrantz (Delft University of Technology, The Netherlands)
Visual language refers to the idea that communication occurs through visual symbols, as opposed to verbal symbols or words. In contrast to sentence construction in spoken language, which has a linear ordering of words, a visual language has a simultaneous structure with a parallel temporal and spatial configuration. Inspired by Deikto \cite{cra05}, we propose a two-dimensional string or sentence construction of visual expressions, i.e. spatial arrangements of symbols that represent concepts. A proof-of-concept communication interface has been developed that enables users to create visual messages representing concepts or ideas in their mind. Using an ontology, the interface constructs both the syntax and the semantics of a 2D visual string with a Lexicalized Tree Adjoining Grammar (LTAG) and converts it into natural language text. This approach elegantly captures the interaction between pragmatic and syntactic descriptions in a 2D sentence, and the inferential interactions between the multiple possible meanings generated by the sentence. From our user test results, we conclude that the visual language interface could serve as a communication mediator.
#134: Semantic Representation of Events: Building a Semantic Primes Component
Milena Slavcheva (Bulgarian Academy of Sciences, Sofia, Bulgaria)
This paper describes a system of semantic primes necessary for the large-scale semantic representation of event types, encoded as verbal predicates. The system of semantic primes is compiled by mapping modeling elements of the Natural Semantic Metalanguage (NSM), the Semantic Minimum - Dictionary of Bulgarian (SMD), and Role and Reference Grammar (RRG). The resulting system of semantic primes is a user-defined extension of the metalanguage adopted in the Unified Eventity Representation (UER), a graphical formalism that introduces object-oriented design into linguistic semantics.
#135: Transformation-Based Tectogrammatical Analysis of Czech
Václav Klimes (Charles University, Prague, Czech Republic)
There are several tools that support manual annotation of data at the Tectogrammatical Layer as defined in the Prague Dependency Treebank. Using transformation-based learning, we have developed a tool that outperforms the combination of existing tools for pre-annotation of the tectogrammatical structure by 29% (measured as relative error reduction) and for the deep functor (i.e., the semantic function) by 47%. Moreover, using a machine-learning technique makes our tool almost independent of the language being processed. This paper gives details of the algorithm and the tool.
#136: Segmental Duration Modelling in Turkish
Ozlem Ozturk (Dokuz Eylul University, Izmir, Turkey), Tolga Ciloglu (Middle East Technical University, Ankara, Turkey)
Naturalness of synthetic speech depends heavily on appropriate modelling of prosodic aspects. Three prosody components are usually modelled: segmental duration, pitch contour and intensity. In this study, we present our work on modelling segmental duration in Turkish using machine-learning algorithms, especially Classification and Regression Trees. The models predict phone durations from attributes extracted from a speech corpus, such as the identities of the current, preceding and following phones, stress, part-of-speech, word length in syllables, and the position of the word in the utterance. The resulting models predict segment durations better than mean-duration approximations (a correlation coefficient of $\sim$0.77 and a root-mean-squared error of 20.4 ms). To improve prediction performance further, the attributes used to build the segmental duration models are optimized by means of the Sequential Forward Selection method. As a result of Sequential Forward Selection, phone identity, neighboring phone identities, lexical stress, syllable type, part-of-speech, phrase break information, and location of the word in the phrase constitute the optimum attribute set for phoneme duration modelling.
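Sequential Forward Selection is a greedy wrapper method: start from an empty attribute set, repeatedly add the attribute that most improves an evaluation criterion, and stop when no addition helps. The sketch below is generic. The criterion `toy_score` and its weights are hypothetical stand-ins for, e.g., the cross-validated correlation of a regression tree trained on the chosen attributes; they are not taken from the paper.

```python
from typing import Callable, FrozenSet, List, Sequence


def sequential_forward_selection(
    attributes: Sequence[str],
    score: Callable[[FrozenSet[str]], float],
) -> List[str]:
    """Greedy SFS: start empty, repeatedly add the attribute whose addition
    gives the highest score, and stop when no addition improves it."""
    selected: List[str] = []
    best = score(frozenset())
    remaining = list(attributes)
    while remaining:
        # Score every one-attribute extension of the current subset.
        trials = [(score(frozenset(selected + [a])), a) for a in remaining]
        trial_best, trial_attr = max(trials)
        if trial_best <= best:
            break  # no remaining attribute improves the criterion
        best = trial_best
        selected.append(trial_attr)
        remaining.remove(trial_attr)
    return selected


# Hypothetical criterion: each attribute has a fixed gain, and every
# selected attribute costs 0.1 (a stand-in for overfitting/complexity).
GAIN = {"phone_id": 0.5, "stress": 0.3, "pos": 0.2, "noise": 0.05}


def toy_score(subset: FrozenSet[str]) -> float:
    return sum(GAIN[a] for a in subset) - 0.1 * len(subset)


print(sequential_forward_selection(list(GAIN), toy_score))
```

With this toy criterion the weak "noise" attribute is never added, because its gain does not outweigh its cost; the greedy search is cheap but, unlike exhaustive search, not guaranteed to find the globally best subset.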
#138: Feature Subset Selection Based on Evolutionary Algorithms...
Aitor Álvarez, Idoia Cearreta, Juan Miguel López, Andoni Arruti, Elena Lazkano, Basilio Sierra, Nestor Garay (University of the Basque Country, Spain)
The study of emotions in human-computer interaction is a growing research area. In automatic emotion recognition, work is under way to achieve good results, particularly in speech and facial gesture recognition. In this paper we present a study analyzing the validity of different Machine Learning techniques for automatic speech emotion recognition. Using a bilingual affective database, different speech parameters were calculated for each audio recording. Then, several Machine Learning techniques were applied to evaluate their usefulness in speech emotion recognition. In particular, techniques based on evolutionary algorithms, namely Estimation of Distribution Algorithms (EDA), were used to select speech feature subsets that optimize the automatic emotion recognition success rate. The experimental results show a notable increase in this success rate.
#139: Word Sequences for Text Summarization
Esau Villatoro-Tello, Luis Villaseñor-Pineda, Manuel Montes-y-Gomez (National Institute of Astrophysics, Optics and Electronics, Mexico)
Traditional approaches to extractive summarization score or classify sentences based on features such as position in the text, word frequency and cue phrases. These features tend to produce satisfactory summaries, but have the drawback of being domain-dependent. In this paper, we propose to tackle this problem by representing sentences as word sequences ($n$-grams), a representation widely used in text categorization. The experiments show that this simple representation not only reduces domain and language dependency but also improves summarization performance.
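As an illustration of the representation only (not the authors' feature set or classifier), the following sketch turns a sentence into a bag of word $n$-grams for $n = 1, \dots, N$. Such sparse count vectors could then be fed to any sentence classifier. Lowercasing and whitespace tokenization are simplifying assumptions.

```python
from collections import Counter
from typing import List, Tuple


def word_ngrams(sentence: str, n: int) -> List[Tuple[str, ...]]:
    """All contiguous word sequences of length n (lowercased, whitespace-split)."""
    tokens = sentence.lower().split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def sentence_features(sentence: str, max_n: int = 2) -> Counter:
    """Bag of word n-grams for n = 1..max_n, usable as sparse classifier input."""
    features: Counter = Counter()
    for n in range(1, max_n + 1):
        features.update(word_ngrams(sentence, n))
    return features


feats = sentence_features("The cat sat on the mat")
print(feats[("the",)], feats[("the", "cat")])
```

Because the features are raw word sequences rather than position or cue-phrase heuristics, nothing in the representation is tied to a particular domain or language.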
#144: Silence/Speech Detection Method Based on Set of Decision Graphs
Jan Trmal, Jan Zelinka, Jan Vanek, Ludek Muller (University of West Bohemia in Pilsen, Czech Republic)
In this paper we present a complex supervised learning method based on binary decision graphs. The method is employed in the construction of a silence/speech detector. The performance of the resulting silence/speech detector is compared with that of common silence/speech detectors used in telecommunications and with a detector based on an HMM and a bigram silence/speech language model. Each non-leaf node of a decision graph is assigned a question and a sub-classifier that answers this question. We test three kinds of sub-classifiers: a linear classifier, a classifier based on a separating quadratic hyper-plane (SQHP), and a Support Vector Machine (SVM) based classifier. Moreover, besides using a single decision graph, we investigate the application of a set of binary decision graphs.
#149: Task Switching in Audio Based Systems
Melanie Hartmann, Dirk Schnelle (Darmstadt University of Technology, Germany)
The worker on the move has an ever-increasing need to access information, such as instructions on how to proceed with a task. Using audio to convey that information and for interaction has many advantages over traditional hands-and-eyes devices, especially if the user needs his hands to perform a task. In this paper, we focus on a task model stored in a workflow engine. The execution of a task is often interrupted by external events or by the user, who wants to suspend a task or switch to another one. To resume the task, the user has to be aware of his current position in the workflow. Due to the transient nature of speech, in audio-only systems he cannot review what he has done before. We present a novel approach, based on psychological theories, to help the user get back into the context of an interrupted task. The usability of this recovery concept was successfully tested in a user study.
#154: Multilingual News Document Clustering:...
Soto Montalvo (URJC, Spain), Raquel Martínez (UNED, Spain), Arantza Casillas (UPV-EHU, Spain), Víctor Fresno (URJC, Spain)
This paper presents an approach to Multilingual News Document Clustering in comparable corpora. We have implemented two heuristic algorithms that follow this approach. As their only evidence for clustering, they use the identification of cognate named entities between both sides of the comparable corpora. In addition, the algorithms do not need to be given the right number of clusters. The applicability of the approach depends only on the possibility of identifying cognate named entities between the languages involved in the corpus. The main difference between the two algorithms is whether or not a monolingual clustering phase is applied first. We have tested both algorithms on a comparable corpus of news written in English and Spanish. The two algorithms perform slightly differently; the one that does not apply the monolingual phase achieves better results. In any case, the results obtained with both algorithms are encouraging and show that cognate named entities can provide enough knowledge to deal with multilingual clustering of news documents.
#157: Phonetic Question Generation Using Misrecognition
Supphanat Kanokphara, Julie Carson-Berndsen (University College Dublin, Ireland)
Most automatic speech recognition systems are currently based on tied-state triphones. These tied states are usually determined by a decision tree. Decision trees can automatically cluster triphone states into many classes according to the available data, allowing each class to be trained efficiently. To achieve higher accuracy, this clustering is constrained by manually generated phonetic questions. Moreover, the tree generated from these phonetic questions can be used to synthesize unseen triphones. The quality of decision trees therefore depends on the quality of the phonetic questions. Unfortunately, manual creation of phonetic questions requires a lot of time and resources. To overcome this problem, this paper proposes an alternative method that generates these phonetic questions automatically from misrecognitions. The questions are tested on the standard TIMIT phone recognition task.
#160: Another Look at the Data Sparsity Problem
Ben Allison, David Guthrie, Louise Guthrie (University of Sheffield, UK)
Performance on a statistical language processing task relies on accurate information being found in a corpus. However, it is known (and this paper confirms) that many perfectly valid word sequences do not appear in training corpora. The percentage of $n$-grams in a test document that are seen in a training corpus is defined as $n$-gram coverage, and work in the speech processing community \cite{ros} has shown a correlation between $n$-gram coverage and word error rate (WER) on a speech recognition task. Other work (e.g. \cite{ban}) has shown that increasing training data consistently improves performance on a language processing task. This paper extends that work by examining $n$-gram coverage for far larger corpora, considering a range of document types that vary in their similarity to the training corpora, and experimenting with a broader range of pruning techniques. The paper shows that large portions of language will not be represented even within very large corpora. It confirms that more data is always better, but how much better depends on a range of factors: the source of the additional data, the source of the test documents, and how the language model is pruned to account for sampling errors and keep computation tractable.
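The coverage measure defined above can be computed as in the following sketch. We assume whitespace-tokenized text and count $n$-gram tokens (not distinct types) in the test document; the abstract does not specify which, and no pruning is modeled.

```python
from typing import List, Tuple


def ngrams(tokens: List[str], n: int) -> List[Tuple[str, ...]]:
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_coverage(train_tokens: List[str], test_tokens: List[str], n: int) -> float:
    """Percentage of n-gram tokens in the test document also seen in training."""
    seen = set(ngrams(train_tokens, n))
    test_grams = ngrams(test_tokens, n)
    if not test_grams:
        return 0.0  # test document shorter than n tokens
    hits = sum(1 for g in test_grams if g in seen)
    return 100.0 * hits / len(test_grams)


train = "the cat sat on the mat".split()
test_doc = "the cat sat on a mat".split()
for n in (1, 2, 3):
    print(n, round(ngram_coverage(train, test_doc, n), 1))
```

Even on this toy pair, a single unseen word ("a") removes one unigram but two bigrams and two trigrams from the covered set, so coverage falls as $n$ grows; this is the sparsity effect the paper measures at scale.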
#161: Corpus-Based Unit Selection TTS for Hungarian
Márk Fék, Péter Pesti, Géza Németh, Csaba Zainkó, Gábor Olaszy (Budapest University of Technology and Economics, Hungary)
This paper gives an overview of the design and development of an experimental restricted-domain, corpus-based unit selection text-to-speech (TTS) system for Hungarian. The experimental system generates weather forecasts in Hungarian. A total of $5260$ sentences were recorded, creating a speech corpus containing $11$ hours of continuous speech. A Hungarian speech recognizer was applied to label speech sound boundaries. Word boundaries were also marked automatically. The unit selection follows a top-down hierarchical scheme using words and speech sounds as units. A simple prosody model is used, based on the relative position of words within a prosodic phrase. The quality of the system was compared to that of two earlier Hungarian TTS systems. A subjective listening test was performed by $221$ listeners. The experimental system scored $3.92$ on a five-point mean opinion score (MOS) scale. The earlier unit concatenation TTS system scored $2.63$, the formant synthesizer scored $1.24$, and natural speech scored $4.86$.
#164: ASeMatch: A Semantic Matching Method
Sandra Roger (University of Alicante, Spain & University of Comahue, Argentina), Agustina Buccella (University of Comahue, Argentina), Alejandra Cechich (University of Comahue, Argentina), Manuel Sanz Palomar (University of Alicante, Spain)
Syntactic information about different sources usually does not provide enough knowledge to discover possible matchings among them. Instead, more suitable matchings can be found by using the semantics of these sources. Semantic matching thus involves finding similarities among overlapping sources by using semantic knowledge. In recent years, ontologies have emerged as a way to represent these semantics. Along these lines, we introduce ASeMatch, our method for semantic matching. By applying several NLP tools and resources in a novel way, and by using the semantic and syntactic information extracted from the ontologies, our method finds complex mappings such as $1{:}N$ and $N{:}1$ matchings.
#165: Dynamic Bayesian Networks for Language Modeling
Pascal Wiggers, Leon J. M. Rothkrantz (Delft University of Technology, The Netherlands)
Although $n$-gram models are still the de facto standard in language modeling for speech recognition, it has been shown that more sophisticated models achieve better accuracy by taking additional information into account, such as syntactic rules, semantic relations or domain knowledge. Unfortunately, most of the effort in developing such models goes into implementing handcrafted inference routines. What is lacking is a generic mechanism for introducing background knowledge into a language model. We propose the use of dynamic Bayesian networks for this purpose. Dynamic Bayesian networks can be seen as a generalization of the $n$-gram models and HMMs traditionally used in language modeling and speech recognition. Whereas those models use a single random variable to represent state, Bayesian networks can have any number of variables. As such, they are particularly well suited to constructing models that take additional information into account. This paper discusses language modeling with belief networks. Examples of belief network implementations of well-known language models are given, and a new model is presented that models dependencies between the content words in a sentence.
#168: Evaluating Language Models within a Predictive Framework: ...
Pierre Alain, Olivier Boeffard, Nelly Barbot (Université de Rennes 1, France)
Perplexity is a widely used criterion for comparing language models without any task assumptions. Its main drawback, however, is that perplexity presupposes probability distributions and hence cannot compare heterogeneous models. As an evaluation framework, we propose in this article to abandon perplexity and to extend Shannon's entropy idea, which is based on model prediction performance, using rank-based statistics. Our methodology can predict joint word sequences independently of the task or model assumptions. Experiments are carried out on English with different kinds of language models. We show that long-term prediction language models are not more effective than standard $n$-gram models. Ranking distributions follow exponential laws, as already observed when predicting letter sequences. These distributions show a second mode not observed with letters, and we propose an interpretation of this mode.
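To illustrate why rank-based statistics can compare heterogeneous models, the sketch below only asks a model for an ordering of candidate next words, never for probabilities, and records the rank of the word that actually occurred, in the spirit of Shannon's guessing experiments. The bigram-count model and the alphabetical tie-breaking are assumptions for the example, not the authors' setup.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigrams(tokens: List[str]) -> Dict[str, Counter]:
    """Bigram counts: counts[prev][next]; unseen contexts default to empty."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts


def rank_of_next(counts: Dict[str, Counter], vocab: List[str], prev: str, actual: str) -> int:
    """1-based rank of `actual` when the vocabulary is ordered by the model's
    preference after `prev` (descending bigram count, ties alphabetical).
    Raises ValueError if `actual` is not in `vocab`."""
    ordered = sorted(vocab, key=lambda w: (-counts[prev][w], w))
    return ordered.index(actual) + 1


def rank_distribution(counts: Dict[str, Counter], vocab: List[str], tokens: List[str]) -> Counter:
    """Histogram of ranks of the actually observed next words in `tokens`."""
    return Counter(rank_of_next(counts, vocab, p, a) for p, a in zip(tokens, tokens[1:]))


train = "a b a b a c".split()
model = train_bigrams(train)
vocab = sorted(set(train))
print(rank_distribution(model, vocab, "a b a c".split()))
```

Any model that can sort candidates (a grammar, a cache model, a neural scorer) could replace `rank_of_next`, which is what makes the resulting rank histograms comparable across model types.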
#170: Parsing with Oracle
Michal Zemlicka (Charles University, Prague, Czech Republic)
Combining two well-known techniques, pushdown automata and oracles, results in a new class of parsers (oracle pushdown automata) with many advantages. They make it easy to combine different parsing techniques, each handling different language aspects, into a single parser. Moreover, such a composition preserves the simplicity of design of the combined parts. This opens new ways of parsing for linguistic purposes.
#176: Comparing B-spline and Spline Models for F0 Modelling
Damien Lolive, Nelly Barbot, Olivier Boeffard (University of Rennes 1, France)
This article describes a new approach to estimating $F_0$ curves using B-spline and Spline models characterized by a knot sequence and associated control points. The free parameters of the model are the number of knots and their locations. Free-knot placement, an NP-hard problem, is done using global MLE (Maximum Likelihood Estimation) within a simulated-annealing strategy. Experiments are conducted in a speech processing context on a French corpus of 7000 syllables. We estimate the two competing models for increasing numbers of free parameters. We show that the B-spline model performs slightly better than the Spline model in terms of RMS error.
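For readers unfamiliar with the model class, the sketch below evaluates a B-spline curve from a knot sequence and control points using the standard Cox-de Boor recursion. It covers only evaluation with fixed knots; the free-knot placement by MLE within simulated annealing, and the Spline variant, are not shown. In an $F_0$ setting, `t` would be time and the curve value an $F_0$ estimate; the cubic default `k=3` and the knot vector below are assumptions.

```python
from typing import Sequence


def bspline_basis(i: int, k: int, t: float, knots: Sequence[float]) -> float:
    """Cox-de Boor recursion: i-th B-spline basis function of degree k at t.
    Terms with a zero-length knot span are taken as 0."""
    if k == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left = 0.0
    denom = knots[i + k] - knots[i]
    if denom > 0:
        left = (t - knots[i]) / denom * bspline_basis(i, k - 1, t, knots)
    right = 0.0
    denom = knots[i + k + 1] - knots[i + 1]
    if denom > 0:
        right = (knots[i + k + 1] - t) / denom * bspline_basis(i + 1, k - 1, t, knots)
    return left + right


def bspline_curve(t: float, control: Sequence[float], knots: Sequence[float], k: int = 3) -> float:
    """Evaluate sum_i control[i] * B_{i,k}(t) on the half-open domain."""
    if len(knots) != len(control) + k + 1:
        raise ValueError("need len(knots) == len(control) + k + 1")
    return sum(c * bspline_basis(i, k, t, knots) for i, c in enumerate(control))


knots = [0, 0, 0, 0, 1, 2, 2, 2, 2]  # clamped cubic knot vector (assumed)
print(bspline_curve(0.5, [1, 1, 1, 1, 1], knots))
print(bspline_curve(0.8, [0, 1 / 3, 1, 5 / 3, 2], knots))
```

With these clamped knots, five control points all equal to 1 give a constant curve of 1 on $[0, 2)$ (partition of unity), and control points at the Greville abscissae $[0, 1/3, 1, 5/3, 2]$ reproduce $f(t) = t$. Moving an interior knot changes where the curve can bend, which is why knot placement is the hard part of the estimation.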
#177: Paragraph-Level Alignment of an English-Spanish Parallel Corpus of Fiction Texts
Alexander Gelbukh, Grigori Sidorov, Jose Angel Vera-Felix (National Polytechnic Institute, Mexico City, Mexico)
Aligned parallel corpora are very important linguistic resources, useful in many text processing tasks such as machine translation, word sense disambiguation and dictionary compilation. Nevertheless, few linguistic resources of this type are available, especially for fiction texts, due to the difficulty of collecting the texts and the high cost of manual alignment. In this paper, we describe an automatically aligned English-Spanish parallel corpus of fiction texts and evaluate our alignment method, which uses linguistic data, namely existing bilingual dictionaries, to calculate word similarity. The method is based on a simple idea: if a meaningful word is present in the source text, then one of its dictionary translations should be present in the target text. Experimental results of paragraph-level alignment are described.
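The dictionary-based idea can be sketched as a similarity score for a candidate paragraph pair. Assumptions for illustration only: paragraphs are already tokenized and lemmatized, `bilingual` is a hypothetical dictionary mapping each source word to a set of target translations, and words absent from the dictionary stand in for the non-meaningful words that the method ignores.

```python
from typing import Dict, Iterable, Set


def translation_score(src_words: Iterable[str], tgt_words: Iterable[str],
                      bilingual: Dict[str, Set[str]]) -> float:
    """Share of dictionary-covered source words that have at least one of
    their dictionary translations present in the target paragraph."""
    target = set(tgt_words)
    covered = [w for w in src_words if w in bilingual]
    if not covered:
        return 0.0
    hits = sum(1 for w in covered if bilingual[w] & target)
    return hits / len(covered)


# Hypothetical mini-dictionary (lemmatized forms assumed).
DICT = {"house": {"casa"}, "red": {"rojo", "roja"}, "dog": {"perro"}}
print(translation_score("the red house".split(), "la casa roja".split(), DICT))
print(translation_score("the red dog".split(), "la casa roja".split(), DICT))
```

A full aligner would compare such scores across candidate paragraph pairings; how the pairing itself is searched is not specified in the abstract and is not modeled here.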
#178: Using Prosody for Automatic Sentence Segmentation of Multi-Party Meetings
Jachym Kolar (ICSI, Berkeley, USA and University of West Bohemia in Pilsen, Czech Republic), Elizabeth Shriberg (ICSI, Berkeley, USA and SRI International, Menlo Park, USA), Yang Liu (ICSI, Berkeley, USA and University of Texas at Dallas, USA)
We explore the use of prosodic features beyond pauses, including duration, pitch, and energy features, for automatic sentence segmentation of ICSI meeting data. We examine two different approaches to boundary classification: score-level combination of independent language and prosodic models using HMMs, and feature-level combination of models using a boosting-based method (BoosTexter). We report classification results for reference word transcripts as well as for transcripts from a state-of-the-art automatic speech recognizer (ASR). We also compare results using the lexical model plus a pause-only prosody model, versus results using additional prosodic features. Results show that (1) information from pauses is important, including pause duration both at the boundary and at the previous and following word boundaries; (2) adding duration, pitch, and energy features yields significant improvement over pause alone; (3) the integrated boosting-based model performs better than the HMM for ASR conditions; (4) training the boosting-based model on recognized words yields further improvement.
#180: Detecting Broad Phonemic Class Boundaries from Greek Speech in Noise Environments
Iosif Mporas, Panagiotis Zervas, Nikos Fakotakis (University of Patras, Greece)
In this work, we present the performance evaluation of an implicit approach for the automatic segmentation of continuous speech signals into broad phonemic classes as encountered in the Greek language. Our framework was evaluated with clean speech and with speech corrupted by additive white, pink, babble, car and machine-gun noise. The results are very promising: an accuracy of 76.1% was achieved for clean speech (for distances of less than 25 ms from the actual segmentation point), without over-segmentation of the speech signal. An average reduction of 4% in the total accuracy of our segmentation framework was observed for additive wideband noise.
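The evaluation criterion (a boundary counts as correct if a detected boundary lies less than 25 ms from it) can be sketched as below. The greedy one-to-one matching in reference order is an assumption; the abstract does not say how detections are paired with reference boundaries.

```python
from typing import Optional, Sequence


def boundary_accuracy(reference: Sequence[float], detected: Sequence[float],
                      tol: float = 0.025) -> float:
    """Percentage of reference boundaries (in seconds) matched by a detected
    boundary strictly closer than `tol`; each detection is used at most once."""
    if not reference:
        return 0.0
    used = set()
    hits = 0
    for ref in reference:
        # Pick the closest detection that is still unused and within tolerance.
        best_j: Optional[int] = None
        best_d = tol
        for j, det in enumerate(detected):
            d = abs(det - ref)
            if j not in used and d < best_d:
                best_j, best_d = j, d
        if best_j is not None:
            used.add(best_j)
            hits += 1
    return 100.0 * hits / len(reference)


print(boundary_accuracy([0.10, 0.30, 0.55], [0.11, 0.33, 0.60]))
```

Only the first boundary is within 25 ms here, so one third of the reference boundaries counts as correct. The one-to-one constraint keeps a single detection from being credited for two nearby reference boundaries.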
#184: Applying RST Relations to Semantic Search
Nguyen Thanh Tri, Akira Shimazu, Le Cuong Anh, Nguyen Minh Le (JAIST, Ishikawa, Japan)
This paper proposes a new way of extracting answers to some kinds of queries based on Rhetorical Structure Theory (RST). For each type of question, we assign one or more rhetorical relations that help extract the corresponding answers. To represent text segments and to index documents and queries, we use ternary expressions, which have been successfully applied in the well-known question answering system START. The cosine measure is used in the matching process. Experiments with the RST Discourse Treebank show that ternary-expression-based indexing gives better results than keyword-based indexing.
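A minimal sketch of cosine matching over ternary expressions follows. It assumes the triples have already been extracted (no parser is included) and are weighted by raw counts. It also shows why triples can be stricter than keywords: swapping subject and object leaves a bag of words unchanged but yields a different triple.

```python
from collections import Counter
from math import sqrt


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Hand-written ternary expressions <subject relation object> (assumed input).
doc = Counter([("bird", "eat", "worm"), ("worm", "live-in", "soil")])
query = Counter([("bird", "eat", "worm")])
flipped = Counter([("worm", "eat", "bird")])

print(round(cosine(doc, query), 3))  # the two share one ternary expression
print(cosine(doc, flipped))          # no shared triple -> 0.0
# Keyword bags cannot tell the query and its flipped version apart:
print(round(cosine(Counter("bird eat worm".split()), Counter("worm eat bird".split())), 3))
```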
#192: Indexing and Search Methods for Spoken Documents
Lukás Burget, Jan Cernocký, Michal Fapso, Martin Karafiát, Pavel Matejka, Petr Schwarz, Pavel Smrz, Igor Szoeke (Brno University of Technology, Czech Republic)
This paper presents two approaches to spoken document retrieval: search in LVCSR recognition lattices and search in phoneme lattices. For the former, an efficient method of indexing and searching multi-word queries is discussed. For phonetic search, the indexing of tri-phoneme sequences is investigated. Response times to single- and multi-word queries are evaluated on the ICSI meeting database.
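The tri-phoneme indexing idea can be sketched with an inverted index from phoneme trigrams to positions. This is a simplification: we index a single 1-best phoneme string per document rather than phoneme lattices and require exact matches, so no lattice scores or error tolerance are modeled. The document IDs and phoneme strings are toy data.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

Trigram = Tuple[str, ...]
Posting = Tuple[str, int]  # (document id, start position)


def build_index(docs: Dict[str, List[str]]) -> Dict[Trigram, Set[Posting]]:
    """Inverted index: phoneme trigram -> positions where it starts."""
    index: Dict[Trigram, Set[Posting]] = defaultdict(set)
    for doc_id, phones in docs.items():
        for i in range(len(phones) - 2):
            index[tuple(phones[i:i + 3])].add((doc_id, i))
    return index


def search(index: Dict[Trigram, Set[Posting]], query: List[str]) -> Set[Posting]:
    """Exact-match search: intersect the postings of the query's trigrams,
    shifted by each trigram's offset inside the query."""
    grams = [tuple(query[i:i + 3]) for i in range(len(query) - 2)]
    if not grams:
        return set()  # queries shorter than three phonemes are not handled
    hits = set(index.get(grams[0], set()))
    for offset, gram in enumerate(grams[1:], start=1):
        postings = index.get(gram, set())
        hits = {(d, s) for d, s in hits if (d, s + offset) in postings}
    return hits


docs = {"m1": "h e l o w o r l d".split(), "m2": "w o r d".split()}
idx = build_index(docs)
print(sorted(search(idx, "w o r".split())))
print(sorted(search(idx, "o r l d".split())))
```

Longer queries are answered by intersecting trigram postings, so lookup cost depends on the number of query trigrams and their posting-list sizes rather than on the total length of the archive.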
#193: Recognizing Connected Digit Strings Using Neural Networks
Łukasz Brocki, Danijel Korzinek, Krzysztof Marasek (Polish-Japanese Institute of Information Technology, Warsaw, Poland)
This paper discusses the use of feed-forward and recurrent Artificial Neural Networks (ANNs) in whole-word speech recognition. A Long Short-Term Memory (LSTM) network has been trained to perform speaker-independent recognition of any series of connected digits in Polish, using only the acoustic features extracted from speech. It is also shown how to effectively convert the analog network output into binary information on recognized words. The parameters of the conversion are fine-tuned using artificial evolution.
#197: Prosodic Cues for Automatic Phrase Boundary Detection in ASR
Klara Vicsi, György Szaszak (Budapest University of Technology and Economics, Hungary)
This article presents a cross-lingual study for Hungarian and Finnish on the segmentation of continuous speech at word and phrase level based on prosodic features. A word-level segmenter has been developed that can indicate word boundaries with acceptable accuracy for both languages. The ultimate aim is to increase the robustness of Automatic Speech Recognizers (ASR) by detecting word and phrase boundaries, and thus to significantly decrease the search space during decoding, which is very time-consuming for agglutinative languages like Hungarian and Finnish. These are, however, fixed-stress languages, so word beginnings can be marked with reliable accuracy by stress detection. An algorithm based on a data-driven (HMM) approach was developed and evaluated. The best results were obtained using time series of fundamental frequency and energy together. Syllable length was found to be much less effective and was therefore discarded. Using supra-segmental features, word boundaries can be marked with a high correctness ratio, provided that not all of them need to be found. The method we evaluated is easily adaptable to other fixed-stress languages; to investigate this, we adapted it to Finnish and obtained similar results.
#199: Words for emotions
M. Carmen Fernández Leal
A general linguistic approach to the analysis of emotions is concerned with three parameters: an event representation, the linguistic symbol for the object that causes the emotion, and the linguistic symbol for the emotion produced. The absence or presence of these parameters gives rise to the different sentence structures in which emotions are expressed. A dataset is supplied with words indicating emotions, or sentences in which they appear. Phonological information refers to the number of syllables, the syllable on which stress is placed, and the constituents of the intonation structure of the sentence. Syntactic information is based on grammatical category and function, as well as on the modal aspect of the sentence in which the word appears. The semantic analysis focuses on the semantic components and the semantic field connected with the words that express emotions. The pragmatic analysis deals with the type of speech acts related to the speaker's attitude with regard to a linguistic context and a context of utterance. The analysis surveys the characteristics of the linguistic expression of emotions at the different language levels; some of the items are language-specific, in this case specific to English, while others have a universal connotation.
#200: LVCSR System for Automatic Online Captioning
Aleš Pražák, J. V. Psutka, Jan Hoidekr
At the Department of Cybernetics, University of West Bohemia in Pilsen, we have developed a system for automatic online generation of captions (subtitles), i.e. a large vocabulary continuous speech recognition system operating in real time. There is almost no chance of covering the whole vocabulary of common speech. To achieve a minimum recognition error rate and fluent caption reading, only tasks with a vocabulary limited by a specific domain, such as TV ice hockey commentary or parliament speech, can be implemented. This system will be used in a pilot project with Czech Television for automatic captioning of live TV broadcasts. The demonstration will present a system for real-time captioning of two different tasks in Czech: TV ice hockey commentaries and parliament meetings (with a word error rate of about 15%). The system consists of three modules: a TV tuner for source signal acquisition (replaced by a video cassette player for demonstration purposes), an LVCSR (large vocabulary continuous speech recognition) system for generating captions in real time, and a module for post-processing of captions (rendering the captions onto the source signal for demonstration purposes). The system can be used for any ice hockey match commentary by Robert Záruba, given the team line-ups, and for any Czech parliament meeting, even in live TV broadcasts. A system for information retrieval from the recognition output (word lattice) will also be presented.
#201: A virtual butler controlled by speech
A. Uria, M. I. Torres, V. Guijarrubia, J. Garmendia, O. Aizpuru and E. Alonso
The aim of this work was to develop a virtual butler service, installed at home, to control electrical appliances and to provide information about their state. The project is based on FAGOR Home Appliance, the Spanish electrical appliance multinational. Of FAGOR's sales, 44% are on the international market, and 70% of these overseas sales are made in countries as competitive as France, Germany and Great Britain. The overall goal is an intelligent home where anyone, including physically handicapped people, can control the appliances by voice. The system works as a virtual butler. A spoken dialog system was developed that allows spontaneous, speaker-independent speech input. A speech-understanding model translates the recognized utterance into a sequence of task-dependent frames. The dialog manager generates a suitable answer. It also activates the device once the required information is available. We plan a demo allowing any Spanish speaker to have a dialogue with the virtual butler. The butler will be able to fully control and program a washing machine, a dishwasher and an oven. He will also be able to show some recipes; in particular, several ways of cooking chicken can be handled.
#202: Archivus: A multimodal system for multimedia meeting browsing and retrieval
Marita Ailomaa, Pavel Cenek, Agnes Lisowska, Miroslav Melichar and Martin Rajman
This demonstration will present Archivus, a multimodal language-enabled meeting browsing and retrieval system whose purpose is to allow users to access multimedia meeting data in a way that is most natural to them. Since this is a new domain of interaction, users can be encouraged to try out and consistently use novel input modalities such as voice, including more complex natural language. Such multimodal interaction can help the user find information more efficiently, in particular in this complex domain. In the demonstration we will show a methodology and the accompanying software, including a sophisticated Wizard of Oz environment, that we have developed to help developers of complex language-enabled multimodal systems develop the necessary natural language processing and dialogue management modules.
#203: Software system LFLC2000 for reasoning under vagueness
Antonín Dvořák
In our demonstration we present the software system LFLC2000 (Linguistic Fuzzy Logic Controller). It is a complex tool that allows users to design and use special linguistic descriptions of processes, situations, etc. These are then processed by means of a unique methodology based on theoretical research in fuzzy logic and various aspects of the vagueness phenomenon. It is also possible to mine these descriptions from real-world data. Linguistic descriptions include so-called evaluating linguistic expressions (e.g. small, very big, approximately 30, etc.). Their meaning is modeled using a special theory in higher-order fuzzy logic. Linguistic descriptions are further processed using perception-based logical deduction, which allows users to work mainly at the linguistic level and to obtain intuitively satisfactory results without knowing the details of the implementation. These linguistic descriptions can then be used in various application fields, e.g. control, decision making, data mining and linguistic data summarization. We present the basic features of LFLC2000 and also show the possibility of mining so-called linguistic associations from data.
#204: LARS: LAughter Recognition in Speech
Khiet Truong, Willem Melder, David van Leeuwen
Automatic detection of the user's affective state/emotion can improve human-machine interaction. For instance, if a dialogue system can sense that the user is getting frustrated or angry with the system, it can adjust its dialogue or redirect the call to a human. Our goal is to automatically detect the user's affective/emotional state in speech. The vagueness and subjectivity of "emotion" and the lack of natural emotional speech data have led us to focus first on one particular expression of emotion, namely laughter. Laughter is relatively easy for humans to recognize (in contrast to other classes of emotion) and occurs relatively frequently in speech. To develop our laughter recognizer in speech (LARS), we use machine learning techniques such as Gaussian Mixture Modelling (GMM) to train laughter and non-laughter models. As speech features modeled by the GMMs, we use Perceptual Linear Predictive (PLP) coefficients, log-energy and their derivatives. A Viterbi decoder is used to find the beginning and end of laughter. So far, we have trained the models on laughter sounds taken from meetings. The results are promising: we achieve time-weighted false alarm and miss rates of approximately 11%. In this demo version of LARS, visitors can upload their own audio files and let LARS find the laughter events. LARS can be employed for many different purposes; examples include hotspot detection in meetings, meeting summarization/browsing, humour modelling/detection, dialogue act classification and, more generally, improving human-machine interaction. In the near future, we will implement LARS as a fun and interactive component in another demo that we are currently developing, called the "Affective Mirror". The idea is to use the concept of the distorting mirrors found at funfairs to develop a virtual multimodal interactive "Affective Mirror".
By creating visual effects, the mirror interface can adapt to the user's laughter and affective state, which can lead to a fun, interactive user experience.
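The recognition pipeline described above (GMM scoring of frame features, Viterbi decoding to find where laughter begins and ends, and evaluation by time-weighted error rates) can be sketched in plain Python. This is a minimal illustration, not the LARS implementation: the diagonal-covariance GMM scorer assumes already-trained parameters, the two-state decoder with a fixed `switch_penalty` is a hypothetical simplification, and the error-rate definitions (miss rate = share of reference laughter time not detected; false alarm rate = share of non-laughter time labelled as laughter) are our assumption about what "time-weighted" means. With a fixed frame length, counting frames is equivalent to weighting by time.

```python
import math
from typing import List, Sequence, Tuple


def gmm_loglik(x: Sequence[float], weights: Sequence[float],
               means: Sequence[Sequence[float]],
               variances: Sequence[Sequence[float]]) -> float:
    """log p(x) under a diagonal-covariance Gaussian mixture (log-sum-exp for stability)."""
    comps = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xi, mi, vi in zip(x, mu, var):
            ll -= 0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
        comps.append(ll)
    m = max(comps)
    return m + math.log(sum(math.exp(c - m) for c in comps))


def viterbi_segment(ll_laugh: Sequence[float], ll_other: Sequence[float],
                    switch_penalty: float = 5.0) -> List[int]:
    """Two-state Viterbi decoding: 1 = laughter, 0 = other, one label per frame.

    switch_penalty (a hypothetical value) is a log-domain cost for changing
    state, which suppresses isolated one-frame detections.
    """
    n = len(ll_laugh)
    if n == 0:
        return []
    emit = [(ll_other[t], ll_laugh[t]) for t in range(n)]
    score = [emit[0][0], emit[0][1]]
    back: List[Tuple[int, int]] = []
    for t in range(1, n):
        new, ptr = [0.0, 0.0], [0, 0]
        for s in (0, 1):
            stay, switch = score[s], score[1 - s] - switch_penalty
            ptr[s] = s if stay >= switch else 1 - s
            new[s] = max(stay, switch) + emit[t][s]
        back.append((ptr[0], ptr[1]))
        score = new
    state = 0 if score[0] >= score[1] else 1
    path = [state]
    for ptr_t in reversed(back):  # follow back-pointers to frame 0
        state = ptr_t[state]
        path.append(state)
    return path[::-1]


def time_weighted_rates(ref: Sequence[int], hyp: Sequence[int]) -> Tuple[float, float]:
    """(miss rate, false alarm rate); assumes ref contains both classes."""
    laugh = sum(ref)
    other = len(ref) - laugh
    missed = sum(1 for r, h in zip(ref, hyp) if r == 1 and h == 0)
    false_alarms = sum(1 for r, h in zip(ref, hyp) if r == 0 and h == 1)
    return missed / laugh, false_alarms / other


# Toy per-frame log-likelihoods, standing in for gmm_loglik outputs of each model.
ll_other = [0.0, 0.0, -1.0, 0.0, -3.0, -3.0, -3.0, -3.0]
ll_laugh = [-3.0, -3.0, 0.0, -3.0, 0.0, 0.0, 0.0, 0.0]
hyp = viterbi_segment(ll_laugh, ll_other)
ref = [0, 0, 0, 0, 0, 1, 1, 1]
print(hyp, time_weighted_rates(ref, hyp))
```

In this toy input, the single frame at index 2 that slightly favours laughter is absorbed by the decoder, because switching in and out would cost more than it gains. Frame 4 is hypothesised as laughter one frame early, which yields a false alarm rate of 20% and no misses.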
#205: New Features of the Professional Lexicography Application TshwaneLex 2.0
David Joffe & Gilles-Maurice de Schryver
On the 3rd of July 2006, TshwaneLex 2.0 was released. TshwaneLex is the world's only truly off-the-shelf professional lexicography software suite for the compilation of monolingual, bilingual and multilingual dictionaries, and for the publication of dictionaries in hardcopy, online and electronic formats. The new version contains over fifty new features, and during the presentation the creators of the application intend to demonstrate some of the highlights to interested participants.
#206: Slovene Spoken Corpus
Jana Zemljarič Miklavčič, Marko Stabej
Spoken language corpora have become regular and commonly accepted resources for language study and description. In the period September–December 2004, a pilot corpus of spoken Slovene was compiled at the Department of Culture, Language and Information Technology at the University of Bergen. The purpose of the compilation was to establish a theoretical and empirical foundation for building a large spoken corpus of Slovene, which is planned to complement the 300-million-word FIDA corpus as its spoken component. The corpus is based on digital recordings of spontaneous speech, collected in 2004 according to different contextual criteria. The recordings were transcribed with two transcription tools, Transcriber and Praat. Both tools assign a time stamp to the start and end of each transcribed segment. Conversion programs transform these transcripts into other formats and interpolate the times of the individual words within each segment. One format is an HTML version of the text for browsing, giving easy access to the sound of each turn through a link. Another is the Corpus WorkBench format (IMS, Stuttgart), in which information about the speakers and the setting of the recordings has been added as additional columns. The transcription and annotation standards, based on TEI and EAGLES recommendations, were outlined during the transcription work itself. Full texts are accessible, as well as concordances and collocations of single words, and different criteria can be used for searching the corpus. In each case, transcriptions are linked to the sound files. The pilot spoken corpus is available to the language research community at http://torvald.aksis.uib.no/talem/jana/s9.html.
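The step in which word times are interpolated inside a time-stamped segment and then written out in Corpus WorkBench format can be sketched as follows. This is a hypothetical illustration, not the project's actual conversion programs. It assumes that each word's share of the segment is proportional to its length in characters, since the abstract does not say how the interpolation is done. The column order (word, speaker, setting, start, end) is also our own choice. What is fixed by CWB is the general shape of the input: one token per line, with tab-separated positional attributes.

```python
from typing import List, Tuple

WordTiming = Tuple[str, float, float]  # (word, start time, end time) in seconds


def interpolate_word_times(start: float, end: float, words: List[str]) -> List[WordTiming]:
    """Spread a segment's [start, end] time span over its words.

    Assumption: each word gets a share proportional to its length in characters.
    """
    total = sum(len(w) for w in words)
    if total == 0:
        return []
    span = end - start
    timings: List[WordTiming] = []
    offset = 0
    for w in words:
        t0 = start + span * offset / total
        offset += len(w)
        t1 = start + span * offset / total
        timings.append((w, t0, t1))
    return timings


def to_vertical(timings: List[WordTiming], speaker: str, setting: str) -> List[str]:
    """One token per line with tab-separated columns (hypothetical order):
    word, speaker, setting, start time, end time."""
    return [f"{w}\t{speaker}\t{setting}\t{t0:.2f}\t{t1:.2f}" for w, t0, t1 in timings]


# Toy segment from 12.0 s to 14.0 s; speaker and setting labels are hypothetical.
seg = interpolate_word_times(12.0, 14.0, ["ja", "seveda"])
for line in to_vertical(seg, speaker="S1", setting="private"):
    print(line)
```

Interpolating from cumulative character offsets, rather than adding up per-word durations, means the last word always ends at the segment's end time and rounding errors do not accumulate.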
Panel Discussion: Emotions in Text, Speech and Dialogue
Elmar Noeth, Nadia Mana, Geza Nemeth, Eva Hudlicka
In this panel we deal with human-computer interaction applications that make use of emotional awareness. We will touch on both monomodal and multimodal systems and applications, i.e. systems that process emotional input and/or potentially express their output in an emotional way, in any modality (text, speech, body and facial gestures, and haptic input). After delimiting the term "emotion", we will discuss potential applications: which are important, which are conceivable in the future, and what the barriers are.

Conference Photos