<?xml version='1.0' encoding='utf-8'?><sru:explainResponse xmlns:sru="http://www.loc.gov/zing/srw/"><sru:version>1.2</sru:version><sru:record><sru:recordSchema>http://explain.z3950.org/dtd/2.0/</sru:recordSchema><sru:recordPacking>xml</sru:recordPacking><sru:recordData><zr:explain xmlns:zr="http://explain.z3950.org/dtd/2.0/"><zr:serverInfo protocol="SRU" version="1.2" transport="http"><zr:host>127.0.0.1</zr:host><zr:port>8080</zr:port><zr:database>BASSRU</zr:database></zr:serverInfo><zr:databaseInfo><zr:title lang="de">BAS FCS Endpoint</zr:title><zr:title lang="en" primary="true">BAS FCS Endpoint</zr:title><zr:description lang="de">Suche in den Corpora des Bayerischen Archivs für Sprachsignale.</zr:description><zr:description lang="en" primary="true">Search in the corpora of the Bavarian Archive for Speech Signals.</zr:description><zr:author lang="en">Bayerisches Archiv für Sprachsignale</zr:author><zr:author lang="de" primary="true">Bavarian Archive for Speech Signals</zr:author></zr:databaseInfo><zr:indexInfo><zr:set identifier="http://clarin.eu/fcs/resource" name="fcs"><zr:title lang="de">BAS Repository CLARIN Content Search</zr:title><zr:title lang="en" primary="true">BAS Repository CLARIN Content Search</zr:title></zr:set><zr:index search="true" scan="false" sort="false"><zr:title lang="en" primary="true">Words</zr:title><zr:map primary="true"><zr:name set="fcs">words </zr:name></zr:map></zr:index></zr:indexInfo><zr:schemaInfo><zr:schema identifier="http://clarin.eu/fcs/resource" name="fcs"><zr:title lang="en" primary="true">CLARIN Content Search</zr:title></zr:schema></zr:schemaInfo><zr:configInfo><zr:default type="numberOfRecords">250</zr:default><zr:setting type="maximumRecords">1000</zr:setting></zr:configInfo></zr:explain></sru:recordData></sru:record><sru:echoedExplainRequest><sru:version>1.2</sru:version></sru:echoedExplainRequest><sru:extraResponseData><ed:EndpointDescription xmlns:ed="http://clarin.eu/fcs/endpoint-description" 
version="2"><ed:Capabilities><ed:Capability>http://clarin.eu/fcs/capability/basic-search</ed:Capability><ed:Capability>http://clarin.eu/fcs/capability/advanced-search</ed:Capability></ed:Capabilities><ed:SupportedDataViews><ed:SupportedDataView id="hits" delivery-policy="send-by-default">application/x-clarin-fcs-hits+xml</ed:SupportedDataView><ed:SupportedDataView id="adv" delivery-policy="send-by-default">application/x-clarin-fcs-adv+xml</ed:SupportedDataView></ed:SupportedDataViews><ed:SupportedLayers><ed:SupportedLayer id="orth" result-id="http://endpoint.example.org/Layers/orth1">ORT</ed:SupportedLayer><ed:SupportedLayer id="phonetic" result-id="http://endpoint.example.org/Layers/phon">KAN</ed:SupportedLayer></ed:SupportedLayers><ed:Resources><ed:Resource pid="11022/1009-0000-0005-C50F-D"><ed:Title xml:lang="en">SmartWeb Motorbike Corpus SMC</ed:Title><ed:Description xml:lang="en">The SMARTWEB UMTS data collection, of which the SMC corpus is a part, was created within the publicly funded German SmartWeb project in the years 2004 - 2006. It comprises a collection of user queries to a naturally spoken Web interface with the main focus on the soccer world series in 2006. The SMC corpus itself contains 36 mobile recordings performed on a BMW motorbike. Starting from version 2.6 (CLARIN Repository Version 2), SMC is also distributed as an emuR compatible emuDB.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SMC/SMC.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-88E5-3"><ed:Title xml:lang="en">BAS Alcohol Language Corpus</ed:Title><ed:Description xml:lang="en">This corpus contains recordings of 162 speakers while being sober and intoxicated. 
Beginning with version 3, this corpus edition also contains an emuR compatible database version of the corpus (with a minor bugfix in the database in version 3.1).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ALC/ALC.4.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-8B63-3"><ed:Title xml:lang="en">BAS Thesis data Veronika Neumeyer: CI Articulation</ed:Title><ed:Description xml:lang="en">This corpus contains speech recordings of normal hearing speakers and speakers equipped with Cochlear Implants (CI), as used for analysis in the Master thesis of Veronika Neumeyer (2009, LMU MÃ¼nchen). Speech data were collected with the software SpeechRecorder, for each recording a BPF file was generated (*.par), on which the MAUS segmentation was based (*.TextGrid). Starting with version 1.2, this corpus is distributed as an emuR compatible EMU database (files ending in *_annot.json).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CI_1/CI_1.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-D288-8"><ed:Title xml:lang="en">PhonDat 2</ed:Title><ed:Description xml:lang="en">The corpus contains read speech of 16 different speakers, 6 women and 10 men. Each speaker reads a corpus of 200 different sentences from a train query task. They were recorded at three different sites in Germany (University of Kiel, University of Bonn, University of Munich). The language is German. The corpus contains a total of 3200 recorded utterances. 
Starting with version 3.0 (BAS CLARIN Repository version 4), the corpus is also distributed as an emuDB.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/PD2/PD2.4.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-CC6A-4"><ed:Title xml:lang="en">Ph@ttSessionz Adolescents Speech Corpus</ed:Title><ed:Description xml:lang="en">The Ph@ttSessionz speech database contains recordings of 1019 adolescent speakers of German (age range 12-20). The recordings were performed via the WWW in public schools (Gymnasium) in 45 locations in Germany. The speech material recorded is a superset of the German SpeechDat-II and RVG-I corpora. It is now also available for download in emuR compatible format (starting from version 2.1.0).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/PHATTSESSIONZ/PHATTSESSIONZ.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-EB31-0"><ed:Title xml:lang="en">BAS Verbmobil 1</ed:Title><ed:Description xml:lang="en">The Verbmobil (VM) dialog database is a collection of German, American and Japanese dialog recordings in the appointment scheduling task. The data were collected during the first phase (1993 - 1996) of the German VM project funded by the German Ministry of Science and Technology (BMBF). 
Starting with version 3, the corpus is also provided as an emuR comptatible database.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/VM1/VM1.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language><ed:Language>jpn</ed:Language><ed:Language>eng</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0002-1303-5"><ed:Title xml:lang="en">BAS Siemens Hoergeraete Corpus</ed:Title><ed:Description xml:lang="en">			Corpus of spontaneous, relatively casual dialogues in German. Each pair of dialogue partners is recorded conversing under real-noise conditions (in a noisy cafeteria and in a car going at different velocities), as well as in a studio at various levels of lombard noise played directly into the subjects ears. Starting from version 2.1 (BAS Clarin Repository version 2), this corpus is also distributed as an emuR compatible emuDB.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/HOESI/HOESI.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-D8A5-2"><ed:Title xml:lang="en">BAS Database for Signer-Independent Continuous Sign Language Recognition</ed:Title><ed:Description xml:lang="en">The SIGNUM Database contains both isolated and continuous utterances of various signers. Since we use a vision-based approach for sign language recognition the corpus was recorded on video. For quick random access to individual frames, each video clip is stored as a sequence of images. The vocabulary comprises 450 basic signs in German Sign Language (DGS) representing different word types. Based on this vocabulary, overall 780 sentences were constructed. 
Each sentence ranges from two to eleven signs in length. No intentional pauses are placed between signs within a sentence, but the sentences themselves are separated. The entire corpus, i.e. all 450 basic signs and all 780 sentences, was performed once by 25 native signers of different sexes and ages. One of them was chosen to be the so-called reference signer. His performances were recorded not once but even three times. The SIGNUM Database was created within the framework of a research project at the Institute of Man-Machine Interaction, located at the RWTH Aachen University in Germany. The SIGNUM (Signer-Independent Continuous Sign Language Recognition for Large Vocabulary Using Subunit Models) project was funded by the Deutsche Forschungsgemeinschaft (German Research Foundation) and aimed to develop a video-based automatic sign language recognition system.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SIGNUM/SIGNUM.1.php</ed:LandingPageURI><ed:Languages><ed:Language>gsg</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-1500-7"><ed:Title xml:lang="en">aGender</ed:Title><ed:Description xml:lang="en">			The speech corpus aGender contains speech sample recordings over public telephone lines with read and (semi-)spontaneous speech. Native German speakers called a voice portal from their private phone, and read text + answered some open questions. The purpose of the corpus is the automatic detection of gender and/or age (7 mixed classes ranging from 7 - 80 years). The corpus contains the voices of 945 German speakers (approx. minimum of 100 speakers per class), each delivering 18 speech items in up to six different sessions. The time/date of the individual recordings sessions were not controlled, neither the total number of sessions per speaker. 
The audio signal was recorded over standard cell phones (GSM standard) and landline connections in 8000 Hz, 8 bit alaw format. Data were then expanded to 8000Hz, 16bit PCM (13 bits are valid!). The selection of speakers is approximately evenly distributed over the seven target classes, with class 1 also being balanced for gender. The read material consists of an altered version of the SpeechDat text material, containing short fixed and free text typical for automated call centers. A typical utterance is about 2 seconds in length, but there are also some utterances are between 3 and 6 seconds. In total, the corpus consists of 47 hours of speech.  Two sets were defined on that data: A training set (81.5%) and a test set (175 speakers, 25 per class, 18.5%), each with disjunctive speaker sets. For the test set no class information is given in this corpus. Refer to Section Evaluation on how to receive an evaluation from Telekom Labs. Users of this speech corpus are required to report any scientific publications based on these data to Felix Burkhardt (Felix.Burkhardt@telekom.de).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/aGender/aGender.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0002-0B76-E"><ed:Title xml:lang="en">BAS SC1</ed:Title><ed:Description xml:lang="en">The corpus contains speech of 88 different speakers, reading the German story Der Nordwind und die Sonne. Subcorpus T contains the recordings of 16 native Germans (L1). The other 72 speakers which were born and educated in other countries (L2) are pooled in subcorpus C. Every speaker has a distinct accent. This corpus may be used for several tasks:&lt;br/>
- automatic accent detection.&lt;br/>
- test of robustness against different accents in automatic speech recognition.&lt;br/>
- scientific investigation of accents in German.&lt;br/>
Subcorpus T may be used as a reference or training corpus for technical evaluations. These signals are marked with a T in the speaker information file. These recordings and the respective annotations can be found in the phondat corpus as well. Starting from Version 1.3 (CLARIN Repository version 2), this corpus is distributed as an emuDB. This emuDB contains an automatic phonetic segmentation for all speakers (level MAU) as well as a manual phonetic segmentation for subcorpus T (level PHO). Version 3 is identical to version 2 with the exception of a bugfix in the emuDBs MAU level.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SC1/SC1.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-A531-E"><ed:Title xml:lang="en">LMU AsiCa</ed:Title><ed:Description xml:lang="en">The AsiCa-Corpus basically is a documentation of the South Italian dialect Calabrese. The main objects when building this corpus were the analysis of syntactical structures and their geolinguistic mapping in form of interactive, webbased cartography. The corpus consists of several audio files containing recordings of some sixty speakers of Calabrese one half of which having migration experience in Germany the other half almost always having stayed in Calabria. Furthermore the informants were selected equally balanced regarding gender, age and geographical origin. Of most of the informants there exist at least one recording with spontanous speech and one recording based on stimuli each. 
The results of syntactical analysis (maps and text) can be seen on the projects website at http://www.asica.gwi.uni-muenchen.de.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/AsiCa/AsiCa.1.php</ed:LandingPageURI><ed:Languages><ed:Language>ita</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-F9B1-8"><ed:Title xml:lang="en">SmartKom Audio</ed:Title><ed:Description xml:lang="en">This corpus contains the audio recordings of all actors who use the SmartKom system; it covers the audio recordings (no video) and annotations of all three original SmartKom corpora Public, Mobile and Home. Naive users were asked to test a prototype for a market study not knowing that the system was in fact controlled by two human operators. They were asked to solve two tasks in a period of 4,5 min while they were left alone with the system. The instruction was kept to a minimum; in fact the user only knew that the system is able to understand speech, gestures and even mimical expressions and should more or less communicate like a human.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SKAUDIO/SKAUDIO.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-AF40-2"><ed:Title xml:lang="en">Dissertation Data Dr. Veronika Neumeyer: Consonant Cluster Production in Cochlear Implant Patients</ed:Title><ed:Description xml:lang="en">The CI_2 corpora contain German speech recordings of 48 cochlear implant users (CI) and 48 speakers without hearing impairment (control group, KG). The data were analyzed in Veronika Neumeyers dissertation Akustische Analysen der Sprachproduktion von CI-TrÃ¤gern (2015). 
CI_2_Cluster contains recordings used for the analysis of the temporal dynamics of the consonant cluster /Ê&#x83;tr/. The data was recorded using SpeechRecorder and automatically segmented using MAUS, followed by a manual correction of the target phoneme(s). The database is distributed as an emuR compatible database (emuDB format). Version 2 : removed derived spectra files *.dft from corpus.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CI_2_Cluster/CI_2_Cluster.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-AE7E-F"><ed:Title xml:lang="en">Dissertation Data Dr. Veronika Neumeyer: Voice Onset Time in Cochlear Implant Patients</ed:Title><ed:Description xml:lang="en">The CI_2 corpora contain synchronous speech recordings of 48 cochlear implant users (CI) and 48 speakers without hearing impairment (control group, KG). The data were analyzed in Veronika Neumeyers dissertation Akustische Analysen der Sprachproduktion von CI-TrÃ¤gern (2015). CI_2_VOT contains recordings used for the analysis of voice onset time in /t/ in the word teilen. The data was recorded using SpeechRecorder and automatically segmented using MAUS, followed by a manual correction of the target phoneme(s). The database is distributed as an emuR compatible database (emuDB format). Version 3 : removed derived spectra files *.dft from speech corpus.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CI_2_VOT/CI_2_VOT.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-A99C-2"><ed:Title xml:lang="en">Dissertation Data Dr. 
Veronika Neumeyer: Cluster Production in Cochlear Implant Patients (diachronic data)</ed:Title><ed:Description xml:lang="en">The CI_3 corpora contain diachronic speech recordings from three cochlear implant (CI) users which were analysed in the long term study part of Veronika Neumeyers PhD Thesis Akustische Analysen der Sprachproduktion von CI-TrÃ¤gern (2015). Please note that the corpus is distributed as four separate subcorpora (CI_3_Cluster, CI_3_Sibilants, CI_3_VOT, CI_3_Vowels). For data used in the corresponding synchronic study, please refer to the CI_2 corpora. CI_3_Cluster contains recordings used for the analysis of the temporal dynamics of the consonant cluster /Ê&#x83;tr/. The data was recorded using SpeechRecorder and automatically segmented using MAUS, followed by a manual correction of the target phoneme(s). The database is distributed as an emuR compatible database (emuDB format). Version 2: Fixed a bug in the emuR DB config.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CI_3_Cluster/CI_3_Cluster.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-DEC1-C"><ed:Title xml:lang="en">Bielefeld Speech and Gesture Alignment Corpus</ed:Title><ed:Description xml:lang="en">The primary data of the SaGA corpus are made up of 25 dialogs of interlocutors (50), who engage in a spatial communication task combining direction-giving and sight description. Six of those dialogues with data only from the direction giver are available including audio (*.wav) and video (*.mp4) data. The secondary data consists of annotations (*.eaf) of gestures and speech-gesture referents, which have been completely and systematically annotated based on an annotation grid (cf. the SaGA documentation). 
The corpus is comprised of 9881 isolated words and 1764 isolated gestures.&lt;br/>
                        The stimulus is a model of a town presented in a Virtual Reality (VR) environment. Upon finishing a “bus ride” through the VR town along five landmarks, a router explained the route as well as the wayside&lt;br/>
                        landmarks to an unknown and naive follower.&lt;br/>
                        The SaGA Corpus was curated for CLARIN as part of the Curation Project Editing and Integration of Multimodal Resources in CLARIN-D by the CLARIN-D Working Group 6 Speech and Other Modalities.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SaGA/SaGA.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-55C3-3"><ed:Title xml:lang="en">Untersuchung auditiver und akustischer Merkmale zur Evaluation der Stimmaehnlichkeit von Bruederpaaren unter forensischen Aspekten</ed:Title><ed:Description xml:lang="en">BROTHERS contains recordings of pairs of brothers between the ages of 19 and 31. The native and recorded language is German. The recordings were analyzed in Hanna Feisers dissertation Untersuchung auditiver und akustischer Merkmale zur Evaluation der StimmÃ¤hnlichkeit von BrÃ¼derpaaren unter forensischen Aspekten to evaluate the pair-wise similarity of sibling voices and the degree to which they are confused by listeners. Recordings consist of minimal pairs in carrier sentences, a different set of sentences aimed at elicitating the full range of German vowels (Berliner SÃ¤tze), and a spontaneous dialogue about a TV-series. Recordings were made via a table microphone (studio quality) and via telephone (telephone quality). Transcriptions and an automatically derived phonetic segmentation are provided along with the formant and fundamental frequency SSFF tracks used in the original dissertation. The corpus is provided as a ready-for-use emuR compatible emu database. 
This version fixes an error in the structure of the emu database in version 1.0 (incorrect mappings from bundles to annotation files).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/BROTHERS/BROTHERS.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-AEDF-1"><ed:Title xml:lang="en">Dissertation Data Dr. Veronika Neumeyer: Sibilant Production in Cochlear Implant Patients</ed:Title><ed:Description xml:lang="en">The CI_2 corpora contain synchronous speech recordings of 48 cochlear implant users (CI) and 48 speakers without hearing impairment (control group, KG). The data were analyzed in Veronika Neumeyers dissertation Akustische Analysen der Sprachproduktion von CI-TrÃ¤gern (2015). CI_2_Sibilants contains recordings used for the analysis of /s/ and /Ê&#x83;/ in the following words: Tasse, Tasche. The data was recorded using SpeechRecorder and automatically segmented using MAUS, followed by a manual correction of the target phoneme(s). The database is distributed as an emuR compatible database (emuDB format). Version 3 : removed all derived spectra files *.dft from corpus.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CI_2_Sibilants/CI_2_Sibilants.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-AFA1-4"><ed:Title xml:lang="en">Dissertation Data Dr. Veronika Neumeyer: Vowel Production in Cochlear Implant Patients</ed:Title><ed:Description xml:lang="en">The CI_2 corpora contain German speech recordings of 48 cochlear implant users (CI) and 48 speakers without hearing impairment (control group, KG). 
The data were analyzed in Veronika Neumeyers dissertation Akustische Analysen der Sprachproduktion von CI-TrÃ¤gern (2015). CI_2_Vowels contains recordings used for the analysis of sevel long, lexically stressed vowels in the words Taten, stetig, Toter, Stute, tÃ¶ten, TÃ¼te and kriegen. The data was recorded using SpeechRecorder and automatically segmented using MAUS, followed by a manual correction of the target phoneme(s). The database is distributed as an emuR compatible database (emuDB format). Version 3 : removed derived f0 analysis files *.sf0 from speech corpus.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CI_2_Vowels/CI_2_Vowels.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0002-1129-D"><ed:Title xml:lang="en">BAS SC10</ed:Title><ed:Description xml:lang="en">The SC10 corpus contains read and non-prompted German and mother tongue speech of 70 different speakers from 17 mother tongues (L1) in a variety of speaking styles e.g. reading, retelling, free talk etc. Starting from version 1.5 (BAS CLARIN repository version 3), the corpus is distributed as an emuDB. 
BAS CLARIN repository version 4 is identical to version 3 with the exception of a bugfix in the emuDBs MAU level.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SC10/SC10.4.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-A68A-9"><ed:Title xml:lang="en">Schweizer Jugendsprache</ed:Title><ed:Description xml:lang="en">Recordings of adolescent pupils in Switzerland.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CH-Jugendsprache/CH-Jugendsprache.1.php</ed:LandingPageURI><ed:Languages><ed:Language>gsw</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-DF76-1"><ed:Title xml:lang="en">BAS VERIF1DE</ed:Title><ed:Description xml:lang="en">The VERIF1DE database is a subset of the VERIDAT speaker verification database collected by T-Nova. VERIDAT contains additional items and re-recordings of missing, corrupted, or otherwise unusable files in VERIF1DE. Please refer to the file DESIGN.PDF in the documentation package of this corpus for a detailed description of VERIF1DE. 
Users of this speech corpus are required to report any scientific publications based on these data to Felix Burkhardt (Felix.Burkhardt@telekom.de).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/VERIF1DE/VERIF1DE.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0001-27B9-3"><ed:Title xml:lang="en">Audioatlas Siebenbuergisch-Saechsischer Dialekte</ed:Title><ed:Description xml:lang="en">Large set of 2274 recordings (approx. 360h) of spoken dialectal German (Saxonian) recorded in Transilvania (Romania) in approx. 250 different locations. This up-to-now unpublished material has been collected on analog tape in the 1960s and 70s by different linguists based at the universities of Bukarest, Hermannstadt and Klausenburg. Later these tapes have been digitized, and in 2009 - with the kind support of Prof. Dr. Stefan Sienerth, director of the Institut fÃ¼r deutsche Kultur und Geschichte SÃ¼dosteuropas (IKGS) transferred to the Institute for Romance Studies (Prof. Thomas Krefeld; http://www.romanistik.uni-muenchen.de/personen/professoren/krefeld/index.html) and the LMU Center for Digital Humanities (IT-Gruppe Geisteswissenschaften [ITG; http://www.itg.lmu.de]; Dr. Stephan LÃ¼cke, Emma Mages) respectively, both of the Ludwig-Maximilians-UniversitÃ¤t MÃ¼nchen (LMU). The corpus comprises different recording strategies and discourse types: on the one hand the classic German Wenker sentences, on the other hand also fairy tales, song texts and free story-telling. Insofar, the corpus not only provides historical linguistic data but also input for ethnographical and historical disciplins. Since the age of the informants varies over a large range, this gives another dimension reflected in the metadata of the corpus. 
Further corpus features: geo reference of all recording sites; phonetic transcription of Wenker sentence recordings; orthographic transcription of spontaneous speech recordings (approx. 450.000 words); (partly) phonetic transcription of spontaneous speech; semantic labelling (ontology); extension to middle Bavarian recordings from the area Wassertal/Oberwischau. The ASD corpus can also be accessed at http://www.asd.gwi.uni-muenchen.de/, a dedicated website providing numerous kinds of tools, analytic approaches and visualisation. In 2016 the present corpus version was created at the BAS CLARIN center of the Ludwig-Maximilians-UniversitÃ¤t MÃ¼nchen for indefinite achivation and distribution.+Umfangreiche Tondokumentation siebenbÃ¼rgisch-sÃ¤chsischer Dialekte mit insgesamt um Ã¼ber 360 Stunden gesprochener Sprache aus ca. 250 verschiedenen Ortschaften, gespeichert in insgesamt 2274 Audiodateien. Dieses einzigartige, bislang unverÃ¶ffentlichte Material wurde im Wesentlichen in den spÃ¤ten 60er und frÃ¼hen 70er Jahren des letzten Jahrhunderts von Sprachforschern verschiedener rumÃ¤nischer UniversitÃ¤ten (Bukarest, Hermannstadt, Klausenburg) erhoben und auf TonbÃ¤ndern aufgenommen; daraus wurden die digitalen Versionen mit durchweg guter akustischer QualitÃ¤t erstellt. Diese digitalen Versionen wurden im Jahr 2009 unter Vermittlung und mit UnterstÃ¼tzung von Prof. Dr. Stefan Sienerth, dem damaligen Direktor des Instituts fÃ¼r deutsche Kultur und Geschichte SÃ¼dosteuropas (IKGS), an die LMU Ã¼bergeben. Die Dokumentation umfasst unterschiedliche Erhebungsstrategien und Diskursformen: Einerseits wurden in den meisten Orten die â&#x80;&#x98;klassischenâ&#x80;&#x99; Wenker-SÃ¤tze der germanistischen Dialektologie abgefragt; andererseits sind auch MÃ¤rchen, Lieder und â&#x80;&#x93; vor allem â&#x80;&#x93; zahlreiche, mehr oder weniger freie ErzÃ¤hlungen vertreten. Deshalb werden neben sprachwissenschaftlichen auch ethnographische und zeitgeschichtliche Interessen bedient. 
Mit dem sehr unterschiedlichen Alter der Informanten kommt eine weitere Dimension der Variation ins Spiel, die ebenfalls gezielt abgefragt werden kann. Weitere Korpus-Merkmale sind: Georeferenzierung der Erhebungsorte; phonetische Transkription (IPA) der Wenkersatzaufnahmen; standardnahe, orthographische  Transkription spontansprachlicher Texte (475.000 WÃ¶rter, entsprechend Ã¼ber 300 Normseiten); phonetische Transkription von Spontansprache, inhaltliche TiefenerschlieÃ&#x9f;ung durch Verschlagwortung (â&#x80;&#x9e;Ontologieâ&#x80;&#x9c;), geographische und sprachliche Ausweitung Mittelbayerisches Material aus dem Wassertal/Oberwischau. Die vorliegende Version des ASD Korpus wurd im Jahre 2016 am BAS CLARIN center der Ludwig-Maximilians-UniversitÃ¤t MÃ¼nchen archiviert.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ASD/ASD.1.php</ed:LandingPageURI><ed:Languages><ed:Language>ron</ed:Language><ed:Language>bar</ed:Language><ed:Language>deu</ed:Language><ed:Language>und</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0002-F80E-8"><ed:Title xml:lang="en">BAS HEMPEL</ed:Title><ed:Description xml:lang="en">Hempels Sofa is a collection of more than 3900 spontaneous speech items recorded as extra material during the German SpeechDat-II project. Speakers were asked to report what they had been doing during the last hour: Was haben Sie in der letzten Stunde gemacht?. This item was recorded as the last item of the recording session. Speakers had become acquainted with the recording procedure and they were quite relaxed because they knew that this item was the last to be recorded. This resulted in quite natural, colloquial speech, sometimes with marked regional accent. 
The corpus collection is described in more detail in the LREC2002 paper Three New Corpora at the Bavarian Archive for Speech Signals - and a First Step Towards Distributed Web-Based Recording by C. Draxler and F. Schiel. This paper is contained in this database in file DOC/BASCORPO.PDF; it also contains links to related SpeechDat documents. Note: the name of the corpus refers to the German proverbial phrase wie bei Hempels unterm Sofa. This phrase is often used to indicate that something is not well cleaned-up -- not dirty, just in its everyday state when one is not expecting visitors. I thought the phrase to be appropriate for this data collection because quite often when listening to the recordings one gets the impression of sitting next to the speaker on the sofa in a common living room. Note: Starting from version 2.0 (CLARIN Repository Version 3), HEMPEL is distributed as an emuR compatible emuDB. Version 2.1 (CLARIN Repository 4) is distributed without the MAU (phonetic segmentation) tier, as it was found to lack in accuracy.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/HEMPEL/HEMPEL.4.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0003-0C59-D"><ed:Title xml:lang="en">BAS TAXI</ed:Title><ed:Description xml:lang="en">The TAXI dialog database was created in June 2001 in collaboration with the DFKI, Saarbruecken. TAXI contains 86 recorded dialogues between a cab dispatcher and a client recorded over public phone lines (network and GSM). The dispatcher always speaks German, while the clients always speaks English. 
Starting from version 2.5 (BAS CLARIN Repository version 3), TAXI is distributed as an emuR compatible emuDB.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/TAXI/TAXI.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language><ed:Language>eng</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0003-1E02-A"><ed:Title xml:lang="en">BAS ZIPTEL</ed:Title><ed:Description xml:lang="en">The ZipTel telephone speech database contains recordings of people applying for a SpeechDat prompt sheet via telephone. For the SpeechDat data collection, calls for participation were published in phone, the customer magazine of the mobile telephone provider e-plus, and in numerous newspapers all over Germany. In these calls, a telephone number was given where callers could order a SpeechDat prompt sheet. The calls were recorded by an automatic telephone server; callers were asked to provide name, address and telephone number. The ZipTel telephone speech database consists of 1957 recording sessions with a total of 7746 signal files. A recording session corresponds to one phone call, each signal file contains a single recorded utterance from the recording session. For privacy reasons, only a subset of the recorded signal files are contained in the databases: Streetnames (z2), ZIP-Codes (z3), Citynames (z4) and Telephone numbers (z5). 
Starting from version 1.3 (BAS CLARIN Repository version 3), ZIPTEL is distributed as an emuDB.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZIPTEL/ZIPTEL.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0004-2BCC-7"><ed:Title xml:lang="en">BAS Verbmobil Emotion</ed:Title><ed:Description xml:lang="en">This database contains speech signals of dialogues in which a subject was recorded during a conversation via a spontaneous speech translation system. The response of the system was designed to invoke emotions (e.g. anger) in the subjects. It is part of the larger Verbmobil 2 speech data collection. Starting from BAS Clarin Respository version 2, the database is also distributed as an emuR comptatible emu database.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/VMEmo/VMEmo.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0004-3FF4-3"><ed:Title xml:lang="en">BAS RVG1_CLARIN</ed:Title><ed:Description xml:lang="en">The corpus is a collection of more than 500 speakers of different dialect regions of Germany. The recordings were made using four different microphones (two in low and two in high quality) and consist of single digits, connected digits, phone numbers, phonetically balanced sentences, computer command phrases prompted on a screen, and 1 min spontaneous speech (monologue). The speakers were recorded in normal office environments. The backround noise was limited to the usual noise in office environment, eg. door slam, backround crosstalk, phone ringing, paper rustle, PC noise, etc. 
Starting from version 4.2 (BAS CLARIN Repository version 3), this corpus is distributed as an emuR compatible emuDB.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/RVG1_CLARIN/RVG1_CLARIN.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0004-AE1D-9"><ed:Title xml:lang="en">BAS Regional Variants of German - Juveniles</ed:Title><ed:Description xml:lang="en">The RVG-J Corpus (Regional Variants of German - Junior) was recorded in 2001 at the Institute of Phonetics and Speech Communication at the University of	Munich, Germany. The corpus contains both read and non-scripted German utterances. It comprises the original RVG prompts (telephone numbers, sentences, commands, digits, etc.) plus spellings, date and time expressions, and free form responses to questions, e.g. What are you wearing?, How did you get here?, etc. The speakers were adolescents between 13 and 20 years of age, recruited in public schools in Munich and the suburbs. More than 95% of the speakers have German as their mother language, and almost all of them attended school in Bavaria; 89 of them were male and 93 female. Speakers younger than 18 years were required to provide a waiver signed by their parents stating that they were allowed to participate in the recordings. The corpus can be used for the training of speech recognisers or analyses of adolescent speech. 
Starting from version 1.5 (CLARIN Repository version 2), RVG-J is also distributed as an emuDB.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/RVG-J/RVG-J.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0005-0E95-4"><ed:Title xml:lang="en">BAS Strange Corpus 2 Noises</ed:Title><ed:Description xml:lang="en">The corpus SC2 contains read speech of 10 different speakers with screen prompted automobil diagnosis phrases recorded under real conditions in two different car maintenance halls. The language is German. All speakers are male native Germans and have never participated in such a task before. They are all experts in the field of car diagnosis. Each speaker has spoken 800 3-7 word utterances derived from 100 different sentences (see sc2_ort.txt) resulting in a total of 8000 utterances. Starting from version 2.4 (BAS CLARIN repository version 2), the corpus is distributed as an emuDB. In BAS CLARIN repository version 4, the emuDBs MAU tier was deleted, due to issues with segmentation quality stemming from the background noise in this corpus.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SC2/SC2.4.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-A9BB-F"><ed:Title xml:lang="en">Dissertation Data Dr. 
Veronika Neumeyer: Sibilant Production in Cochlear Implant Patients (diachronic data)</ed:Title><ed:Description xml:lang="en">The CI_3 corpora contain diachronic speech recordings from three cochlear implant (CI) users which were analysed in the long term study part of Veronika Neumeyers PhD Thesis Akustische Analysen der Sprachproduktion von CI-TrÃ¤gern (2015). Please note that the corpus is distributed as four separate subcorpora (CI_3_Cluster, CI_3_Sibilants, CI_3_VOT, CI_3_Vowels). For data used in the corresponding synchronic study, please refer to the CI_2 corpora. CI_3_Sibilants contains recordings used for the analysis of /s/ and /Ê&#x83;/ in the following words: Tasse, Tasche. The data was recorded using SpeechRecorder and automatically segmented using MAUS, followed by a manual correction of the target phoneme(s). The database is distributed as an emuR compatible database (emuDB format).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CI_3_Sibilants/CI_3_Sibilants.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-A9DC-A"><ed:Title xml:lang="en">Dissertation Data Dr. Veronika Neumeyer: Vowel Production in Cochlear Implant Patients (diachronic data)</ed:Title><ed:Description xml:lang="en">The CI_3 corpora contain diachronic speech recordings from three cochlear implant (CI) users which were analysed in the long term study part of Veronika Neumeyers PhD Thesis Akustische Analysen der Sprachproduktion von CI-TrÃ¤gern (2015). Please note that the corpus is distributed as four separate subcorpora (CI_3_Cluster, CI_3_Sibilants, CI_3_VOT, CI_3_Vowels). For data used in the corresponding synchronic study, please refer to the CI_2 corpora. 
CI_3_Vowels contains recordings used for the analysis of seven long, lexically stressed vowels in the words Taten, stetig, Toter, Stute, töten, Tüte and kriegen. The data was recorded using SpeechRecorder and automatically segmented using MAUS, followed by a manual correction of the target phoneme(s). The database is distributed as an emuR compatible database (emuDB format). Version 2: Fixed a bug in the emuR DB config.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CI_3_Vowels/CI_3_Vowels.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-C05F-6"><ed:Title xml:lang="en">Nautilus Speaker Characterization</ed:Title><ed:Description xml:lang="en">NSC contains scripted, semi-spontaneous, and spontaneous human-human dialogs. In total, 300 speakers of German without noticeable accent participated and were recorded in an acoustically-isolated room. Interactions between speakers and their interlocutor are provided in separate mono files, accompanied by timestamps and tags that define the speakers' turns. The speech corresponding to one of the semi-spontaneous dialogs was labeled with respect to perceived interpersonal speaker characteristics and naive voice descriptions. These labels are found alongside the documentation. 
Resource ISLRN: 157-037-166-491-1.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/NSC/NSC.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-A9EE-6"><ed:Title xml:lang="en">BAS CLIPS_MT_MANUAL</ed:Title><ed:Description xml:lang="en">CLIPS_MT_MANUAL is a sub-corpus of the original Italian CLIPS corpus (Corpora e Lessici dellItaliano Parlato e Scritto) that covers only 15 maptask dialogues recorded in 15 locations by local speaker pairs. The BAS has decided to bring forward another edition of this data as we found a large number of errors (formal and content) in the annotation and signal files of the original corpus that prevented our colleagues from performing proper phonetic investigations. To make published results on these (corrected) data replicable for the scientific community, BAS decided - with the kind permission of the CLIPS copyright holders - to ingest this part of CLIPS in the BAS CLARIN repository under the name CLIPS_MT_MANUAL (MT = map task, MANUAL indicates manual annotation), thereby making it available to all European academic researchers. In a nutshell, this corpus contains 3228 inspected and partially repaired WAV signal files, each containing one dialogue turn (*.wav), 3228 corrected original CLIPS annotation files (*.acs, *.phn, *.std, *.wrd), 3228 BAS Partitur files containing the annotation tiers ORT, KAN and SAP (*.par), 3228 EMU database annotation files (*.vot, *.hlb) covering 30 maptask dialogues performed by 30 speakers (each speaker pair performing two different map tasks) recorded in 15 different locations in Italy in 2000-2004. 
Starting with version 1.2, the corpus is also provided in an emuR compatible json format (*_annot.json).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CLIPS_MT_MANUAL/CLIPS_MT_MANUAL.3.php</ed:LandingPageURI><ed:Languages><ed:Language>ita</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0003-FF39-F"><ed:Title xml:lang="en">Spoken production of gender-neutral nouns in German</ed:Title><ed:Description xml:lang="en">This corpus examines the pronunciation of different genderneutral forms in German. Various source texts were used, like newspaper articles, websites, etc.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SprecherInnen/SprecherInnen.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0000-A9CB-D"><ed:Title xml:lang="en">Dissertation Data Dr. Veronika Neumeyer: Voice Onset Time in Cochlear Implant Patients (diachronic data)</ed:Title><ed:Description xml:lang="en">The CI_3 corpora contain diachronic speech recordings from three cochlear implant (CI) users which were analysed in the long term study part of Veronika Neumeyers PhD Thesis Akustische Analysen der Sprachproduktion von CI-TrÃ¤gern (2015). Please note that the corpus is distributed as four separate subcorpora (CI_3_Cluster, CI_3_Sibilants, CI_3_VOT, CI_3_Vowels). For data used in the corresponding synchronic study, please refer to the CI_2 corpora. CI_3_VOT contains recordings used for the analysis of voice onset time in /t/ in the word teilen. The data was recorded using SpeechRecorder and automatically segmented using MAUS, followed by a manual correction of the target phoneme(s). 
The database is distributed as an emuR compatible database (emuDB format).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/CI_3_VOT/CI_3_VOT.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0005-8535-9"><ed:Title xml:lang="en">BAS FORMTASK</ed:Title><ed:Description xml:lang="en">FORMTASK is a telephone speech database of prompted descriptions of typical forms found in everyday life. The forms are&lt;br/>
			  - Berlin public transport ticket&lt;br/>
			  - Invoice&lt;br/>
			  - Austrian parking ticket&lt;br/>
			  - Newsstand receipt&lt;br/>
			  - Money transfer form&lt;br/>
			  To elicit a description of the forms, the following four questions were asked:&lt;br/>
			  1) What type of form is this?&lt;br/>
			  2) What date is on the form?&lt;br/>
			  3) What amount is on the form?&lt;br/>
			  4) Where is the amount written on the form?&lt;br/>
			  The speakers saw the form in black and white print on their personal prompt sheet on paper. Starting from version 2.1, FORMTASK is distributed as an emuR compatible emuDB corpus.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/FORMTASK/FORMTASK.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-0700-1"><ed:Title xml:lang="en">BAS SmartWeb Handheld</ed:Title><ed:Description xml:lang="en">unspecified</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SHC/SHC.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-3D30-F"><ed:Title xml:lang="en">Gesprochenes Wortkorpus fÃ¼r Untersuchungen zur auditiven Verarbeitung von Sprache und emotionaler Prosodie</ed:Title><ed:Description xml:lang="en">WaSeP contains recordings of one female and one male speaker, both professional actors, uttering single German nouns and pseudowords in multiple emotional prosodies. This edition improves the segmentation of the phonetic annotation, adds Praat TextGrid files and removes a few irregular items.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/WaSeP/WaSeP.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-C059-C"><ed:Title xml:lang="en">BAS SmartWeb Video</ed:Title><ed:Description xml:lang="en">The SMARTWEB UMTS data collection was created within the publicly funded German SmartWeb project in the years 2004 - 2006. 
It comprises a collection of user queries to a naturally spoken Web interface with the main focus on the soccer world series in 2006. The recordings include 156 field recordings using a hand-held UMTS device (one person, SmartWeb Handheld Corpus SHC), 99 field recordings with video capture of the primary speaker and a secondary speaker (SmartWeb Video Corpus SVC) as well as 36 mobile recordings performed on a BMW motorbike (one speaker, SmartWeb Motorbike Corpus SMC). An addendum DVD-R (dvd-fau, vol 24) contains additional data derived from the basic SVC corpus data provided by FAU Erlangen. Starting from version 3.6 (CLARIN Repository version 4), the transcribed parts of the SVC audio recordings are distributed as an emuDB.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SVC/SVC.4.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-C2B1-5"><ed:Title xml:lang="en">MultiCHannel Articulatory database: English</ed:Title><ed:Description xml:lang="en">The MOCHA database was compiled as part of the Engineering and Physical Sciences Research Council grant number:GR/L78680 : Speech recognition using articulatory data. It features a set of 460 short sentences designed to include the main connected speech processes in English (e.g. assimilations, weak forms ...). All recordings made in the same sound damped studio at the Edinburgh Speech Production Facility based in the department of Speech and Language Sciences, Queen Margaret University College, UK. The database contains audio files, laryngograph waveforms, electromagnetic articulograph (EMA) tracks and electropalatograph (EPG) tracks. It is distributed as an emuR compatible EMU database. Conversion into this format was done at the Bavarian Archive for Speech Signals. 
The original database is available here: http://www.cstr.ed.ac.uk/research/projects/artic/mocha.html</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/MOCHA/MOCHA.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-C2C0-4"><ed:Title xml:lang="en">BAS Infrastructures for Technical Speech Processing</ed:Title><ed:Description xml:lang="en">Speech synthesis using concatenative techniques is maturing to a point where standard procedures are being implemented in a variety of products. However, because of the considerable costs most small and medium-sized companies as well as university labs cannot afford to produce the required speech resources on their own. Although there are some public domain German diphone voices available for research purposes (e.g. MBROLA) there is definitely a lack of publicly available synthesis resources. The BITS synthesis corpus (recorded and) produced by BAS fills the obvious gap.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/BITS/BITS.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-D838-7"><ed:Title xml:lang="en">The Zurich Tangram Corpus - UZH Edition</ed:Title><ed:Description xml:lang="en">This corpus contains tasks, where one subject (the instructor) describes different Tangram figures to another subject (the receiver) so that the receiver can recreate the same order of figures that the instructor has in front of them. The subjects initially dont know each other and work together to solve these tasks in three consecutive sessions. 
This edition features the complete recordings, but lacking phone and word segmentation. Subjects audio tracks are combined into stereo files. If you would like just the transcribed segments with separate files for the subjects or want the word and phone segmentation see corpus ZTC_BAS.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZTC_UZH/ZTC_UZH.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-D89D-5"><ed:Title xml:lang="en">The Zurich Tangram Corpus - BAS Edition</ed:Title><ed:Description xml:lang="en">This corpus contains tasks, where one subject (the instructor) describes different Tangram figures to another subject (the receiver) so that the receiver can recreate the same order of figures that the instructor has in front of them. The subjects initially dont know each other and work together to solve these tasks in three consecutive sessions. This edition only features the transcribed segments, not those in between, and uses separate files for the subject. If you would like the complete recordings with both subjects combined in a file (but missing the word and phone segmentation) see corpus ZTC_UZH.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ZTC_BAS/ZTC_BAS.2.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-DADB-D"><ed:Title xml:lang="en">The Karl-Eberhard-Corpus of spontaneously spoken conversations in Southern German</ed:Title><ed:Description xml:lang="en">The KEC contains 79 speakers of Southern German. Two speakers, usually acquainted with each other, had an one hour long conversation in separate booths. 
Manual annotation at the word level is provided, automatic annotation at the segment level as well as an automatic morphological tagging is added.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/KEC/KEC.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-E9CF-A"><ed:Title xml:lang="en">BAS SI100</ed:Title><ed:Description xml:lang="en">The corpus contains read speech of 101 different speakers (50 female, 50 male, 1 unknown). Each speaker has read approx. 100 sentences from either the SZ subcorpus or the CeBit subcorpus. The language is German.  The subcorpus SZ contains 544 sentences from newspaper articles (Sueddeutsche Zeitung).  The subcorpus CeBit contains 483 sentences from newspaper articles about the CeBit 1995.  Each subcorpus is divided into 5 parts of approx. 100 utterances each.  Every speaker read only one part of one subcorpus (with some exceptions), thus resulting in a total of 10.387 recorded utterances (31,5 h of speech).  The recording took place at the Institut fuer Phonetik, University of Munich, Germany in 1995.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SI100/SI100.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-EBFB-6"><ed:Title xml:lang="en">BAS SI1000</ed:Title><ed:Description xml:lang="en">The corpus contains read speech of 10 different speakers. Each speaker has read approx. 1000 sentences from a German news paper corpus, thus resulting in a total of approx. 10000 recorded utterances.  
The recording took place at the Institut fuer Phonetik, University of Munich, Germany in 1994.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SI1000/SI1000.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-EC5D-8"><ed:Title xml:lang="en">BAS AbsolventInnen</ed:Title><ed:Description xml:lang="en">The Absolventinnen corpus has been recorded during the summer semester 2019 within the scope of the Masterâ&#x80;&#x99;s thesis â&#x80;&#x9c;Aussprachestrategien geschlechtsneutraler Nomina im Deutschenâ&#x80;&#x9d; by Korbinian Slavik at the Institute of Phonetics and Speech Processing (IPS), LMU Munich. It is the proceeding corpus of the SprecherInnen corpus, which is also part of the BAS CLARIN Repository. Its purpose is to provide data for examining the pronunciation of gender-neutral forms in German in a main study. To this date, there is a wide variety of written forms in order to express different biological genders in one word. These forms include the asterisk *, underscore _, and internal I before the suffix, e.g., Absolvent*innen, Absolvent_innen, and AbsolventInnen. To our knowledge, there is no standardized pronunciation norm for these innovative writings which is why the collected data is of special interest to phoneticians and phonologists. Pronunciation strategies include a variety of morphosyntactic expansions, lengthening of [Éª] in the suffixes, shift of lexical stresses, pauses, and glottal stops before suffix. Young speakers tend to use phonetic markers more often than older people, which could be an indicator for a potential change in progress. Additionally, participants tend to make more mistakes in sentences with gender neutral words. The recordings took place at the IPS in the Munich region. 
56 texts were recorded from 40 speakers. The texts came from newspapers, websites, administration offices, social services, etc., and were modified to contain either one of the three gender-neutral forms or the extended form. Each of the speakers read the 56 sentences, with target words, 25Â % each, asterisk, underscore, uppercase-I or the feminine plural-form in a counterbalancing measures design. Filler sentences for this study are not a part of the corpus but will be part of further investigations. That means, that there are 56 recordings per session. After the recording session participants filled in an online questionnaire, which will be part of the metadata.   The participants were males and females, most of them from Bavaria or other parts of Germany, 20 students and 20 retired persons. All in all, there are 2240 recordings, all of which were transcribed orthographically, phonemically, and phonetically. In 2019, the present corpus version was created at the BAS CLARIN center of the University of Munich (LMU) for indefinite archivation and distribution.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/AbsolventInnen/AbsolventInnen.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-F0C9-7"><ed:Title xml:lang="en">BAS Verbmobil 2</ed:Title><ed:Description xml:lang="en">Verbmobil 2 contains the speech of 401 speakers participating in 810 recordings. The emotional tagged recordings are not part of this edition but are collected inthe corpus BAS VMEmo. The total VM2 corpus amounts to 17.6GB of data containing 58961 conversational turns distributed on 39 CD-R. VM2 contains dialogs in German, English, Japanese and mixed language pairs (partly with interpreter). The domain is appointment scheduling, travel planing, leisure time planing. 
Starting from version 3, the corpus is also available in emuR compatible emuDB format (see annotation files ending in *_annot.json).In Version 4 the accompanying CLARIN Documentation has been extended by the .rpr files for each session.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/VM2/VM2.4.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language><ed:Language>jpn</ed:Language><ed:Language>eng</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-F3F5-2"><ed:Title xml:lang="en">Natural Media Motion-Capture Corpus</ed:Title><ed:Description xml:lang="en">                    The Natural Media Motion Capture Corpus (NM-MoCap-Corpus) originates from a case study recorded in Aachen (Germany) in 2011 for a theory of Gesture Form Analysis (Hassemer &amp;amp; McCleary, in press, The multidimensionality of pointing. Gesture; Hassemer, 2016, Towards a theory of Gesture Form Analysis), which aimed at eliciting object descriptions containing gestural information about stimulus objects. Gesture Form Analysis bases on differentiating between the physical configuration of the articulating body part (articulator form) and the spatial information that an observer abstracts from articulator form (gesture form), for example by profiling specific parts of an articulator. Of particular interest were differences in the participants depiction of size information versus size and shape information. The corpus consists of data from 18 participants, whose task was to describe nine objects each to an experimenter, without using everyday vocabulary about forms, sizes or objects. The participants were recorded on audio and several video cameras, and their hand movements were recorded using an optical VICON motion capture system. 
ELAN annotations for gestural holds displaying size or shape information were generated semi-automatically from the motion capture data.&lt;br/>
                    Each participant's session contains ten combined motion capture and video recordings (nine object descriptions and one calibration task). Each motion capture recording consists of three video files and two data files. For each object description one ELAN annotation file was produced. Furthermore, each participant's data contains one HD video of the entire session. In total the corpus consists of:&lt;br/>
                    557 video files (*.mp4) (one file missing)&lt;br/>
                    720 annotation files (*.eaf)&lt;br/>
                    162 motion capture data files (*.csv)&lt;br/>
		    The NM-MoCap-Corpus was curated for CLARIN as part of Curation Project 1 Editing and Integration of multimodal resources in CLARIN-D by CLARIN-D Working Group 6 Speech and Other Modalities. BAS CLARIN Repository version 2 contains manual annotations of articulator profile, gesture type, meaningful motion, difficult to code or not of both hands, according to gesture form analysis v.06 (Hassemer &amp;amp; McCleary, in press, The multidimensionality of pointing. Gesture; Hassemer, 2016, Towards a theory of Gesture Form Analysis).</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/NM-MoCap-Corpus/NM-MoCap-Corpus.3.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-F51E-4"><ed:Title xml:lang="en">PhonDat 1</ed:Title><ed:Description xml:lang="en">The corpus contains read speech of 201 different speakers. Each speaket read a subcorpus of 450 different sentence equivalents (including alphanumericals and two shorter passages of prose text); 8 speakers read the whole sentence corpus; 40 speakers read the subcorpora BR and MR; 112 speakers read 70 utterances of the rest corpus, including alphabet, numbers 0 to 12 and stories. The speakers were recorded at four different sites in Germany (University of Kiel, University of Bonn, University of Bochum, University of Munich). The language is German. The corpus contains a total of 21587 recorded utterances. 
Starting from version 4.1 (BAS CLARIN repository version 3), this corpus is available as an emuDB.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/PD1/PD1.4.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-FB90-B"><ed:Title xml:lang="en">OH.D Colonia Dignidad. Ein chilenisch-deutsches Oral History-Archiv</ed:Title><ed:Description xml:lang="en">Das Online-Archiv â&#x80;&#x9e;Colonia Dignidad. Ein chilenisch-deutsches Oral History-Archivâ&#x80;&#x9c; enthÃ¤lt Interviews mit Zeitzeuginnen und Zeitzeugen einer deutschen Sektensiedlung im sÃ¼dlichen Chile. Zwischen 1961 und 2005 wurden die Sektenmitglieder, eigene und chilenische Kinder isoliert, indoktriniert, ausgebeutet, gequÃ¤lt und sexuell missbraucht. Â In den 1990er Jahren Ã¼bte SektenfÃ¼hrer Paul SchÃ¤fer sexuelle Gewalt gegen chilenische Jungen aus. WÃ¤hrend der chilenischen Diktatur 1973 bis 1990 wurden Oppositionelle dort gefoltert und ermordet.Â &amp;#13;&lt;br/>
Um Zugang zu den vollständigen Interviews, Fotos und Erläuterungen zu erhalten, müssen Sie sich registrieren. Dabei sind die Nutzungsbedingungen zu beachten, insbesondere die Persönlichkeitsrechte der Interviewten. Auf der begleitenden Webseite finden Sie weitere Informationen zum Archiv, zur Colonia Dignidad und zum Interviewprojekt.&amp;#13;&lt;br/>
Das Archiv befindet sich noch im Aufbau.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ohd_cdoh_001/ohd_cdoh_001.1.php</ed:LandingPageURI><ed:Languages><ed:Language>spa</ed:Language><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-F5CB-0"><ed:Title xml:lang="en">BAS Edition of German Distant Speech Data Corpus 2014/2015</ed:Title><ed:Description xml:lang="en">                         General information: &lt;br/>
                         The corpus contains read German speech of 179 different speakers (50 female, 129 male). Each speaker has read randomly selected sentences from four&lt;br/>
                           text collections: Wikipedia, the Europarl Corpus, a list of German Command/Control sentences, a corpus of web-crawled sentences that represent direct speech. &lt;br/>
                         The recording took place at the Language Technology and Telecooperation labs, TU-Darmstadt, Germany in 2014-2015. &lt;br/>
                         The task for the speaker was to read fluently and precisely (no dialectal variation). Up to 5 microphones were recorded in parallel: &lt;br/>
                         Kinect 1 Beamformed Audio signal through Kinect SDK, Kinect 1 Direct Access as normal microphone, Internal Realtek Mic of Asus PC - near noisy fan, Samson C01U, Yamaha PSG-01S. &lt;br/>
                         Distance to mouth for all microphones was approx. 100cm. Room: dry acoustics (quiet office), no noise. Sampling rate: 16kHz, resolution: 16 Bit.&lt;br/>
                         The speech data was collected in a controlled environment (same room, same microphone distances, etc.). &lt;br/>
                         Each recording has an XML transcription file that also includes speaker metadata. &lt;br/>
                         The data is curated (manually checked and corrected), to reduce errors and artefacts. &lt;br/>
                         The speech data is divided into three independent data sets: Training / Test / Dev; Test and Dev contain new sentences and new speakers &lt;br/>
                         that are not part of training set, in order to assess model quality in a speaker-independent open-vocabulary setting. &lt;br/>
                         Information about the data collection procedure: &lt;br/>
                         (1) Train set (recordings in 2014): &lt;br/>
                         Sentences were randomly chosen from German Wikipedia and Europarl Corpus, to be read by the speakers. &lt;br/>
                         The Europarl corpus (Release v7) is a collection of the proceedings of the European Parliament between 1996 and 2011, generated by Philipp Koehn &lt;br/>
                         (Europarl: A Parallel Corpus for Statistical Machine Translation, Philipp Koehn, MT Summit 2005, http://www.statmt.org/europarl/). &lt;br/>
                         As a third data set, German command and control sentences were manually specified and would be typical for a command and control setting in living rooms.&lt;br/>
                         (2) Test/dev set (recordings in 2015): &lt;br/>
                         Additional sentences from the German Wikipedia and from the Europarl Corpus have been selected for the recordings. Additionally, we collected German sentences &lt;br/>
                         from the web by crawling the German top-level-domain and applying language filtering and deduplication. Only sentences starting with quotation marks &lt;br/>
                         were selected and randomly sampled. The three text sources are represented with approximately equal amounts of recordings in the test/dev set.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/DIALOGPLUS/DIALOGPLUS.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-F906-A"><ed:Title xml:lang="en">SmartKom Mobil</ed:Title><ed:Description xml:lang="en">This corpus contains multi modal recordings of 73 actors who use the SmartKom system. SmartKom Mobil is a portable PDA equipped with a net link and additional intelligent communication devices. Naive users were asked to test a prototype for a market study not knowing that the system was in fact controlled by two human operators. They were asked to solve two tasks in a period of 4,5 min while they were left alone with the system. The instruction was kept to a minimum; in fact the user only knew that the system is able to understand speech, gestures and should more or less communicate like a human. Experiments were not performed in the field but rather in a studio-like environment. Background noise was played back artificially and the users did not carry the PDA in their hand but rather used a much smaller version of the SIVIT projection plane (to simulate a PDA display) and a pen as a pointing device. Speakers were speaking to a headset microphone. 
Version 1.4 (BAS CLARIN Repository Version 3: Updated Documentation) (BAS CLARIN Repository Version 4: re-coded mimic camera DV videos into a modern DV codec)</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SK-Mobil/SK-Mobil.4.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-F687-B"><ed:Title xml:lang="en">OHD Deutsches GedÃ¤chtnis</ed:Title><ed:Description xml:lang="en">Im Â Archiv â&#x80;&#x9e;Deutsches GedÃ¤chtnisâ&#x80;&#x9c; werden subjektive Erinnerungszeugnisse Â wie lebensgeschichtliche Interviews, Autobiographien, TagebÃ¼cher und Briefsammlungen ganz unterschiedlicher Menschen archiviert, die einen Â Bezug zu gesellschaftspolitischen Ereignissen in Deutschland bzw. zur deutschen Geschichte haben. Sie stammen sowohl aus dem In- als auch aus Â dem Ausland. Dementsprechend sind die Dokumente Ã¼berwiegend in deutscher Â Sprache, jedes fÃ¼nfte Interview allerdings in einer anderen Sprache.&amp;#13;&lt;br/>
Die Â Interviews wurden seit den frÃ¼hen 1980er-Jahren im Rahmen von Â zeitÂ­geschichtlichen ForschungsÂ­projekten des Instituts und seiner VorlÃ¤uferprojekte gefÃ¼hrt. Hinzu kommen biographische Interviews aus Â Forschungen Dritter unterschiedlicher Disziplinen, die ihre Sammlungen Â dem Archiv zur weiteren wissenschaftlichen Nutzung Ã¼berlassen haben. Â Neben Interviews werden auch schriftliche Erinnerungszeugnisse Â archiviert wie AutoÂ­biographien, FamilienÂ­chroniken, TagebÃ¼cher und Â BriefÂ­sammlungen.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/OHD_adg_001/OHD_adg_001.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-F7F8-B"><ed:Title xml:lang="en">SmartKom Public</ed:Title><ed:Description xml:lang="en">This corpus contains multi modal recordings of 86 actors who use the SmartKom system. SmartKom Public is comparable to a traditional public phone booth but equipped with additional intelligent communication devices. Naive users were asked to test a prototype for a market study not knowing that the system was in fact controlled by two human operators. They were asked to solve two tasks in a period of 4,5 min while they were left alone with the system. 
The instruction was kept to a minimum; in fact the user only knew that the system is able to understand speech, gestures and even mimical expressions and should more or less communicate like a human.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SK-Public/SK-Public.5.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-F874-F"><ed:Title xml:lang="en">SmartKom Home</ed:Title><ed:Description xml:lang="en">This corpus contains multi modal recordings of 65 actors who use the SmartKom system. SmartKom Home should be an intelligent communication assistant for the private environment. Naive users were asked to test a prototype for a market study not knowing that the system was in fact controlled by two human operators. They were asked to solve two tasks in a period of 4,5 min while they were left alone with the system. The instruction was kept to a minimum; in fact the user only knew that the system is able to understand speech, gestures and even mimical expressions and should more or less communicate like a human. Version 1.3 (BAS CLARIN Repository Version 3): fixed duration column in BPF files in tiers USH, USM and OCC; duration in samples was too high by exactly 1. 
(BAS CLARIN Repository Version 4): edited Documentation</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/SK-Home/SK-Home.5.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0007-FBCF-6"><ed:Title xml:lang="en">OH.D Interview-Archiv Eiserner Vorhang</ed:Title><ed:Description xml:lang="en">Dieses Interview-Archiv beinhaltet 16 Video-Interviews mit AngehÃ¶rigen, Freunden und MitflÃ¼chtlingen von DDR-BÃ¼rgerinnen und DDR-BÃ¼rgern, die an der innerdeutschen Grenze, in der Ostsee und an der â&#x80;&#x9e;verlÃ¤ngerten Mauerâ&#x80;&#x9c; des Eisernen Vorhangs ums Leben kamen. Im Mittelpunkt dieser Interviews steht die Erinnerung an die Menschen, die dem Grenzregime der DDR und der anderen Ostblockstaaten zum Opfer fielen. Das Interview-Archiv beinhaltet zudem lebensgeschichtliche Interviews mit ehemaligen DDR-BÃ¼rgerinnen und DDR-BÃ¼rgern, die durch Ausreiseantragstellung oder gescheiterten Fluchtversuch der DDR-WillkÃ¼rjustiz ausgesetzt waren, sowie einigen, denen eine erfolgreiche Flucht gelang.&amp;#13;&lt;br/>
Um Zugang zu den vollständigen Interviews, Fotos und Erläuterungen zu erhalten, müssen Sie sich registrieren. Dabei sind die Nutzungsbedingungen zu beachten, insbesondere die Persönlichkeitsrechte der Interviewten. Weitere Informationen zum Projekt finden Sie auf der begleitenden Webseite.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ohd_ev_001/ohd_ev_001.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0008-0007-0"><ed:Title xml:lang="en">OH.D Erlebte Geschichte - Freie Universität Berlin</ed:Title><ed:Description xml:lang="en">Die Geschichte der Freien Universität Berlin ist einzigartig. Sie wurde und wird geprägt durch die Menschen, die dort arbeiten, lehren und studieren.&amp;#13;&lt;br/>
Das Archiv „Erlebte Geschichte - Freie Universität“ dokumentiert lebensgeschichtliche Interviews mit ehemaligen, aber auch noch aktiven Universitätsangehörigen, deren Biographien mit der Geschichte der Universität eng verwoben sind. In den Erzählungen werden Ereignisse und Entwicklungen lebendig sowie Motive und Hintergründe erkennbar.</ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/ohd_eg_001/ohd_eg_001.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource><ed:Resource pid="11022/1009-0000-0008-0070-9"><ed:Title xml:lang="en">TEST-DUMMY</ed:Title><ed:Description xml:lang="en"></ed:Description><ed:LandingPageURI>https://clarin.phonetik.uni-muenchen.de/BASRepository/Public/Corpora/TEST-DUMMY/TEST-DUMMY.1.php</ed:LandingPageURI><ed:Languages><ed:Language>deu</ed:Language></ed:Languages><ed:AvailableDataViews ref="hits adv"/><ed:AvailableLayers ref="orth phonetic"/></ed:Resource></ed:Resources></ed:EndpointDescription></sru:extraResponseData></sru:explainResponse>