End User License Agreements (EULAs):


Bavarian Archive for Speech Signals Webservices: Terms of Use for Academic Institutions

Date 2018-12-26

The Bavarian Archive for Speech Signals at the Ludwig-Maximilians-Universitaet Munich (BAS) provides free webservices (termed 'webservices' in the following) for members of academic institutions (termed 'user' in the following) subject to the following Terms of Use. The BAS may amend these Terms of Use at any time by posting amended versions on this website. 

1. In order to access the webservices, the user must be a member of an academic institution (i.e. can be authenticated via a Shiboleth authentification service provided by the academic home institution of the user; in case your institution does not provide such a service, please obtain a CLARIN IDP account at https://user.clarin.eu/user/register). The BAS reserves the right to grant or deny access to the webservices or to terminate running processes at any time.
2. The results of the webservices may be used for non-profit research purposes only. If you intend to utilize webservice results for commercial purposes or to access webservices from a commercial host, please contact the BAS prior to any usage to obtain a BAS user license. The BAS will apply reverse IP mapping to determine the IP address of hosts calling/accessing the webservices to verify non-profit usage. 
3. Uploaded Data
The user must be able to present proof that they have the rights to use all data that they upload. The user entitles the BAS to store the uploaded data, to process them, to store all intermediate and final results of the process and to remove data that have been stored during processing. All uploaded material will be deleted automatically after 24 hours. Uploaded data will not be forwarded to third parties, except in the case of the service 'ASR', which forwards user data to a third-party, commercial webservice provider (see details and EULAS of these third-party providers on the 'ASR' webservice page). The Terms of Use of these third-party providers differ from the Terms of Use of the BAS. The user indemnifies and will not hold the BAS responsible for any claim arising from use of these third party webservices.
4. Results
The user agrees to avoid any non-ethical usage of the webservices or of results of webservices. The copyright of the results of the webservices belongs to the user of the webservices. The BAS retains the right to store the results only for the technical purpose of providing the service. Intermediate or end results will not be exploited or reviewed in any way by BAS, and are deleted automatically after 24 hours. 
5. Monitoring
For monitoring purposes each transaction on the BAS server will be logged internally and by an external non-public monitoring service of the CLARIN consortium. Internal logging information is confidential and will not be released to third parties. External logging information will be anonymized (stripped from personal information) and made accessible to partners of the CLARIN consortium for the purpose of deriving usage statistics regarding the webservices. The user agrees to these monitoring policies.
6. The user will indemnify and will not hold the BAS responsible for any claim arising out of the use of the webservices.
7. Disclaimer: the use of the webservices is at the user's own risk. The webservices are provided on an "as is" basis. The BAS does not provide a warranty of any kind for the webservices.
8. Limitation of liability: the BAS will not be liable for any damages resulting from the use of the webservices or the use of results of the webservices; the BAS aims to provide the webservices on a 24/7 basis, but will not be liable for any damages that are caused by non-availability of the webservices for any reason.


------------------

Bavarian Archive for Speech Signals Webservices: Terms of Use for Commercial Institutions

Date 2018-12-26

The Bavarian Archive for Speech Signals at the Ludwig-Maximilians-Universitaet Munich (BAS) provides licensed webservices (termed 'webservices' in the following) to commercial institutions (termed 'user' in the following) subject to the following Terms of Use. The BAS may amend these Terms of Use at any time by posting amended versions on this website. 

1. In order to access the webservices, the user must obtain a BAS user license for the respective service, except for the service 'ASR' which may not be used by commercial institutions. BAS user licenses are time limited and amount limited (e.g. a maximum of calls per day); the user is obliged to restrict his/her usage within these limits; otherwise the BAS reserves the right to deny access to the webservices or to terminate running processes. The BAS will apply amount monitoring and reverse IP mapping to determine the IP address of hosts calling/accessing the webservices to verify usage within the contracted limits.
2. The results of the licensed webservice may be used for profit purposes except where the results of the service are traded directly to a third party (i.e. the user acts as a retailer or broker). 
3. Uploaded Data
The user must be able to present proof that he has the rights to use all data that are uploaded. The user entitles the BAS to store the uploaded data, to process them, to store all intermediate and final results of the process and to remove data that have been stored during processing. All uploaded materials will be deleted automatically after 24 hours. Uploaded data will not be forwarded to third parties.
4. Results
The user agrees to avoid any non-ethical usage of the webservices or of results of webservices. The copyright of the results of the webservices belongs to the user of the webservices. The BAS retains the right to store the results only for the technical purpose of providing the service. Intermediate or end results will not be exploited or reviewed in any way by BAS, and are deleted automatically after 24 hours.
5. Monitoring
For monitoring purposes each transaction on the BAS server will be logged internally and by an external non-public monitoring service of the CLARIN consortium. Internal logging information is confidential and will not be released to third parties. External logging information will be anonymized (stripped from personal information) and made accessible to partners of the CLARIN consortium for the purpose of deriving usage statistics regarding the webservices. The user agrees to these monitoring policies.
6. The user will indemnify and will not hold the BAS responsible for any claim arising out of the use of the licensed webservice(s).
7. Disclaimer: the use of the licensed webservice(s) is at the user's own risk. The webservices are provided on an "as is" basis. The BAS does not provide a warranty of any kind for the webservice(s).
8. Limitation of liability: the BAS will not be liable for any damages resulting from the use of the licensed webservice(s) or the use of results of the licensed webservice(s); the BAS aims to provide the webservice(s) on a 24/7 basis, but will not be liable for any damages that are caused by non-availability of the webservice(s) for any reason; the BAS will not be liable for any unauthorized use of the webservice 'ASR'.


------------------


API of BAS WebService REST Calls
================================

Note about Server Load: to avoid overloading our servers,
please consider using the following GET to check the current server load:

https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/getLoadIndicator

This call returns a number (as string): 0 : low load, 1 : medium load, 2 : full load
Please do not issue more calls, when this call returns 2.

Note about availability: server is available 24/7 except saturdays, when we schedule maintenance cycles; these service cylces are announced three days in advance on the web interface.

help
------------------
Example curl call for this document is:
curl -X GET http://clarin.phonetik.uni-muenchen.de/BASWebServices/services/help
----------------------------------------------------------------
----------------------------------------------------------------
runPipelineWithASR
------------------
Description: This is a service that combines two or more BAS webservices into a processing chain (pipeline) including Automatic Speech Recognition (ASR). Since not every BAS webservice can be combined with another, the service only offers pipelines that make sense for the user. Most pipelines executed by this service can also be executed by calling two or more BAS webservices after another and passing the output of one service to the next (exceptions are pipelines dealing with speaker diarization, SD). The benefit, however, is that the user data (which can be substantially large) will be up- and down-loaded only once, and of course that the user does not have to formulate several BAS webservice calls (with matching parameters). The parameter PIPE defines which processing pipeline will be executed; depending on the value of PIPE the service accepts parameters for the BAS webservices which are involved in the pipeline, and which make sense in the context of the pipeline. Other parameters will be set automatically depending on the value of PIPE (e.g. the MAUS parameter USETRN will be set to 'true' in the case of a pipeline where the runChunkPreparation service passes a BPF file to the runMAUS service containing a chunk segmentation in the TRN tier). Since this service basically comprise of all BAS web services, the number of possible parameters is necessarily huge. To make the selection easier we group the parameters into MANDATORY (that have to be set for every pipeline), optional parameters that are shared by more than one service, and then by PIPELINE ELEMENT (e.g. ASR, MAUS, in alphabetical order). In most cases it is sufficient to set the MANDATORY parameters, and the PipelineWithASR service will then set the element specific parameters automatically. The service will perform a pre-check on all set parameters to detect conflicts and then terminate with an informative message; but there are still many cases where the pipeline will start working and then terminate with an error caused by a service later down the pipe. Starting with version 6.0 the service will deliver a ZIP archive instead of the output of the last service in PIPE, if the option 'KEEP' ('Keep everything') is enabled; this ZIP will contain input(s), all intermediary results, end result and a protocol of the pipeline process.
            This service is experimental and can be terminated
            any time without warning. It is restricted for academic use only; therefore this service cannot be called as a RESTful
            service like other BAS services, and the Web API to this service is protected by AAI Shiboleth authentification.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F com=yes -F INSKANTEXTGRID=true -F selectSpeaker= -F USETEXTENHANCE=true -F TARGETRATE=100000 -F TEXT=@<filename> -F NOISE=0 -F PIPE= -F aligner=hirschberg -F ACCESSCODE= -F NOISEPROFILE=0 -F neg=@<filename> -F speakMatch= -F speakNumber=0 -F ASIGNAL=brownNoise -F NORM=true -F mauschunking=false -F minSpeakNumber=0 -F INSORTTEXTGRID=true -F WEIGHT=default -F minanchorlength=3 -F TROSpeakerID=true -F LANGUAGE=deu-DE -F NHANS=none -F USEAUDIOENHANCE=true -F speakMatchASR= -F maxlength=0 -F KEEP=false -F LEFT_BRACKET=# -F nrm=no -F LOWF=0 -F WHITESPACE_REPLACEMENT=_ -F CHANNELSELECT= -F marker=punct -F USEREMAIL= -F boost=true -F except=@<filename> -F MINPAUSLEN=5 -F forcechunking=false -F NOINITIALFINALSILENCE=false -F InputTierName=unknown -F BRACKETS=<> -F OUTFORMAT=TextGrid -F syl=no -F ENDWORD=999999 -F TROSpeakerIDASR=false -F wsync=yes -F UTTERANCELEVEL=false -F featset=standard -F pos=@<filename> -F APHONE= -F INSPROB=0.0 -F OUTSYMBOL=x-sampa -F RULESET=@<filename> -F maxSpeakNumber=0 -F USEWORDASTURN=false -F allowOverlaps=false -F minchunkduration=15 -F SIGNAL=@<filename> -F stress=no -F imap=@<filename> -F MODUS=default -F RELAXMINDUR=false -F ATERMS=@<filename> -F numberSpeakDiar=0 -F RELAXMINDURTHREE=false -F STARTWORD=0 -F INSYMBOL=sampa -F PRESEG=false -F AWORD=ANONYMIZED -F USETRN=false -F ASRType=autoSelect -F MAUSSHIFT=default -F diarization=false -F HIGHF=0 -F silenceonly=0 -F boost_minanchorlength=4 -F ADDSEGPROB=false 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runPipelineWithASR'

Parameters:  [com] [INSKANTEXTGRID] [selectSpeaker] [USETEXTENHANCE] [TARGETRATE] [TEXT] [NOISE] [PIPE] [aligner] [ACCESSCODE] [NOISEPROFILE] [neg] [speakMatch] [speakNumber] [ASIGNAL] [NORM] [mauschunking] [minSpeakNumber] [INSORTTEXTGRID] [WEIGHT] [minanchorlength] [TROSpeakerID] [LANGUAGE] [NHANS] [USEAUDIOENHANCE] [speakMatchASR] [maxlength] [KEEP] [LEFT_BRACKET] [nrm] [LOWF] [WHITESPACE_REPLACEMENT] [CHANNELSELECT] [marker] [USEREMAIL] [boost] [except] [MINPAUSLEN] [forcechunking] [NOINITIALFINALSILENCE] [InputTierName] [BRACKETS] [OUTFORMAT] [syl] [ENDWORD] [TROSpeakerIDASR] [wsync] [UTTERANCELEVEL] [featset] [pos] [APHONE] [INSPROB] [OUTSYMBOL] [RULESET] [maxSpeakNumber] [USEWORDASTURN] [allowOverlaps] [minchunkduration] SIGNAL [stress] [imap] [MODUS] [RELAXMINDUR] [ATERMS] [numberSpeakDiar] [RELAXMINDURTHREE] [STARTWORD] [INSYMBOL] [PRESEG] [AWORD] [USETRN] [ASRType] [MAUSSHIFT] [diarization] [HIGHF] [silenceonly] [boost_minanchorlength] [ADDSEGPROB]

Parameter description: 
com: [yes, no] 
Option com (Keep Annotation): yes/no decision whether <*> strings in text inputs should be treated as annotation
                markers (yes) or as spoken words (no). If set to 'yes', then strings of this type are considered as annotation markers that are not
                processed as spoken words but passed on to the output. 
                The <*> markers will appear in the ORT
                and KAN tier with a word index on their own. WebMAUS makes use of two special markers < usb > (e.g.
                non-understandable word or other human noises) and < nib > (non-human noise).
                All other markers <*> are modelled as silence. Markers must be separated from word tokens
                by blanks; they do not need to be blank-separated from non-word tokens as punctuation. Note that the default service 'TEXTENHANCE'
                that is called by any pipeline that reads input text will replace white space characters (such as blanks) within the <*> by 
                the character given in option 'White space replacement'.

INSKANTEXTGRID: [true, false] 
OPTION INSKANTEXTGRID: Switch to create an additional tier in the TextGrid output file
                with a word segmentation labelled with the canonic phonemic transcript (taken from the input KAN
                tier).

selectSpeaker: Option selectSpeaker ('Speaker processed by pipeline'): 
                the rest of the pipeline 
                processes only the speech segments labelled with the speaker name given in this option. 
                Note that the name must match the standard speaker labels 'S1', 'S2' etc., or - if the 
                option 'speakMatch' ('Speaker label mapping') is used - it must match the assigned speaker name instead. 
                Example: 'speakMatch' is set to 'Ann,Tom' and you want just to process the speech of the second appearing speaker (Tom) 
                in the pipeline, then set 'selectSpeaker=Tom'; if 'speakMatch' is not set, set 'selectSpeaker=S2'. If the option is not set
                or set to the empty string, all speakers are processed by the pipe.
                

USETEXTENHANCE: [true, false] 
Switch on the input text pre-processing 'textEnhance' (true).
                If the PIPE starts with G2P, the input text is first normalized by 'textEnhance'.
                Different TXT formats are mapped to simple UTF-8 Unix style TXT format, and textmarkers 
                are normalized to be conform with BAS WebServices.
                

TARGETRATE: [100000, 20000, 10000] 
Option TARGETRATE: the resolution of segment boundaries in output measured in 100nsec units (default 100000 = 10msec). Decreasing this value (min is 10000) increases computation time, does not increase segmental accuracy in average, but allows output segment boundaries to assume more possible values (default segment boundaries are quantizised in 10msec steps). This is useful, if MAUS results are analysed for duration of phones or syllables.
                

TEXT: Optional parameter TEXT: The textual input to the pipeline, usually some form of text or transcript. Depending on parameter PIPE this can be a text document (all formats that service runTextEnhance supports), a comma separated spreadsheet (csv), a praat TextGrid (TextGrid), an ELAN EAF (eaf), or a BAS Partitur Format (par) file. See
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for detailed description of the
                BPF. Note that PIPEs starting with service ASR or MINNI do not require this parameter.
                  Special languages for text input:
                        Thai, Russian and Georgian expect their respective standard alphabets;
                        Japanese allows Kanji or Katakana or a mixture of both, but the tokenized output will contain only the
                        Katakana version of the input;
                        Swiss German expects input to be transcribed in 'Dieth' (https://en.wikipedia.org/wiki/Swiss_German);
                        Australian Aboriginal languages (including Kunwinjku, Yolnu Matha) expect so called 'Practical Orthography'
                        (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages);
                        Persian accepts a romanized version of Farsi developped by Elisa Pellegrino and Hama Asadi (see
                        http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/PersianRomanizationTable.pdf)
                        for details).
                

NOISE: [0.0, 100.0] 
Option NOISE: if set to a value between 1...100, a noise profile is calculated 
                from the leading and/or trailing parts of the input signal, and then the signal is 
                noise reduced with a strength proportional to the NOISE value 
                (using SoX spectral noise reduction effect 'noisered'). 
                The noise reduction is applied before
                any other processing/merging in all input channels.
                If NOISE=0, no noise reduction takes place.
                

PIPE: [ASR_G2P_CHUNKER, ASR_SUBTITLE, G2P_CHUNKER, MINNI_PHO2SYL, ASR_G2P_CHUNKER_MAUS, ASR_G2P_CHUNKER_MAUS_SD, ASR_G2P_CHUNKER_MAUS_PHO2SYL, ASR_G2P_CHUNKER_MAUS_PHO2SYL_SD, ASR_G2P_CHUNKER_MAUS_SUBTITLE, ASR_G2P_CHUNKER_MAUS_SUBTITLE_SD, ASR_G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL, ASR_G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL_SD, ASR_G2P_MAUS, ASR_G2P_MAUS_SD, ASR_G2P_MAUS_PHO2SYL, ASR_G2P_MAUS_PHO2SYL_SD, ASR_G2P_MAUS_SUBTITLE, ASR_G2P_MAUS_SUBTITLE_SD, ASR_G2P_MAUS_SUBTITLE_PHO2SYL, ASR_G2P_MAUS_SUBTITLE_PHO2SYL_SD, CHUNKER_MAUS, CHUNKER_MAUS_SD, CHUNKER_MAUS_PHO2SYL, CHUNKER_MAUS_PHO2SYL_SD, CHUNKER_MAUS_SUBTITLE, CHUNKER_MAUS_SUBTITLE_SD, CHUNKER_MAUS_SUBTITLE_PHO2SYL, CHUNKER_MAUS_SUBTITLE_PHO2SYL_SD, CHUNKPREP_G2P_MAUS, CHUNKPREP_G2P_MAUS_SD, CHUNKPREP_G2P_MAUS_PHO2SYL, CHUNKPREP_G2P_MAUS_PHO2SYL_SD, CHUNKPREP_G2P_MAUS_SUBTITLE, CHUNKPREP_G2P_MAUS_SUBTITLE_SD, CHUNKPREP_G2P_MAUS_SUBTITLE_PHO2SYL, CHUNKPREP_G2P_MAUS_SUBTITLE_PHO2SYL_SD, SD_ASR_G2P_MAUS, SD_ASR_G2P_MAUS_PHO2SYL, SD_ASR_G2P_MAUS_SUBTITLE, SD_ASR_G2P_MAUS_SUBTITLE_PHO2SYL, G2P_CHUNKER_MAUS, G2P_CHUNKER_MAUS_SD, G2P_CHUNKER_MAUS_PHO2SYL, G2P_CHUNKER_MAUS_PHO2SYL_SD, G2P_CHUNKER_MAUS_SUBTITLE, G2P_CHUNKER_MAUS_SUBTITLE_SD, G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL, G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL_SD, G2P_MAUS, G2P_MAUS_SD, G2P_MAUS_PHO2SYL, G2P_MAUS_PHO2SYL_SD, G2P_MAUS_SUBTITLE, G2P_MAUS_SUBTITLE_SD, G2P_MAUS_SUBTITLE_PHO2SYL, G2P_MAUS_SUBTITLE_PHO2SYL_SD, MAUS_PHO2SYL, MAUS_PHO2SYL_SD, MAUS_SUBTITLE, MAUS_SUBTITLE_SD, MAUS_SUBTITLE_PHO2SYL, MAUS_SUBTITLE_PHO2SYL_SD, ASR_G2P_CHUNKER_MAUS_ANONYMIZER, ASR_G2P_CHUNKER_MAUS_ANONYMIZER_SD, ASR_G2P_CHUNKER_MAUS_PHO2SYL_ANONYMIZER, ASR_G2P_CHUNKER_MAUS_PHO2SYL_ANONYMIZER_SD, ASR_G2P_CHUNKER_MAUS_ANONYMIZER_SUBTITLE, ASR_G2P_CHUNKER_MAUS_ANONYMIZER_SUBTITLE_SD, ASR_G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, ASR_G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, ASR_G2P_MAUS_ANONYMIZER, ASR_G2P_MAUS_ANONYMIZER_SD, ASR_G2P_MAUS_PHO2SYL_ANONYMIZER, ASR_G2P_MAUS_PHO2SYL_ANONYMIZER_SD, ASR_G2P_MAUS_ANONYMIZER_SUBTITLE, ASR_G2P_MAUS_ANONYMIZER_SUBTITLE_SD, ASR_G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, ASR_G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, CHUNKER_MAUS_ANONYMIZER, CHUNKER_MAUS_ANONYMIZER_SD, CHUNKER_MAUS_PHO2SYL_ANONYMIZER, CHUNKER_MAUS_PHO2SYL_ANONYMIZER_SD, CHUNKER_MAUS_ANONYMIZER_SUBTITLE, CHUNKER_MAUS_ANONYMIZER_SUBTITLE_SD, CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, CHUNKPREP_G2P_MAUS_ANONYMIZER, CHUNKPREP_G2P_MAUS_ANONYMIZER_SD, CHUNKPREP_G2P_MAUS_PHO2SYL_ANONYMIZER, CHUNKPREP_G2P_MAUS_PHO2SYL_ANONYMIZER_SD, CHUNKPREP_G2P_MAUS_ANONYMIZER_SUBTITLE, CHUNKPREP_G2P_MAUS_ANONYMIZER_SUBTITLE_SD, CHUNKPREP_G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, CHUNKPREP_G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, SD_ASR_G2P_MAUS_ANONYMIZER, SD_ASR_G2P_MAUS_PHO2SYL_ANONYMIZER, SD_ASR_G2P_MAUS_ANONYMIZER_SUBTITLE, SD_ASR_G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, G2P_CHUNKER_MAUS_ANONYMIZER, G2P_CHUNKER_MAUS_ANONYMIZER_SD, G2P_CHUNKER_MAUS_PHO2SYL_ANONYMIZER, G2P_CHUNKER_MAUS_PHO2SYL_ANONYMIZER_SD, G2P_CHUNKER_MAUS_ANONYMIZER_SUBTITLE, G2P_CHUNKER_MAUS_ANONYMIZER_SUBTITLE_SD, G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, G2P_MAUS_ANONYMIZER, G2P_MAUS_ANONYMIZER_SD, G2P_MAUS_PHO2SYL_ANONYMIZER, G2P_MAUS_PHO2SYL_ANONYMIZER_SD, G2P_MAUS_ANONYMIZER_SUBTITLE, G2P_MAUS_ANONYMIZER_SUBTITLE_SD, G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, MAUS_ANONYMIZER, MAUS_ANONYMIZER_SD, MAUS_PHO2SYL_ANONYMIZER, MAUS_PHO2SYL_ANONYMIZER_SD, MAUS_ANONYMIZER_SUBTITLE, MAUS_ANONYMIZER_SUBTITLE_SD, MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD] 
Parameter PIPE: The type of pipeline to process. Values of parameter PIPE have the general form SERVICE_SERVICE[_SERVICE ...], where SERVICE is one of ASR, G2P, MAUS, CHUNKER, CHUNKPREP, PHO2SYL, MINNI, SUBTITLE, ANONYMIZER, SD. For example PIPE=G2P_CHUNKER_MAUS_PHO2SYL_SD denotes a pipe that runs over these 5 services. The first SERVICE in the PIPE value detemines whether both, SIGNAL and TEXT, inputs are necessary or only a SIGNAL; the last SERVICE in PIPE determines which output the pipeline can produce. Therefore it is quite possible to call a pipe with impossible input/output configuration which will cause an ERROR. Every media file uploaded will first be passed through the service 'AudioEnhance' to normalized the media file to a RIFF WAVE format file; every text input is first run through the service 'TextEnhance' to normalize the text format; for both these obligatory services exist options as for the other pipeline SERVICES. Special pipelines: There are some pipes that do more than simply chaining the services and piping the output of a module as input into the next module:
  1. Pipes that end on "..._SD"
     The final speaker diarization module (SD) does not actual read any annotations from the previous services;
     it rather runs the speaker diarization in parallel on the signal input and then merges the speaker segmentation
     and laelling with whatever the rest of the pipe has produced, e.g. it merges speaker segments and word segments
     to produce a (symbolic) speaker labelling of the word segments.
  2. Pipes that start with "SD_ASR_..."
     First a speaker diarization is run on the input signal; then only the speaker segments (optionally filtered by 
     option 'selectSpeaker') are passed to the ASR module; all results (one per speaker segment) of ASR are 
     summarized into a single BPF file with tiers ORT,TRO (from ASR) and TRN,SPD (from SD) and then passed on through the 
     rest pipe, which treats this exactly like a chunk segmentation as produced by module CHUNKPREP.
                

aligner: [hirschberg, fast] 
Symbolic aligner to be used. The "fast" aligner performs approximate alignment by splitting the alignment matrix into "windows" of size 5000*5000. The "hirschberg" aligner performs optimal matching. On recordings below the 1 hour mark, the choice of aligner does not make a big difference in runtime. On longer recordings, you can improve runtime by selecting the "fast" aligner. Note however that this choice increases the probability of errors on recordings with untranscribed stretches (such as long pauses, musical interludes, untranscribed speech). Therefore, the "hirschberg" aligner should be used on this kind of material.

ACCESSCODE: 
                  Exceed quota code (ACCESSCODE): special code a user has acquired to override default quotas. 
                  Not needed for normal operation. 
                

NOISEPROFILE: [-1000000.0, 1000000.0] 
Option NOISEPROFILE: if set to 0 (default), the noise profile is calculated from the leading and trailing
                portion of the recording (estimated by a silence detector); if set to a positive value, the noise profile is calculated 
                from the leading NOISEPROFILE samples; if set to a negative value, the noise profile is calculated
                from the trailing NOISEPROFILE samples. This is useful, if the recording contains loud noise at the begin/end of the recording
                that would not be selected by the silence detector (because of too much energy).
                

neg: 
                Option neg : N-HANS sample recording (RIFF WAVE *.wav) of the noise to be removed from signal (mode 'denoiser') or 
                the speaker/speaker group to be removed from signal (mode 'separator').
                The 'neg' sample is applied to all processed input signals; do not upload more than 2sec of clean signal,
                and make sure that the relevant signal is present within the very first second; 'clean signal' means that the sample should
                not contain any traces of the main voice or of the 'pos' noise sample. 
                The upload of the 'neg' sample is mandatory for both N-HANS modi (see option 'NHANS').
                

speakMatch: Option speakMatch (': if set to a list of comma separated names (e.g.
		speakMatch='Anton,Berta,Charlie', the corresponding speaker labels found by the 
		speaker diarization in the order of appearance are replaced by these names (e.g. 'S1' to 'Anton',
		'S2' to 'Berta' etc.). This allows the user to create SD annotation using her self
		defined speaker labels, if the user knows the order of appearance; it is obvious
		that this feature only makes sense in single file processing, since the speaker labels
		and the order of appearance differ from one recording to the next; the suggested mode
		of operation is to run the service in batch mode over all recordings with speakMatch="",
		then inspect manually the resulting annotation and define speaker labels in the order of 
		appearance for each recording, and then run the service in single file mode for each 
		recording again with the corresponding speakMatch list. If the speakMatch option contains 
		a comma separated list of value pairs like 'S1:Anton', only the speaker labels listed 
		on the lefthand side of each pair are patched, e.g. for speakMatch='S3:Charlie,S6:Florian'
		only the third and sixth appearing speaker are renamed to Charlie and Florian respectively.
                

speakNumber: [0.0, 999999.0] 
Option speakNumber restricts the number of detected speakers by the speaker diarization to the 
		given number. If set to 0 (default), the SD method determines the number automatically.
	        

ASIGNAL: [brownNoise, beep, silence] 
Option ASIGNAL: the type of signal to mask anonymized terms in the signal.
                'brownNoise' is brown noise; 'beep' is a 500Hz sinus; 'silence' is total silence (zero signal); masking signals have an amplitude of -10dB of the maximum amplitude
                and are faded in and out with a very short sinoid function.
                

NORM: [true, false] 
Option NORM: if true (selected) each input channel is amplitude normalised to -3dB before any merge.
                

mauschunking: [true, false] 
If this parameter is set to true, the recognition module will model words as MAUS graphs as opposed to canonical chains of phonemes. This will slow down the recognition engine, but it may help with non-canonical speech (e.g., accents or dialects).

minSpeakNumber: [0.0, 999999.0] 
Option minSpeakNumber defines a hard lower bound of the number of detected speakers. If set to 0 (default), no lower bound.
	        

INSORTTEXTGRID: [true, false] 
Option INSORTTEXTGRID: Switch to create an additional tier ORT in the TextGrid output file
                with a word segmentation labelled with the orthographic transcript (taken from the input ORT tier);
                this option is only effective, if the input BPF contains an additional ORT tier.

WEIGHT: 
                MAUS pipeline:
                The option WEIGHT weights the influence of the statistical pronunciation model against the
                acoustical scores. More precisely WEIGHT is multiplied to the pronunciation model score (log
                likelihood) before adding the score to the acoustical score within the search. Since the pronunciation
                model in most cases favors the canonical pronunciation, increasing WEIGHT will at some point cause MAUS
                to choose always the canonical pronunciation; lower values of WEIGHT will favor less probable paths be
                selected according to acoustic evidence. If the acoustic quality of the signal is very good and the
                HMMs of the language are well trained, it makes sense to lower WEIGHT. For most languages this option
                is default to 1.0. In an evaluation on parts of the German Verbmobil data set (27425 segments) which
                were segmented and labelled manually (MAUS DEV set) WEIGHT was optimized to 7.0. Note that this might
                NOT be the optimal value for other languages. For instance Italian shows best results with WEIGHT=1.0,
                Estonian with WEIGHT=2.5. If set to default, a language specific optimal value is chosen automatically.
                MINNI pipeline:     
                The option WEIGHT weights the influence of the statistical phonotactic bigram model (the a-priori probability of pronuciation)
                against the acoustical scores. More precisely WEIGHT is multiplied to the phonotactic model score (log
                likelihood) before adding the score to the acoustical score within the Viterbi search. Since MINNI uses a phonotactic
                bigram model, increasing WEIGHT will at some point cause MINNI to choose always the same most likely sequence of phones
                according to the bigram model (disregarding the acoustics) with equally long segments, i.e. no meaningful segmentation at all;
                lower values of WEIGHT will cause phoneme sequences to be detected according to acoustic evidence, even if the resulting pronunciation is less likely
                according to the phonotactic bigram model;
                if WEIGHT is set to 0.0 the bigram is completely ignored and MINNI performs a
                phone recognition bases only on acoustic likelihood (and any sequence of phones is a-priori equally probable). If the
                acoustic quality of the signal is very good and the HMMs of the language are well trained, it makes
                sense to lower WEIGHT to achieve more precise results given the acoustic. For most languages this option is default to 1.0
                (which means that acoustic evidence and a-priori pronunciation probability are treated equally).
                

minanchorlength: [2.0, 8.0] 
The chunker performs speech recognition and symbolic alignment to find regions of correctly aligned words (so-called 'anchors'). Setting this parameter to a high value (e.g. 4-5) means that the chunker finds chunk boundaries with higher certainty. However, the total number of discovered chunk boundaries may be reduced as a consequence. A low value (e.g. 2) is likely to lead to a more fine-grained chunking result, but with lower confidence for individual chunk boundaries.

TROSpeakerID: [true, false] 
If set to true (default: false), in pipes 'SD_ASR_...' speaker ID labels of the form '<XXX:> ' will be inserted before words in the TRO tier, that start a new speaker turn of speaker labelled by 'XXX'. The inserted speaker label 'XXX' is either one of the standardized labels 'S1', 'S2', ... or mapped speaker labels taken from the option 'speakMatch'. The service also checks each preceeding word to a speaker turn change (the last word of the previous turn) and adds a trailing '.', if the word does not has already a trailing final punctuation sign (one of '!?.:...). This option enables pipelines that start with 'ASR' and end with 'SUBTITLE' to create subtitle tracks (e.g. WebVTT) that show the speaker ID and start a new subtitle at ech speaker turn change.
                

LANGUAGE: [cat, deu, eng, fin, hat, hun, ita, mlt, nld, nze, pol, aus-AU, afr-ZA, sqi-AL, arb, eus-ES, eus-FR, cat-ES, nld-NL-GN, nld-NL, nld-NL-OH, nld-NL-PR, eng-US, eng-AU, eng-GB, eng-GB-OH, eng-GB-OHFAST, eng-GB-LE, eng-SC, eng-NZ, eng-CA, eng-GH, eng-IN, eng-IE, eng-KE, eng-NG, eng-PH, eng-ZA, eng-TZ, ekk-EE, kat-GE, fin-FI, fra-FR, deu-AT, deu-CH, deu-DE, deu-DE-OH, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, gsw-CH, hat-HT, hun-HU, isl-IS, ita-IT, jpn-JP, gup-AU, sampa, ltz-LU, mlt-MT, nor-NO, fas-IR, pol-PL, ron-RO, rus-RU, slk-SK, spa-ES, spa-AR, spa-BO, spa-CL, spa-CO, spa-CR, spa-DO, spa-EC, spa-SV, spa-GT, spa-HN, spa-MX, spa-NI, spa-PA, spa-PY, spa-PE, spa-PR, spa-US, spa-UY, spa-VE, swe-SE, tha-TH, guf-AU] 

                Language: RCFC5646 locale code of the processed speech; defines the phoneme set of input and the orthographic system of input text (if any);
                we use the RFC5646 sub-structure 'iso639-3 - iso3166-1 [ - iso3166-2], e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in
                'Oberoesterreich'; the code 'sampa' ('Language independent') allows the user to upload a customized
                mapping from orthographic to phonologic form (see option 'imap'). Special languages: 'gsw-CH' denotes
                text written in Swiss German 'Dieth' transcription (https://en.wikipedia.org/wiki/Swiss_German); 'gsw-CH-*' are localized varieties in larger Swiss cities;
                'jpn-JA' (Japanese) accepts Kanji or Katakana or a mixture of both, but the tokenized output will contain only the Katakana version of the input;
                'aus-AU' (Australian Aboriginal languages, including Kunwinjku, Yolnu Matha) accept so called 'Modern Practical Orthography'
                (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages);
                'fas-IR' (Persian) accepts a romanized version of Farsi developped by Elisa Pellegrino and Hama Asadi (see
                http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/PersianRomanizationTable.pdf for details);
                'arb' is a macro language covering all Arabic varieties; the input must be encoded in a broad phonetic romanization developped
                by Jalal Tamimi and colleagues (see http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/TamimiRomanization.pdf for details).
                The language code is passed to all services of the pipeline, thus influencing the way these services will
                process the speech. If one member of the PIPE does not support the language, the service will try to determine another 
                suitable language (WARNING is issued) or, if that is not possible, an ERROR is returned. Note
                that some services will support more languages than offered in the pipeline service, but we restrict the
                pipeline languages to a reasonable core set that is supported by most services.
                

NHANS: [none, denoiser, separator] 
Option NHANS: the N-HANS audio enhancement mode (default: 'none') applied to the 
                result of the SoX pipeline. 
                'denoiser' : the noise as represented in the sample recording  
                uploaded in the mandatory option file 'neg' is removed from the signal; if another voice or noise sample is uploaded in 
                option file 'pos' (optional), this noise/voice is being preserved in the signal together with the main voice. 
                'separator' : an interference speaker or speaker group as represented in the sample recording 
                uploaded in the mandatory option file 'neg' is removed from the signal while the voice of a target speaker as uploaded in 
                the mandatory option file 'pos' is being preserved in the signal.
                Both sample signals, 'neg' and 'pos', are applied to all processed input signals; do not upload more than 2sec of clean signal,
                and make sure that the relevant signal is present within the very first second; 'clean signal' means that the sample should
                not contain any traces of the main voice or of the other noise sample. 
                

USEAUDIOENHANCE: [true, false] 
Switch on the signal normalization 'AudioEnhance' (true).
                

speakMatchASR: Option speakMatchASR: if set to a list of comma separated names (e.g.
		speakMatch='Anton,Berta,Charlie', the corresponding speaker labels found by the 
		speaker diarization in the order of appearance are replaced by these names (e.g. 'S1' to 'Anton',
		'S2' to 'Berta' etc.). This allows the user to create SD annotation using her self
		defined speaker labels, if the user knows the order of appearance; it is obvious
		that this feature only makes sense in single file processing, since the speaker labels
		and the order of appearance differ from one recording to the next; the suggested mode
		of operation is to run the service in batch mode over all recordings with speakMatch="",
		then inspect manually the resulting annotation and define speaker labels in the order of 
		appearance for each recording, and then run the service in single file mode for each 
		recording again with the corresponding speakMatch list. If the speakMatch option contains 
		a comma separated list of value pairs like 'S1:Anton', only the speaker labels listed 
		on the lefthand side of each pair are patched, e.g. for speakMatch='S3:Charlie,S6:Florian'
		only the third and sixth appearing speaker are renamed to Charlie and Florian respectively.
                

maxlength: [0.0, 999.0] 
Maximum subtitle length. If set to 0,
                        subtitles of indefinite length are created, based only
                        on the distance of the split markers. If set to a
                        value greater than 0, subtitles are split whenever a
                        stretch between two neighbouring split markers is
                        longer than that value (in words). Caution: This may
                        lead to subtitle splits in suboptimal locations (e.g.
                        inside syntactic phrases).

KEEP: [true, false] 
Keep everything (KEEP): If set to true (default: false), the service will return a ZIP archive instead of the output of the last service in PIPE. The ZIP is named as the output file name (as defined in OUT) with extension zip and contains the following files: input(s) including optional files (e.g. RULESET), all intermediary results of the PIPE, the result of the pipeline, and a protocol listing all options; all stored files in the ZIP start with the file name body of the SIGNAL input followed by the marker '_LABEL', which indicates from which part of the pipe the file is produced, and the appropriate file type extension; 'LABEL' is one of INPUT, AUDIOENHANCE (which marks the pre-processed media file), ASR, CHUNKER, CHUNKPREP, G2P, MAUS, PHO2SYL, ANONYMIZER, SUBTITLE and README (which marks the protocol file). The protocol file contains a simple list of 'option = value' pairs. The result file(s) of the pipeline have no '_LABEL' marker. The KEEP option is useful for documenting scientific pipeline runs, and for retrieving results that are produced by the PIPE but are overwritten/not passed on by later services (e.g. an anonymized video or CHUNKER output).

LEFT_BRACKET: One or more characters which mark comments reaching
                        until the end of the line (default: #). E.g. if your 
                        input text contains comment lines that begin with ';', 
                        set this option to ';' to avoid that these comments are
                        treated as spoken text. If you want to suppress the default
                        '#' comment character, set this option to 'NONE'.
                        If you are using comment lines in your input text, you must be absolutely 
                        sure that the comment character appears nowhere in the text except in comment lines!
                        Note 1: the characters '&', '|' and '=' do not work as comment characters.
                        Note 2: for technical reasons the value for this option cannot be empty.
                        Note 3: the default character '#' cannot be combined with other characters, e.g. if
                        you define this option as ';#', the '#' will be ignored.
                        Note 4 (sorry): for the service 'Subtitle' comment lines must be terminated with
                        a so called 'final punctuation sign', i.e. one of '.!?:…'; otherwise, an immediately
                        following speaker marker will not be recognized.
                

nrm: [yes, no] 
Text normalization. Currently available for German and English only.
                Detects and expands 22 non-standard word types. All output file types supported but
                not available for the following tokenized input types: bpf, TextGrid, and tcf. If
                switched off, only number expansion is carried out.

LOWF: [0.0, 30000.0] 
Option LOWF: lower filter edge in Hz. If set >0Hz and HIGHF is 0Hz, a high pass filter 
                with LOWF Hz is applied; if set >0Hz and HIGHF is set higher than LOWF, a band
                pass between LOWF and HIGHF is applied; if set >0Hz and HIGHF is set higher than 0Hz but lower 
                than LOWF, a reject band pass between HIGHF and LOWF is applied. 
                E.g. HIGHF = 3000 LOWF = 300 is telephone band; HIGHF = 45 LOWF = 55 filters out a 50Hz hum.
                

WHITESPACE_REPLACEMENT: The character that whitespace in comments should be
                        substituted by (default: '_'). The BAS WebServices require 
                        that annotation markers or comment lines in input texts do not 
                        contain white spaces. This option let you decide which character
                        should be used to replace the white spaces. 
                        If set to the string 'NONE' no replacement takes place.
                        CAUTION: the characters '&' and '=' do not work as replacements.
                

CHANNELSELECT: Option CHANNELSELECT: list of comma-separated channel numbers that are selected
                for further processing from the input media file.
                Examples: 
                MONO=true,CHANNELSELECT="" : merge multi-channel files into one channel, 
                MONO=true,CHANNELSELECT="2,3,4" : merge only selected channels into one channel,
                MONO=false, CHANNELSELECT="3,4,1,2" : select and re-arrange channels,
                MONO=false, CHANNELSELECT="" : do nothing.
                Note that channels are numbered starting with 1 = left channel in stereo, 2 = right channel, ...
                By reversing the order of channel numbers in CHANNELSELECT you can swap channels, e.g.
                CHANNELSELECT="2,1" MONO=false will swap left and right channel of a stereo signal.
                

marker: [punct, newline, tag] 
Marker used to split transcription into subtitles. If
                        set to 'punct' (default), the transcription is split
                        after 'terminal' punctuation marks (currently [.!?:…]. If set
                        to 'newline', the transcription is split at newlines (\n
                        or \r\n). If set to 'tag', the program expects a special
                        < BREAK > tag inside the transcription (without the
                        blanks between the brackets and BREAK!).

USEREMAIL: Option USEREMAIL: if a valid email address is provided through this option, the 
                service will send the XML file containing the results of the service run to this address after completion.
                It is recommended to set this option for long recordings (batch size <6, length >1h) since it is often problematic to wait 
                for service completion over an instable internet connection or from a laptop that might go into hibernation.
                The email address provided is not stored on the server. It is sometimes even advisable to kill the browser tab
                after starting the call and wait for the result emails (only for batch size <6!).
                Beware: the download link to your result(s) will be valid for 24h after you receive the email; after that
                all your data will be purged from the server.
                Disclaimer: the usage of this option is at your own risk; the key URL to download your result file will
                be send without encryption in this email; be aware that anybody who can intercept this email will be able 
                to access your result files using this key; the BAS at LMU Munich will not be held responsible for any
                security breach caused by using this email notification option.
                

boost: [true, false] 
If set to true (the default), the chunker will start by running a so-called boost phase over the recording. This boost phase uses a phoneme-based decoder instead of speech recognition. Usually, the boost option reduces processing time. On noisy input or faulty transcriptions, the boost option can lead to an increase in errors. In this case (or if a previous run with boost set to 'true' has led to chunking errors), set this option to 'false'.

except: Exception dictionary file overwriting the standard G2P output. Format: 2 semicolon-separated columns: word;transcript. Phonemes in transcript must be blank-separated. Example: sagt;z ' a x t. Note that the transcript must not contain phonemic symbols that are unknown to other
services in the pipeline for the selected language; the service 'WebMAUS General' provides a list of all known symbols of a language

MINPAUSLEN: [0.0, 999.0] 
Option MINPAUSLEN: Controls the behaviour of optional inter-word silence. If set to 1,
                maus will detect all inter-word silence intervals that can be found (minimum length for a silence
                interval is then 10 msec = 1 frame). If set to values n>1, the minimum length for an inter-word
                silence interval to be detected is set to n*10 msec. For example MINPAUSLEN of 5 will cause MAUS to
                suppress inter-word silence intervals up to a length of 40msec. Since 40 msec seems to be the border of
                perceivable silence, we set this option default to 5. With other words: inter-word silences smaller
                than 50msec are not segmented but rather distributed equally to the adjacent segments. If one of the
                adjacent segments happens to be a plosive then the deleted silence interval is added totally to the
                plosive; if both adjacent segments are plosives, the interval is equally spread as with non-plosive
                adjacent segments.

forcechunking: [true, false, rescue] 
If this parameter is set to true, the chunker will run in the experimental 'forced chunking' mode (chunker option 'force'). While forced chunking is much more likely to return a fine-grained chunk segmentation, it is also more prone to chunking errors. As a compromise, you can also set this parameter to 'rescue'. In this case, the forced chunking algorithm is only invoked when the original algorithm has returned chunks that are too long for MAUS.

NOINITIALFINALSILENCE: [true, false] 
Option NOINITIALFINALSILENCE: 
                Switch to suppress the automatic modeling of an optional leading/trailing silence interval. This is
                useful, if for instance the signal is known to start with a stop and no leading silence, and the silence model would 
                'capture' the silence interval from the plosive.

InputTierName: Option InputTierName: Only needed, if TEXT is in TextGrid/ELAN format. Name of the annotation tier, that contains
                the input words/chunks.

BRACKETS: One or more pairs of characters which bracket
                        annotation markers in the input. E.g. if your input 
                        text contains markers '{Lachen}' and '[noise]' that should be passed as
                        markers and not as spoken text, set this option to '{}[]'.
                        Note that blanks replacement within such markers (see option 'WHITESPACE_REPLACEMENT')
                        only takes place in markern/comments that are defined here.
                

OUTFORMAT: [bpf, exb, csv, TextGrid, emuDB, eaf, tei, srt, sub, vtt, par] 
Option OUTFORMAT: the output format of the pipe. Note that this depends on the selected PIPE,
                more precisely, whether the last service in the pipeline supports the format; if not, an ERROR is returned. 
                Possible (selectable) formats are: 
                'TextGrid' - a praat compatible TextGrid file; 
                'par|bpf' - a BPF file (if the input (TEXT) is also a BPF file, the input is usually 
                copied to the output with new (or replaced) tiers); 
                'csv' - a spread sheet (CSV table) containing the most prominent tiers of the annotation; 
                'emuDB' - an Emu compatible *_annot.json file; 
                'eaf' - an ELAN compatible annotation file; 
                'exb' - an EXMARaLDA compatible annotation file; 
                'tei' - an Iso TEI document; 
                'srt' - a SubRip subtitle format file; 
                'sub' - a SubViewer subtitle format file;
                'vtt' - a 'WebVTT' subtitle format file.
                If output format is 'vtt' and a subtitle starts with a speaker marker of the form
                '<...>', a 'v ' is inserted before the '...'.
                For a description of BPF see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html.
                For a description of Emu see https://github.com/IPS-LMU/emuR.
                Note 1: using 'emuDB' will first produce only single annotation file *_annot.json;
                in the WebMAUS interface (https://clarin.phonetik.uni-muenchen.de/BASWebServices) you can process
                more than one file and than download a zipped Emu database; in this case don't forget to change the default
                name of the emuDB 'MAUSOUTPUT' using the R function emuR::rename_emuDB().
                Note 2: if you need the same result in more than one format, select 'bpf' to produce a BPF file, and then
                convert this file with the service runAnnotConv ('AnnotConv') into the desired formats.
                Note 3: some format conversions are not loss-less; select 'bpf' to be sure that no information is lost.
                

syl: [yes, no] 
Switches syllabification of the pronunciation in the KAN tier produced
                by module G2P on; the syllable boundary marker is '.'. This option only makes sense in
                languages in which the module G2P produces a different syllabification than the module
                PHO2SYL (e.g. tha-TH). Otherwise use a pipe that ends with the module PHO2SYL which
                will create tiers MAS (phonetic syllable) and KAS (phonologic syllable).
                WARNING: syl=yes causes G2P to switch off MAUS embedded mode; this might change the
                output for some languages because the output phoneme inventar is then SAMPA and not the
                SAMPA variant used by MAUS. Subsequent modules like MAUS might report an ERROR then.
                

ENDWORD: [0.0, 999999.0] 
Option ENDWORD: If set to a value n<999999, this option causes maus to end the
                segmentation with the word number n (word numbering in BPF starts with 0). This is useful if the input
                signal file is just a segment within a longer transcript. See also option STARTWORD.

TROSpeakerIDASR: [true, false] 
If set to true (default: false), and if the selected ASR service delivers a valid speaker diarization (tier SPK) and a TRO tier, the service will insert speaker ID labels of the form '<XXX:> ' before each word in the TRO tier, that starts a new speaker turn of speaker labelled by 'XXX'. The inserted speaker label 'XXX' is either one of the standardized labels 'S1', 'S2', ... or mapped speaker labels taken from the option 'speakMatch'. The service also checks each preceeding word to a speaker turn change (the last word of the previous turn) and adds a trailing '.', if the word does not has already a trailing final punctuation sign (one of '!?.:...). This option enables pipelines that start with 'ASR' and end with 'SUBTITLE' to create subtitle tracks (e.g. WebVTT) that show the speaker ID and start a new subtitle at ech speaker turn change.
                

wsync: [yes, no] 
Yes/no decision, whether each word boundary is considered as syllable boundary. Only
                relevant for phonetic transcription input from MAU, PHO, or SAP tiers (for input from the KAN tier
                this option is always set to 'yes'). If set to 'yes',
                each syllable is assigned to exactly one word index. If set to 'no', syllables can be part of more than
                one word.

UTTERANCELEVEL: [true, false] 
Switch on utterance level modelling (true); only for PIPEs with text input.
                Every TEXT input line is modelled as an utterance in an additional annotation layer ('TRL') between recording (bundle) and
                words (ORT). This is usefull if the recording contains several sentences/utterances and you need hierarchical access
                to these in the resulting annotation structure. For example, in EMU-SDMS output the default hierarchy bundle->ORT->MAU
                is then changed to  bundle->TRL->ORT->MAU. Note 1 : does not have any effect in CSV output. Note 2 : the use of this option
                causes the ORT tier to contain the raw word tokens instead of the (default) word-normalized word tokens (e.g. '5,' (raw token) 
                vs. 'five' (word-normalized).
                

featset: [standard, extended] 
Feature set used for grapheme-phoneme conversion. The standard set is the
                default and comprises a letter window centered on the grapheme to be converted. The extended set
                additionally includes part of speech and morphological analyses. The extended set is currently
                available for German and British English only. For connected text the extended feature set generally
                generally yields better performance. However, if the input comprises a high amount of proper names
                provoking erroneous part of speech tagging and morphologic analyses, than the standard feature set is
                more robust.

pos: 
                Option pos : N-HANS sample recording (RIFF WAVE *.wav) of the noise to be preserved in the signal (mode 'denoiser') or 
                the target speaker to be preserved in the signal (mode 'separator').
                The 'pos' sample is applied to all processed input signals; do not upload more than 2sec of clean signal,
                and make sure that the relevant signal is present within the very first second; 'clean signal' means that the sample should
                not contain any traces of the main voice (mode 'denoiser') nor of the 'pos' noise sample (modes 'denoiser' and 'separator'). 
                The upload of the 'pos' sample is mandatory for N-HANS mode 'separator' and optional for mode 'denoiser' (see option 'NHANS').
                

APHONE: Option APHONE: the string used to mask phonetic/phonologic labels for anonymized terms.
                If not set, the service will use the label 'nib' for masking encodings in SAMPA, and the label
                '(.)' for encodings in IPA. If set to another label, this label is used to mask in all encodings.
                

INSPROB: Option INSPROB: The option INSPROB influences the probability of deletion of segments. It
                is a constant factor (a constant value added to the log likelihood score) after each segment.
                Therefore, a higher value of INSPROB will cause the probability of segmentations with more segments go
                up, thus decreasing the probability of deletions (and increasing the probability of insertions, which
                are rarely modelled in the rule sets). This parameter has been evaluated on parts of the German
                Verbmobil data set (27425 segments) which were segmented and labelled manually (MAUS DEV set) and found
                to have its optimum at 0.0 (which is nice). Therefore we set the default value of INSPROB to 0.0.
                INSPROB was also tested against the MAUS TEST set to confirm the value of 0.0. It had an optimum at 0.0
                as well. Note that this might NOT be the optimal value for other MAUS tasks.

OUTSYMBOL: [x-sampa, ipa, manner, place] 
Option Output Encoding (OUTSYMBOL): Defines the encoding of phonetic symbols in output. If set to 'x-sampa'
                (default), phonetic symbols in output are encoded in X-SAMPA (with some minor differences in languages
                Norwegian/Icelandic in which the retroflex consonants are encoded as 'rX' instead of X-SAMPA 'X_r');
                use service runMAUSGetInventar with option LANGUAGE=sampa to get a list of symbols and
                their mapping to IPA. If set to 'ipa', the service produces UTF-8 IPA output in annotation tiers
                MAU (MAUS last module in PIPE) or in KAS/MAS (PHO2SYL last module in PIPE). 
                Just for pipes with MAUS as the last module: if set to 'manner', the
                service produces Manner of articulation for each segment; possible values are: silence, vowel,
                diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective; if set to
                'place', the service produces Place of articulation for each segment; possible values are: silence,
                labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central,
                back.

RULESET: MAUS rule set file; UTF-8 encoded; one rule per line; there are two different file types defined by the extension: 1. Phonological rule set without statistical information '*.nrul', synopsis is: 'leftContext-match-rightContext>leftContext-replacement-rightContext', e.g. 't,s-e:-n>t,s-@-n'. 2. Rule set with statistical information '*.rul', synopsis is: 'leftContext,match,rightContext>leftContext,replacement,rightContext ln(P(replacement|match) 0.0000', e.g. 'P9,n,@,n,#>P9,# -3.761200 0.000000'; 'P(replacement|match)' is the conditional probability that 'match' is being replaced by 'replacement'; the sum over all conditional probabilities with the same condition 'match' must be less than 1; the difference between the sum and 1 is the conditional probability 'P(match|match)', i.e. no for no change. 'leftContext/rightContext/match/replacememt' = comma separated lists of SAMPA symbols or empty lists (for *.rul the leftContext/rightContext must be exactly one symbol!); special SAMPA symbols in contexts are: '#' = word boundary between words, and '<' = utterance begin (may be used instead of a phonemic symbol); digits in SAMPA symbols must be preceded by 'P' (e.g. '2:' -> 'P2:'); all used SAMPA symbols must be defined in the language specific SAMPA set (see service runMAUSGetInventar). Examples for '*.rul' : 'P9,n,@,n,#>P9,# = 'the word final syllable /n@n/ is deleted, if preceded by /9/', '#,k,u:>#,g,u:' = 'word intial /k/ is replaced by /g/ if followed by the vowel /u:/'. Examples for '*.nrul' : '-->-N,k-' = 'insert /Nk/ at arbitrary positions', '#-?,E,s-#>#-s-#' = 'delete /?E/ in word /?Es/', 'aI-C-s,t,#>aI-k-s,t,#' = 'replace /C/ in word final syllable /aICst/ by /k/'.

maxSpeakNumber: [0.0, 999999.0] 
Option maxSpeakNumber defines a hard upper bound of the number of detected speakers. If set to 0 (default), no upper bound.
	        

USEWORDASTURN: [true, false] 
If set to true (default: false), and if the selected ASR service delivers a valid word segmentation (tier WOR), this word segmentation is encoded  as a chunk segmentation in the output (tier TRN) instead of the (possible) result of a speaker diarization (default). Both, the speaker diarization (which is basically a turn segmentation) and the word segmentation, when used as a chunk segmentation input to MAUS, might improve the phonetic alignment of MAUS, since they act as fix time anchors for the MAUS segmentation process. In some cases the word segmentation as time anchors yields better results (simply because there are more of them and a gross misalignment of MAUS is less likely); sometimes the chosen ASR service does not deliver a speaker diarization, then this option allows to switch to the word segmentation (which is delivered by all ASR services). 
                

allowOverlaps: [true, false] 
Option allowOverlaps: If set to true, the un-altered output of PyAnnote is
                returned in the SPD tier (note that overlaps cannot be handled by most annotation formats; only use if you really need
                to detect overlaps!); if set to false (default), overlaps, missing silence intervals etc. are resolved
                in the output tier SPD, making this output compatible with all annotation formats. The postprocessing works as follows:
                1. all silence intervals are removed. 2. all speaker segments that are 100% within another (larger) speaker segment are 
                removed. 3. If an overlap occurs the earlier segment(s) are truncated to the start of the new segment. 4. all remaining 
                gaps in the segmentation are filled with silence intervals.
                

minchunkduration: [0.0, 999999.0] 
Lower bound for output chunk duration in seconds. Note that the chunker does not guarantee an upper bound on chunk duration.

SIGNAL: Mandatory parameter SIGNAL: mono sound file or video file 
                containing the speech signal to be processed; PCM 16 bit resolution; any
                sampling rate. Although the mimetype of this input file is restricted to RIFF AUDIO 
                audio/x-wav (extension wav), most pipes will also process NIST/SPHERE (nis|sph) and 
                video (mp4|mpeg|mpg|avi|flv).

stress: [yes, no] 
yes/no decision whether or not word stress is to be added to the canonical
                transcription (KAN tier). Stress is marked by a single apostroph (') that is inserted before the syllable nucleus
                into the transcription.

imap: Customized mapping table from orthography to phonology. If pointing to a valid mapping table, the pipeline service will automatically set the LANGUAGE option for service G2P to 'und' (undefined) while leaving the commandline option LANGUAGE for the remaining services unchanged (most likely 'sampa'). This mapping table is used then to translate the input text into phonological symbols. See https://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/readme_g2p_mappingTable.txt for details about the format of the mapping table.

MODUS: [default, standard, align] 
Option MODUS: Operation modus of MAUS: default is to use the language dependent default modus; the 
                two possible modi are: 'standard' which is the segmentation and
                labelling using the MAUS technique as described in Schiel ICPhS 1999, and 'align', a forced
                alignment is performed on the input SAM-PA string defined in the KAN tier of the BPF (the same effect
                as the deprecated former option CANONLY=true).

RELAXMINDUR: [true, false] 
Option Relax Min Duration (RELAXMINDUR) changes the default minimum duration of 30msec for consonants 
                and short/lax vowels and of 40msec for tense/long vowels and diphthongs to 10 and 20msec respectively. 
                This is not optimal for general segmentation because MAUS will start to insert many very short 
                vowels/glottal stops where they are not appropriate. But for some special investigations 
                (e.g. the duration of /t/) it alleviates the ceiling problem at 30msec duration.
                

ATERMS: Option ATERMS: file encoded in UTF-8 containing the terms that are to 
                be anonymized by the service. One term per line; terms may contain blanks, in which case only 
                consecutive occurances of the words within the term are anonymized.
                

numberSpeakDiar: [0.0, 999999.0] 
Option numberSpeakDiar restricts the number of detected speakers by the speaker diarization to the 
		given number. If set to 0 (default), the ASR service determines the number automatically.
	        

RELAXMINDURTHREE: [true, false] 
Alternative option to Relax Min Duration (RELAXMINDUR): changes the minimum duration for all models to 3 states 
                (= 30msec with standard frame rate)to 30msec.
                This can be useful when comparing the duration of different phone groups.
                

STARTWORD: [0.0, 999999.0] 
Option STARTWORD: If set to a value n>0, this option causes maus to start the
                segmentation with the word number n (word numbering in BPF starts with 0). This is useful if the input
                signal file is just a segment within a longer transcript. See also option ENDWORD.

INSYMBOL: [sampa, ipa] 
Option INSYMBOL: Defines the encoding of phonetic symbols in input. If set to 'sampa'
                (default), phonetic symbols are encoded in X-SAMPA (with some coding differences in Norwegian/Icelandic);
                use service runMAUSGetInventar with option LANGUAGE=sampa to get a list of symbols and
                their mapping to IPA). If set to 'ipa', the service expects blank-separated UTF-8 IPA. 
                

PRESEG: [true, false] 
Option PRESEG: If set to true, a pre-segmentation using the wav2trn tool is done by the
                webservice on-the-fly; this is useful, if the input
                signal (or processed chunks within the signal) has leading and/or trailing silence. 
                

AWORD: Option AWORD: the string used to mask word labels for anonymized terms.
                

USETRN: [true, false, force] 
Option USETRN: If the pipe produces/processes a chunk segmentation (CHUNKER/CHUNKPREP), 
                this option is set automatically. If set to true, MAUS searches the input BPF for a TRN tier
                (turn/chunk segmentation, see http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatsdeu.html#TRN). The
                synopsis for a TRN entry is: 'TRN: (start-sample) (duration-sample) (word-link-list) (label)', e.g.
                'TRN: 23654 56432 0,1,2,3,4,5,6 sentence1' (the speech within the recording 'sentence1' starts with
                sample 23654, last for 56432 samples and covers the words 0-6). If only one TRN entry is found, the
                segmentation is restricted within a time range given by this TRN tier entry; this is useful, if there
                exists a reliable pre-segmentation of the recorded utterance, i.e. the start and end of speech within
                the recording is known. If more than one TRN entry is found, the webservice performs an segmentation
                for each 'chunk' defined by a TRN entry and aggregates all individual results into a single results
                file; this is useful if the input consists of long recordings, for which a manual chunk segmentation is
                available. If USETRN is set to 'force' (deprecated since maus 4.11; use PRESEG=true instead!, 
                a pre-segmentation using the wav2trn tool is done by the
                webservice on-the-fly; this is useful, if the input BPF does not contain a TRN entry and the input
                signal has leading and/or trailing silence.

ASRType: [autoSelect, callAmberscriptASR, callEMLASR, callFraunhoferASR, callGoogleASR, callLSTDutchASR, callLSTEnglishASR, callWatsonASR, callUWEBASR, callWhisperXASR] 
Name of the ASR service applied. If set to 'autoSelect', the service will select the next available ASR service that supports the LANGUAGE; if set to 'allServices', the service will send the input signal to all ASR services that support LANGUAGE and output the ASR results in simple txt format. Please note that your input signal is send to a third party ASR service which is not a part of BAS. By selecting a third party service you accept the end user license agreement of this service (as posted on the Web API of BAS services) and agree that your signals are to send to the selected service. Be advised that most of these services store input signals to improve their ASR performance, and that several restrictions (service dependent quotas) apply to the number and amount of input signals (see the 'Show Help' text of the servce 'ASR' on the BAS Web API for details). Some ASR services only allow asynchroneous processing, which means that the response time can be up to several minutes. If you need service capacity exceeding the standard quotas for a specific ASR service, please contact the BAS for special arrangements.
                

MAUSSHIFT: Option MAUSSHIFT: If set to n, this option causes the calculated MAUS segment boundaries
                to be shifted by n msec (default: 0) into the future. Most likely this systematic shift is caused by a
                boundary bias in the training material's segmentation.
                The default should work for most cases.

diarization: [true, false] 
If set to true (default: false), the ASR service will label each word in the result with a speaker label (BPF tier SPK, labels 'S1', 'S2', ... in order of appearance). If the selected ASR service does not support speaker diarization, a WARNING is issued.

HIGHF: [0.0, 30000.0] 
Option HIGHF: upper filter edge in Hz. If set >0Hz and LOWF is 0Hz, a low pass filter 
                with HIGHF Hz is applied; if set >0Hz and LOWF is set lower than HIGHF, a band
                pass between LOWF and HIGHF is applied; if set >0Hz and LOWF is set higher than 0Hz but higher 
                than HIGHF, a reject band pass between HIGHF and LOWF is applied. 
                E.g. HIGHF = 3000 LOWF = 300 is telephone band; HIGHF = 45 LOWF = 55 filters out a 50Hz hum.
                

silenceonly: [0.0, 999999.0] 
If set to a value greater than 0, the chunker will only place chunk boundaries in regions where it has detected a silent interval of at least that duration (in ms). Else, silent intervals are prioritized, but not to the exclusion of word boundaries without silence. On speech that has few silent pauses (spontaneous speech or speech with background noise), setting this parameter to a number greater than 0 is likely to hinder the discovery of chunk boundaries. On careful and noise-free speech (e.g. audio books) on the other hand, setting this parameter to a sensible value (e.g. 200) may reduce chunkin errors.

boost_minanchorlength: [2.0, 8.0] 
If you are using the boost phase, you can set its minimum anchor length independently of the general minimum anchor length. Setting this parameter to a low value (e.g. 2-3) means that the boost phase has a greater chance of finding preliminary chunk boundaries, which is essential for speeding up the chunking process. On the other hand, high values (e.g. 5-6) lead to more conservative and more reliable chunking decisions. If boost is set to false, this option is ignored.

ADDSEGPROB: [true, false] 
Option Add Viterbi likelihoods (ADDSEGPROB) causes that the frame-normalized natural-log total 
                Viterbi likelihood of an aligned segment is appended to the segment label in the 
                output annotation (the MAU tier). This might be used as a 'quasi quality measure' 
                on how good the acoustic signal in the aligned segment has been modelled by the 
                combined acoustical and pronunciation model of MAUS. Note that the values are not 
                probabilities but likelihood densities, and therefore are not comparable for 
                different signal segments; they are, however, comparable for the same signal segment. 
                Warning: this option breaks the BPF standard for the MAU tier and must not be 
                used, if the resulting MAU tier should be further processed, e.g. in a pipe).
                Implemented only for output phoneme symbol set SAMPA (default).
		

Output: A XML response containing the elements "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                the output file of the pipeline can be found (the format of the file depends on the option selected in OUTFORMAT),
                "output" contains the output that is mostly useful during debugging errors and "warning" lists warnings, if any 
                occured during the processing. Depending on input parameter OUTFORMAT the output file in "downloadlink" can be
                of several different file formats; see mandatory parameter OUTFORMAT for details.
                
----------------------------------------------------------------
----------------------------------------------------------------
runCOALAGetTemplates
------------------
Description: Returns a zip file with the template table files and instructions how to fill them. The tables
            are necessary to create CMDI metadata files with runCOALA.

Example curl call is:
curl -v -X GET -H 'content-type: application/x-www-form-urlencoded' 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runCOALAGetTemplates'


Parameter description: 
Output: A zip file containing the necessary template files and instructions for running
                COALA.
----------------------------------------------------------------
----------------------------------------------------------------
runChunkPreparation
------------------
Description: This pre-processor to MAUS transforms a chunk segmentation (CSV, EAF or TextGrid) into a BAS Partitur 
            Format (BPF) file containing the tiers tokenized words (ORT) and chunk segmentation (TRN). 
            For details about the BAS Partitur Format (BPF) see http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html.
            A 'chunk segmentation' is a rough segmentation of a long speech signal into longer stretches of spoken text ('chunks'),
            e.g. sentences, speaker turns; each chunk consists of timing information (begin/end) and a label that contains the spoken 
            orthographic text (UTF-8 encoded); chunks can be encoded as tiers of an annotation format (e.g. praat TextGrid or ELAN EAF) or in form of a table (CSV).
            The BPF TRN and ORT tiers in the output BPF file contains the (tokenized) word chunks as given in the specified input file tier; the presence of the TRN tier improves the
            performance of a subsequent automatic phonetic segmentation by WebMAUS. 'Tokenization' here means not only the break-up of the transcript at white spaces but also 
            the replacement of digits by number names, the deletion of punctuations and some special characters (see service description runG2P for details).
            if you want to avoid these normalisations, select the language code (option 'Language'/'lng') 'und'.

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F com=no -F lng=deu-DE -F tier=ORT -F rate=-1 -F i=@<filename> -F iform=tg 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runChunkPreparation'

Parameters:  [com] [lng] [tier] [rate] i [iform]

Parameter description: 
com: [yes, no] 
Option com: yes/no decision whether <*> strings in the annotation input should be treated as annotation
                markers. If set to 'yes', then strings of this type are considered as annotation markers that are not
                processed but passed on to the output. The string * within the <*> 
                must not contain any white space characters.
                This means, that the markers appear in the ORT
                and KAN tier of the output BPF file with a word index on their own. WebMAUS makes use of the markers < usb > (e.g.
                non-understandable word or other human noises) and < nib > (non-human noise)
                without the blanks between "usb", "nib" and the brackets "<" and ">" (which are
                needed for formatting reasons). All other markers <*> are modelled as silence, if you
                use this service as a pre-processing for WebMAUS. Markers must not contain white spaces, and must be separated from word tokens
                by blanks. They do not need to be blank-separated from non-word tokens as punctuation.

lng: [aus-AU, afr-ZA, sqi-AL, arb, eus-ES, eus-FR, cat-ES, cze-CZ, nld-NL, eng-AU, eng-GB, eng-NZ, eng-US, ekk-EE, fin-FI, fra-FR, kat-GE, deu-DE, gsw-CH, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, hat-HT, hun-HU, isl-IS, ita-IT, jpn-JP, gup-AU, ltz-LU, mlt-MT, nor-NO, fas-IR, pol-PL, ron-RO, rus-RU, slk-SK, spa-ES, swe-SE, tha-TH, guf-AU, und] 
RCFC5646 locale language code of the speech to be processed; this is necessary since the 
                tokenization and the replacement of numerals in the input text is language-dependent;
                we use the RFC5646 sub-structure 'iso639-3 - iso3166-1 [ - iso3166-2],
                e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in 'Oberoesterreich';
                alternatively and where possible, Iso 639-3 char language code is supported; non-standard codes: 
                'nze' stands for New Zealand English, 'arb' is for variety-independent Arabic romanization ('Tamimi
                Romanization'), 
                'use' for American English. 'und' (undefined) can be used to pass the tokens unchanged, 
                i.e. the tokens found in the chunk label are passed unchanged into the 'ORT' tier of 
                the output BPF.

tier: Name of the item in the TextGrid or EAF input, which is to be transformed into TRN and
                ORT tier of the BPF format. Case-sensitive. Only ELAN annotation tiers that contain timing information are processed. 
                It is possible in an EAF file to have a 'referenced' annotation tier (element REF_ANNOTATION) that only refers to 
                another tier with timing information, but our service cannot process this. A work-around is to go back into ELAN and copy 
                the contents of the reference tier onto the tier with timing information and then use this tier as input.

rate: [0.0, 999999.0] 
Sample rate of signal file, from which the TextGrid or EAF file has been
                derived. Needed for the conversion of absolute times into samples.

i: Input file containing the chunk segmentation. MIMEType depends on input parameter iform.

iform: [tg, eaf, csv] 
Format of the input file. Currently 'tg' (standard or short TextGrid), 'eaf' (ELAN
                annotation format) and 'csv' are supported. Only one tier in the TextGrid/EAF input file is processed (see also option 'tier' for details about ELAN tiers). The csv table file should contain three columns separated by a semicolon containing time onset, offset (in samples), and the transcript (UTF-8 encoded), respectively.

Output: A XML response containing the elements "success", "downloadLink", "output"
                and "warning". "success" states if the processing was successful or not, "downloadLink" specifies the
                location where the output BPF file can be found, "output" contains the console output of the service that
                is mostly useful during debugging errors, and "warnings" contains any warnings that occured during the processing.
                The format of the output file is BAS Partitur Format (BPF) containing the tiers ORT,KAN,TRN..
----------------------------------------------------------------
----------------------------------------------------------------
runMINNI
------------------
Description: This service segments and labels a speech audio file into SAM-PA (or IPA) phonetic segments without any
            text/phonological input; it uses HMM technology combined with a language-specific phonotactic bigram model. 
            This is a general service to process a single file which enables the usage of all possible options of
            MINNI. See the section Input for a detailed description of these options or use the operation
            'runMAUSGetHelp' to download a current version of the MAUS/MINNI documentation.

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F SIGNAL=@<filename> -F LANGUAGE=deu-DE -F OUTFORMAT=TextGrid -F PRESEG=false -F MAUSSHIFT=default -F INSPROB=0.0 -F MINPAUSLEN=5 -F OUTSYMBOL=sampa -F WEIGHT=default -F ADDSEGPROB=false 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMINNI'

Parameters:  SIGNAL [LANGUAGE] [OUTFORMAT] [PRESEG] [MAUSSHIFT] [INSPROB] [MINPAUSLEN] [OUTSYMBOL] [WEIGHT] [ADDSEGPROB]

Parameter description: 
SIGNAL: mono sound file containing the speech signal to be segmented; PCM 16 bit resolution; any
                sampling rate; optimal results if leading and trailing silence intervals are truncated before
                processing. Although the mimetype of this input file is restricted to
                audio/x-wav (wav|WAV), the service will also process NIST/SPHERE (nis|NIS) and ALAW
                (al|AL|dea|DEA).

LANGUAGE: [afr-ZA, aus-AU, cat-ES, nld-BE, nld-NL, eng-AU, eng-GB, eng-US, ekk-EE, fra-FR, deu-DE, gsw-CH, hun-HU, ita-IT, nan-TW, nor-NO, fas-IR, pol-PL, rus-RU, spa-ES, tha-TH] 
Language of the speech to be processed; defines the possible phoneme symbol set in MAUS
                input; we use the RFC5646 sub-structure 'iso639-3 - iso3166-1 [- iso3166-2], e.g. 'eng-US' for American
                English, 'deu-AT-1' for Austrian German spoken in 'Oberoesterreich'. The non-standard language code
                'sampa' denotes a language independent SAM-PA variant of MAUS for which the SAM-PA symbols in the input
                BPF must be blank separated (e.g. /h OY t @/).

OUTFORMAT: [bpf, exb, csv, TextGrid, emuDB, eaf, tei, par] 
Option 'Output format' (OUTFORMAT): Defines the possible output formats: TextGrid - a praat compatible
                TextGrid file; bpf - a BPF file with tier MAU (phonetic segmentation); csv - a spreadsheet
                (CSV table) that contains the phonetic segmentation;
                emuDB - an Emu compatible *_annot.json file;
                eaf - an ELAN compatible annotation file; exb - an EXMARaLDA compatible annotation file;
                tei - Iso TEI document (XML).
                For a description of BPF see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html.
                for a description of Emu see https://github.com/IPS-LMU/emuR.
                Note 1: using 'emuDB' will first produce only single annotation file *_annot.json;
                in the WebMAUS interface (https://clarin.phonetik.uni-muenchen.de/BASWebServices) you can process
                more than one file and than download a zipped Emu database; in this case don't forget to change the default
                name of the emuDB 'MAUSOUTPUT' using the R function emuR::rename_emuDB().
                Note 2: if you need the same result in more than one format, select 'bpf' to produce a BPF file, and then
                convert this file with the service runAnnotConv ('AnnotConv') into the desired formats.
                Note 3: some format conversions are not loss-less; select 'bpf' to be sure that no information is lost.
                

PRESEG: [true, false] 
Option PRESEG: If set to true, a pre-segmentation using the wav2trn tool is done by the
                webservice on-the-fly; this is useful, if the input
                signal has leading and/or trailing silence.

MAUSSHIFT: If set to n, this option causes the calculated MAUS segment boundaries to be shifted by n
                msec (default: 0) into the future. Most likely this systematic shift is caused by a
                boundary bias in the training material's segmentation.
                The default should work for most cases.

INSPROB: The option INSPROB influences the probability of detecting two segments instead of one. It is a constant
                factor (a constant value added to the log likelihood score) after each segment. Therefore, a higher (positive) 
                value of INSPROB will cause the probability of segmentations with more segments to go up and vice versa negative values will
                cause the number of detected segments to go down.
                This parameter has only been evaluated using MAUS (not MINNI) on parts of the German Verbmobil data set (27425
                segments) which were segmented and labelled manually (MAUS DEV set) and found to have its optimum at
                0.0 (which is nice). Therefore we set the default value of INSPROB to 0.0. INSPROB was also tested
                against the MAUS TEST set to confirm the value of 0.0. It had an optimum at 0.0 as well. Note that this
                might NOT be the optimal value for MINNI processing.

MINPAUSLEN: [0.0, 999.0] 
Option MINPAUSLEN: Controls the behaviour of optional inter-phone silence. If set to 1,
                maus will detect all inter-phone silence intervals that can be found (minimum length for a silence
                interval is then 10 msec = 1 frame). If set to values n>1, the minimum length for an inter-phone
                silence interval to be detected is set to n*10 msec. For example MINPAUSLEN of 5 will cause MAUS to
                suppress inter-phone silence intervals up to a length of 40msec. Since 40 msec seems to be the border
                of perceivable silence, we set this option default to 5. With other words: inter-phone silences smaller
                than 50msec are not segmented but rather distributed equally to the adjacent segments. If one of the
                adjacent segments happens to be a plosive then the deleted silence interval is added totally to the
                plosive; if both adjacent segments are plosives, the interval is equally spread as with non-plosive
                adjacent segments.

OUTSYMBOL: [sampa, ipa, manner, place] 
Option Output Encoding (OUTSYMBOL): Defines the encoding of phonetic symbols in output. If set to 'sampa'
                (default), phonetic symbols in output are encoded in X-SAMPA (with some minor differences in languages
                Norwegian/Icelandic in which the retroflex consonants are encoded as 'rX' instead of X-SAMPA 'X_r');
                use service runMAUSGetInventar with option LANGUAGE=sampa to get a list of symbols and
                their mapping to IPA. If set to 'ipa', the service produces UTF-8 IPA output. If set to 'manner', the
                service produces IPA manner of articulation for each segment; possible values are: silence, vowel,
                diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective. If set to
                'place', the service produces IPA place of articulation for each segment; possible values are: silence,
                labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central,
                back.

WEIGHT: The option WEIGHT weights the influence of the statistical phonotactic bigram model (the a-priori probability of pronuciation) 
                against the acoustical scores. More precisely WEIGHT is multiplied to the phonotactic model score (log
                likelihood) before adding the score to the acoustical score within the Viterbi search. Since MINNI uses a phonotactic
                bigram model, increasing WEIGHT will at some point cause MINNI to choose always the same most likely sequence of phones 
                according to the bigram model (disregarding the acoustics) with equally long segments, i.e. no meaningful segmentation at all; 
                lower values of WEIGHT will cause phoneme sequences to be detected according to acoustic evidence, even if the resulting pronunciation is less likely 
                according to the phonotactic bigram model; 
                if WEIGHT is set to 0.0 the bigram is completely ignored and MINNI performs a 
                phone recognition bases only on acoustic likelihood (and any sequence of phones is a-priori equally probable). If the
                acoustic quality of the signal is very good and the HMMs of the language are well trained, it makes
                sense to lower WEIGHT to achieve more precise results given the acoustic. For most languages this option is default to 1.0
                (which means that acoustic evidence and a-priori pronunciation probability are treated equally).

ADDSEGPROB: [true, false] 
Option Add Viterbi likelihoods (ADDSEGPROB) causes that the frame-normalized natural-log total 
                Viterbi likelihood of an aligned segment is appended to the segment label in the 
                output annotation (the MAU tier). This might be used as a 'quasi quality measure' 
                on how good the acoustic signal in the aligned segment has been modelled by the 
                combined acoustical and pronunciation model of MAUS. Note that the values are not 
                probabilities but likelihood densities, and therefore are not comparable for 
                different signal segments; they are, however, comparable for the same signal segment. 
                Warning: this option breaks the BPF standard for the MAU tier and must not be 
                used, if the resulting MAU tier should be further processed, e.g. in a pipe).
                Implemented only for output phoneme symbol set SAMPA (default).
		

Output: A XML response containing the tags "success", "downloadLink", "output" and "warning.
                success states if the processing was successful or not, downloadLink specifies the location where
                segmentation file can be found (the format of the file depends on the option selected in OUTFORMAT),
                output contains the output that is mostly useful during debugging errors and warnings if any warnings
                occured during the processing. 
                
----------------------------------------------------------------
----------------------------------------------------------------
runTextAlign
------------------
Description: Optimal alignment of text sequence pairs, for example the optimal alignment of an orthographic string to its corresponding phonological transcript.

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F cost=intrinsic -F atype=dir -F i=@<filename> -F displc=no -F costfile=@<filename> 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runTextAlign'

Parameters:  [cost] [atype] i [displc] [costfile]

Parameter description: 
cost: [naive, intrinsic, import, g2p_aus, g2p_deu, g2p_ekk, g2p_eng, g2p_fin, g2p_fra, g2p_gsw, g2p_hat, g2p_hun, g2p_ita, g2p_kat, g2p_nld, g2p_nze, g2p_pol, g2p_ron, g2p_rus, g2p_slk, g2p_sqi, g2p_use] 
Cost function for the edit operations substitution, deletion, and insertion to be used for
                the alignment. 'naive' assigns cost 1 to all operations except of null-substitution, i.e. the
                substitution of a symbol by itself, which receives cost 0. This 'naive' cost function should be used
                only if the pairs to be aligned share the same vocabulary, which is NOT the case e.g. in
                grapheme-phoneme alignment (grapheme 'x' is not the same as phoneme 'x'). 'g2p_deu', 'g2p_eng' etc. are
                predefined cost functions for grapheme-phoneme alignment for the respective language expressed as
                iso639-3. By selecting 'intrinsic' a cost function is trained on the input data and returned in the
                output zip. Costs are derived from co-occurrence probabilities, thus the bigger the input file, the
                more reliable the emerging cost function. By 'import' the user can provide his/her own cost function
                file, that must be a semicolon-separated 3-column csv text file. Examples: v;w;0.7 - the substitution
                of 'v' by 'w' costs 0.7. v;_;0.8 - the delition of 'v' costs 0.8; _;w;0.9 - the insertion of 'w' costs
                0.9. A typical usecase is to train a cost function on a big data set with cost='intrinsic', and to
                subsequently apply this cost function on smaller data sets with cost='import'.

atype: [dir, sym] 
Alignment type: 'dir' - align the second column to the first. 'sym' symmetric
                alignment.

i: csv text file with two semicolon-separated columns. Each row contains a sequence pair to
                be aligned. The sequence elements must be separated by a blank. Example: a word and its canonical
                transcription like S c h e r z;S E6 t s

displc: [yes, no] 
Yes/no decision whether alignment costs should be displayed in a third column in the
                output file.

costfile: csv text file with three semicolon-separated columns. Each row contains three columns of
                the form a;b;c, where c denotes the cost for substituting a by b. Insertion and deletion are are marked
                by an underscore. Examples (from German grapheme-phoneme conversion): e;E;0.96 - replacing grapheme e
                by phoneme E costs 0.96. e;_;0.89: e-deletion costs 0.89. _;E;0.99: E-insertion costs
                0.99.

Output: Output zip file, that contains a semicolon-separated 2- or 3-column csv text file with the
                aligned output. The third column comprises the alignment costs, if the parameter 'displc' is set to
                'yes. If 'cost' is set to 'intrinsic' the zip file additionally contains a cost function file in the
                format as described for the parameter 'cost'.
----------------------------------------------------------------
----------------------------------------------------------------
getLoadIndicator
------------------
Description: Returns an indicator how high the server load is - 0 (for low load, i.e., less than 50 percent load), 
            1 (for middle load, i.e., between 50 and 100 percent load), and 2 (for high load, i.e., more than 100 percent load).
            

Example curl call is:
curl -v -X GET -H 'content-type: application/x-www-form-urlencoded' 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/getLoadIndicator'


Parameter description: 
Output: Number that indicates the load.
----------------------------------------------------------------
----------------------------------------------------------------
runChunker
------------------
Description: The chunker is a preprocessing tool for the MAUS segmentation service that splits very long recordings into smaller 'chunks'. Since MAUS's runtime grows quadratically with input duration, it cannot be used on recordings that are longer than approx. 3000 words. In this case, the chunker can presegment the recording into shorter "chunks". This chunk segmentation, which is NOT a semantically meaningful sentence or turn segmentation, can then be used to speed up the MAUS segmentation process. Like MAUS, the chunker accepts a media file containing the speech signal and a BAS Partitur Format (BPF) file containing a canonical transcription of the recording (KAN tier). This canonical transcription can be derived from an orthographic text using the G2P tool (runG2P). The chunker outputs a new BAS Partitur File with a TRN tier that can be used as chunk segmentation input to MAUS.

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F maus=false -F language=deu-DE -F aligner=hirschberg -F USEREMAIL= -F bpf=@<filename> -F boost=true -F force=false -F audio=@<filename> -F silenceonly=0 -F minanchorlength=3 -F boost_minanchorlength=4 -F insymbols=sampa -F minchunkduration=15 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runChunker'

Parameters:  [maus] [language] [aligner] [USEREMAIL] bpf [boost] [force] audio [silenceonly] [minanchorlength] [boost_minanchorlength] [insymbols] [minchunkduration]

Parameter description: 
maus: [true, false] 
If this parameter is set to true, the recognition module will model words as MAUS graphs as opposed to canonical chains of phonemes. This will slow down the recognition engine, but it may help with non-canonical speech (e.g., accents or dialects).

language: [aus-AU, afr-ZA, sqi-AL, arb, eus-ES, eus-FR, cat-ES, nld-BE, nld-NL, eng-US, eng-AU, eng-GB, eng-NZ, eng-SC, ekk-EE, fin-FI, fra-FR, kat-GE, deu-DE, gsw-CH, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, hun-HU, isl-IS, ita-IT, jpn-JP, sampa, ltz-LU, mlt-MT, nan-TW, nor-NO, fas-IR, pol-PL, por-PT, ron-RO, rus-RU, spa-ES, tha-TH] 
Language of the speech to be processed. This parameter defines the set of possible input phonemes and their acoustic models. RFC5646 sub-structure 'iso639-3 - iso3166-1 [- iso3166-2], e.g. 'eng-US' for American English. The language code 'sampa' (not RCFC5646) denotes a language independent SAM-PA variant of MAUS for which the SAM-PA symbols in the input BPF must be blank separated (e.g. /h OY t @/).

aligner: [hirschberg, fast] 
Symbolic aligner to be used. The "fast" aligner performs approximate alignment by splitting the alignment matrix into "windows" of size 5000*5000. The "hirschberg" aligner performs optimal matching. On recordings below the 1 hour mark, the choice of aligner does not make a big difference in runtime. On longer recordings, you can improve runtime by selecting the "fast" aligner. Note however that this choice increases the probability of errors on recordings with untranscribed stretches (such as long pauses, musical interludes, untranscribed speech). Therefore, the "hirschberg" aligner should be used on this kind of material.

USEREMAIL: Option USEREMAIL: if a valid email address is provided through this option, the 
                service will send the XML file containing the results of the service run to this address after completion.
                It is recommended to set this option for long recordings (batch size <6, length >1h) since it is often problematic to wait 
                for service completion over an instable internet connection or from a laptop that might go into hibernation.
                The email address provided is not stored on the server. It is sometimes even advisable to kill the browser tab 
		after starting the call and wait for the result emails (only for batch size <6!).
                Beware: the download link to your result(s) will be valid for 24h after you receive the email; after that
                all your data will be purged from the server.
                Disclaimer: the usage of this option is at your own risk; the key URL to download your result file will
                be send without encryption in this email; be aware that anybody who can intercept this email will be able 
                to access your result files using this key; the BAS at LMU Munich will not be held responsible for any
                security breach caused by using this email notification option.
                

bpf: Phonemic transcription of the utterance to be segmented. Format is a BAS Partitur Format (BPF) file with a KAN tier. The KAN tier contains a table with 3 columns and one line per word in the input. Column 1 is always 'KAN:'; column 2 is an integer starting with 0 denoting the word position within the input; column 3 contains the canonical pronunciation of the word coded in SAM-PA. The canonical pronunciation string may contain phoneme-separating blanks. For supported languages, the BPF can be derived using the G2P service (runG2P). See http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for detailed description of the BPF.

boost: [true, false] 
If set to true (the default), the chunker will start by running a so-called boost phase over the recording. This boost phase uses a phoneme-based decoder instead of speech recognition. Usually, the boost option reduces processing time. On noisy input or faulty transcriptions, the boost option can lead to an increase in errors. In this case (or if a previous run with boost set to 'true' has led to chunking errors), set this option to 'false'.

force: [true, false, rescue] 
If this parameter is set to true, the chunker will run in the experimental 'forced chunking' mode. While forced chunking is much more likely to return a fine-grained chunk segmentation, it is also more prone to chunking errors. As a compromise, you can also set this parameter to 'rescue'. In this case, the forced chunking algorithm is only invoked when the original algorithm has returned chunks that are too long for MAUS.

audio: Mono WAVE or NIST/SPHERE sound file or video file (MP4,MPEG) containing the speech signal to be segmented. PCM 16 bit resolution, any sampling rate.

silenceonly: [0.0, 999999.0] 
If set to a value greater than 0, the chunker will only place chunk boundaries in regions where it has detected a silent interval of at least that duration (in ms). Else, silent intervals are prioritized, but not to the exclusion of word boundaries without silence. On speech that has few silent pauses (spontaneous speech or speech with background noise), setting this parameter to a number greater than 0 is likely to hinder the discovery of chunk boundaries. On careful and noise-free speech (e.g. audio books) on the other hand, setting this parameter to a sensible value (e.g. 200) may reduce chunkin errors.

minanchorlength: [2.0, 8.0] 
The chunker performs speech recognition and symbolic alignment to find regions of correctly aligned words (so-called 'anchors'). Setting this parameter to a high value (e.g. 4-5) means that the chunker finds chunk boundaries with higher certainty. However, the total number of discovered chunk boundaries may be reduced as a consequence. A low value (e.g. 2) is likely to lead to a more fine-grained chunking result, but with lower confidence for individual chunk boundaries.

boost_minanchorlength: [2.0, 8.0] 
If you are using the boost phase, you can set its minimum anchor length independently of the general minimum anchor length. Setting this parameter to a low value (e.g. 2-3) means that the boost phase has a greater chance of finding preliminary chunk boundaries, which is essential for speeding up the chunking process. On the other hand, high values (e.g. 5-6) lead to more conservative and more reliable chunking decisions. If boost is set to false, this option is ignored.

insymbols: [sampa, ipa] 
Defines the encoding of phonetic symbols in the input KAN tier. If set to 'sampa' (default), phonetic symbols are encoded in language specific SAM-PA (with some coding differences to official SAM-PA; use service runMAUSGetInventar with option LANGUAGE=sampa to get a list of symbols and their mapping to IPA). If set to 'ipa', the service expects blank-separated UTF-8 IPA.

minchunkduration: [0.0, 999999.0] 
Lower bound for output chunk duration in seconds. Note that the chunker does not guarantee an upper bound on chunk duration.

Output: An XML response containing the tags "success", "downloadLink", "output" and "warning". "success" states whether the processing was successful or not, "downloadLink" specifies the location where the output BPF file is provided. The BPF contains the content of the input BPF (option "bpf") with an appended TRN tier. The TRN tier contains the discovered chunking of the signal. The output BPF can be used as an input BPF to runMAUS together with the option USETRN=true.
----------------------------------------------------------------
----------------------------------------------------------------
runG2P
------------------
Description: This web service converts an orthographic text input into a canonical phonological transcript (standard pronunciation). G2P (short for 'grapheme to phoneme conversion') reads a continuous text or word list, and estimates the most likely string of phonemes that a standard speaker of that language is expected to articulate. G2P uses statistically trained decision trees and some more tricks like Part-of-speech tagging and morphological segmentation to improve the decision process. Each language version of G2P is trained on a large set of pronunciations from this language (a pronunciation dictionary) or is based on a letter-sound mapping table in case of simple unique correspondences. The way G2P operates depends on numerous options and the chosen input and output format. For instance, some input formats contain non-tokenized text (e.g. txt) that will be subject to tokenisation and normalisation, while others contain already tokenized text (list,bpf) that will be processed as is. Most output formats also come in an 'extended' version (indicated by a 'ext' in the format name, e.g. 'exttab') that lists more information than the the phonemic transcript; extended output is only avaliable for a small subset of language yet. For more detailed information about the methods G2P applies please refer to: Reichel, U.D. (2012). PermA and Balloon: Tools for string alignment and text processing, Proc. of the Interspeech. Portland, Oregon, paper no. 346.

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F com=no -F tgrate=16000 -F stress=no -F imap=@<filename> -F lng=deu-DE -F lowercase=yes -F syl=no -F outsym=sampa -F nrm=no -F i=@<filename> -F tgitem=ort -F align=no -F featset=standard -F iform=txt -F except=@<filename> -F embed=no -F oform=bpf 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runG2P'

Parameters:  [com] [tgrate] [stress] [imap] [lng] [lowercase] [syl] [outsym] [nrm] i [tgitem] [align] [featset] [iform] [except] [embed] [oform]

Parameter description: 
com: [yes, no] 
yes/no decision whether <*> strings should be treated as annotation
                markers. If set to 'yes', then strings of this type are considered as annotation markers that are not
                processed but passed on to the output. The string * within the <*> 
                must not contain any white space characters.
                For oform='bpf' this means, that the markers appear in the ORT
                and KAN tier with a word index on their own. WebMAUS makes use of the markers < usb > (e.g.
                non-understandable word or other human noises) and < nib > (non-human noise)
                without the blanks between "usb", "nib" and the brackets "<" and ">" (which are
                needed for formatting reasons). All other markers <*> are modelled as silence, if you
                use runG2P for WebMAUS. Markers must not contain white spaces, and must be separated from word tokens
                by blanks. They do not need to be blank-separated from non-word tokens as punctuation.

tgrate: [0.0, 999999.0] 
Signal sampling rate: only needed, if 'iform' ('Input format') is 'tg' and 'oform' ('Output format') is 'bpf(s)'. 
                Sample rate of the corresponding speech signal; needed to
                convert time values from TextGrid to sample values in BAS Partitur Format (BPF) file.
                If you don't know the sample rate, look in the Properties/Get Info list of the sound file.

stress: [yes, no] 
yes/no decision whether or not word stress is to be added to the canonical
                transcription (KAN tier). Stress is marked by a single apostroph (') that is inserted before the syllable nucleus 
                into the transcription.

imap: Customized mapping table from orthography to phonology. If the option 'lng' ('Language') is set to 'und' ('User defined'), 
                a G2P mapping table must be provided via this option. This mapping table is used then to translate the input text into phonological symbols. 
                See https://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/readme_g2p_mappingTable.txt for details about the format of the mapping table.

lng: [cat, deu, eng, fin, hat, hun, ita, mlt, nld, nze, pol, aus-AU, afr-ZA, sqi-AL, arb, eus-ES, eus-FR, cat-ES, cze-CZ, nld-NL, eng-US, eng-AU, eng-GB, eng-NZ, ekk-EE, fin-FI, fra-FR, kat-GE, deu-DE, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, gsw-CH, hat-HT, hun-HU, isl-IS, ita-IT, jpn-JP, gup-AU, ltz-LU, mlt-MT, nan-TW, nor-NO, fas-IR, pol-PL, ron-RO, rus-RU, slk-SK, spa-ES, swe-SE, tha-TH, guf-AU, und] 

                Language: RCFC5646 locale code of the processed text; defines the phoneme set of input and output; we use the RFC5646 sub-structure 
                'iso639-3 - iso3166-1 [ - iso3166-2], e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in
                'Oberoesterreich'; the code 'und' ('User defined') allows the user to upload a customized 
                mapping from orthographic to phonologic form (see option '-imap'); for backwards compatibility some older non-standard codes are
                still supported: 'nze' stands for New Zealand English, 'use' for American English. Special languages: 'gsw-CH' denotes
                text written in Swiss German 'Dieth' transcription (https://en.wikipedia.org/wiki/Swiss_German); 'gsw-CH-*' are localized varieties in larger Swiss cities;
                'jpn-JA' (Japanese) accepts Kanji or Katakana or a mixture of both, but the tokenized output will contain only the Katakana version of the input;
                'aus-AU' (Australian Aboriginal languages, including Kunwinjku, Yolnu Matha) accept so called 'Modern Practical Orthography' 
                (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages);
                'fas-IR' (Persian) accepts a romanized version of Farsi developped by Elisa Pellegrino and Hama Asadi (see 
                http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/PersianRomanizationTable.pdf for details);
                'arb' is a macro language covering all Arabic varieties; the input must be encoded in a broad phonetic romanization developped 
                by Jalal Tamimi and colleagues (see http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/TamimiRomanization.pdf for details).
                

lowercase: [yes, no] 
yes/no decision whether orthographic input is treated case sensitive (no) or not (yes). Applies only, if the option 'lng' is set to 'und' and a customized mapping table is loaded via option 'imap'.

syl: [yes, no] 
yes/no decision whether or not the output transcription is to be syllabified.
                Syllable boundaries '.' are inserted into the transcription with separating blanks.

outsym: [sampa, x-sampa, maus-sampa, ipa, arpabet] 
Ouput phoneme symbol inventory. The language-specific SAMPA variant is the default. Alternatives are: language independent X-SAMPA, MAUS-SAMPA, IPA and ARPABET. MAUS-SAMPA maps the output to a language-specific phoneme subset that WEBMAUS can process. ARPABET is supported for eng-US only.

nrm: [yes, no] 
Text normalization. Currently available for German and English variants only.
                Detects and expands 22 non-standard word types. All output file types supported but
                not available for the following tokenized input types: bpf, TextGrid, and tcf. If
                switched off, only number expansion is carried out.

i: Orthographic text or annotation of the utterance to be converted; encoding must be UTF-8; formats are defined in option 'iform'. Continuous text input undergoes several text normalization stages resulting in a tokenized word chain that repesents the most likely spoken utterance (e.g. numbers are converted into their full word forms). See the webservice help page of the Web interface for details: https://clarin.phonetik.uni-muenchen.de/BASWebServices/interface/Grapheme2Phoneme. 
                  Special languages for text input:
                        Thai, Russian and Georgian expect their respective standard alphabets;
                        Japanese allows Kanji or Katakana or a mixture of both, but the tokenized output will contain only the
                        Katakana version of the input; 
                        Swiss German expects input to be transcribed in 'Dieth' (https://en.wikipedia.org/wiki/Swiss_German);
                        Australian Aboriginal languages (including Kunwinjku, Yolnu Matha) expect so called 'Practical Orthography'
                        (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages);                        
                        Persian accepts a romanized version of Farsi developped by Elisa Pellegrino and Hama Asadi (see
                        http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/PersianRomanizationTable.pdf)
                        for details).
                

tgitem: TextGrid tier name: only needed, if 'iform' ('Input format') is 'tg'. Name of the TextGrid tier (item), that contains
                the words to be transcribed. In case of TextGrid output, this tier is the reference tier for the added
                tiers.

align: [yes, no, maus] 
yes/no/sym decision whether or not the transcription is to be
                letter-aligned. Examples: if align is set to 'yes' the transcription for 'archaeopteryx' is 'A: _ k _ _
                I Q p t @ r I k+s', i.e. 'ar' is mapped to 'A: _', and 'x' to 'k+s'. If contained in the output,
                syllable boundaries and word stress are '+'-concatenated with the preceeding, resp. following symbol.
                'sym' causes a special symmetric alignment which is needed e.g. for MAUS rule training, i.e. word: a r
                c h a e o p t e r y x _; transcription: A: _ k _ _ I Q p t @ r I k s. Syllable boundaries and word
                stress are not part of the output of this 'sym' alignment. For the output formats 'tab', 'exttab',
                'lex', and 'extlex' also the aligned orthography is letter-splitted to account for multi-character
                letters in languages as Hungarian.

featset: [standard, extended] 
Feature set used for grapheme-phoneme conversion. The standard set is the
                default and comprises a letter window centered on the grapheme to be converted. The extended set
                additionally includes part of speech and morphological analyses. The extended set is currently
                available for German and British English only. For connected text the extended feature set generally
                generally yields better performance. However, if the input comprises a high amount of proper names
                provoking erroneous part of speech tagging and morphologic analyses, than the standard feature set is
                more robust.

iform: [txt, bpf, list, tcf, tg] 
Accepted input formats for grapheme phoneme conversion: 'txt' indicates
                normal text input, which will be tokenized before the conversion. 'list' indicates a sequence of
                unconnected words, that does not need to be tokenized. Furthermore, 'list' requires a different
                part-of-speech tagging strategy than 'txt' for the extraction of the 'extended' feature set (see
                Parameter 'featset'). 'tcf' indicates, that the input format is TCF containing at least a tokenization
                dominated by the element 'tokens'. 'tg' indicates TextGrid input. Long and short format is supported.
                For TextGrid input additionally the name of the item containing the words to be transcribed is to be
                specified by the parameter 'tgname'. In combination with 'bpf' output format 'tg' input additionally
                requires the specification of the sample rate by the parameter 'tgrate'. Input format 'bpf' indicates
                BAS partitur file input containing an ORT tier to be transcribed.
                -------------------------
                Connected input text ('txt') will be (word-)tokenized and (partially) normalized before it is converted into phonemic symbols. In the following we list the most important conversions done on the text input:
                - all non-alphanumeric characters (including '$' and '€') are deleted, except '-', '.' and ',' in connection with digits. 
                - all forms of single apostrophes are deleted, except for the languages ita, fra and ltz, in which d' D' l' L' preceeding a word (e.g. l'aqua) are split from the word and treated as extra tokens (e.g. l'aqua will be l' + aqua); note that there are many more cases of apostrophe usage where this split is not done. 
                - other punctuations and brackets: are deleted. 
                - if option 'Keep annotation = yes': expressions within '<>' brackets are protected and passed as is to the output, e.g. '<Lachen>' will appear as '<Lachen>' in the phonemic transcription. White space characters (blanks, tabs etc.) are not allowed within the '<>' brackets; if they are necessary, replace them with '_'. 
                - numerals: are converted into number words, e.g. '5' --> 'five', '12' --> twelve, '23' --> 'twenty-three'. 
                - single small and capital characters: are spelled out, e.g. 'b C g' --> /bi: zi: dZi:/. 
                - strings of capital characters: are spelled out, e.g. 'USA' --> /ju:eseI/. 
                If option 'Text normalization = yes' the following extra rules apply (only for languages deu-DE and eng-GB): 
                - Many special characters such as '$' '€' '£' etc. are spelled out as 'Dollar' 'Euro' 'Pfund/Pound'. Often this depends on the context, e.g. a '.' can be translated as 'dot' within an URL but ignored otherwise. 
                - special characters that can be expanded: % & $ § @ = € £ ₤ ¼ ½ ¾ © ° + < > ≤ ≥ - characters ² ³ , . / \ : _ ~ are sometimes expanded in special contexts of equations, units, URLs etc. 
                - special numeric expressions such as date, time, amounts, ordinal numbers are translated correctly, e.g. '5. January 1923' --> 'fifth January nineteen-twentythree', '23€' --> 'twentythree Euro', '$30' --> 'thirty dollars', 'Clemens X' --> 'Clemens tenth', '10:15' --> 'a-quarter-past-ten'. 
                - strings of capital characters that can be pronounced as words ('acronyms') sometimes are not spelled but spoken as a word: 'ESSO' --> /?E:sO/. 
                Since plain text files can have different encodings, BOMs, line terminators etc., it is highly recommended to run input text files through the service 'TextEnhance' before feeding them into G2P (the 'Pipeline' services do that automatically); this service also allows the correct bracketing of linguistic markers and comment lines so that they can be passed through the pipeline and are not interpreted as being spoken. 
                Special languages: Thai, Russian and Georgian expect their respective standard alphabets; Japanese allows Kanji or Katakana or a mixture of both, but the tokenized output will contain only the Katakana version of the input; Swiss German expects input to be transcribed in 'Dieth' (https://en.wikipedia.org/wiki/Swiss_German); Australian Aboriginal languages (including Kunwinjku, Yolnu Matha) expect so called 'Modern Practical Orthography' (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages); Persian accepts a romanized transcript developped by Elisa Pellegrino and Hama Asadi (see ... for details). 
                

except: name of an exception dictionary file overwriting the g2p output. Format: 2 semicolon-separated columns 'word; transcription (in X-SAMPA). Phonemes blank-separated. Example: sagt;z ' a x t.

embed: [no, maus] 
Macro option for embedding G2P into WEBMAUS. If set to 'maus', it overwrites
                several basic options as follows: 'stress', 'syl', and 'align' are set to 'no'. 'oform' is set to 'bpfs'. 'outsym' is set to 'maus-sampa'.
                Small single letters are transcribed as word fragments instead of spelling.

oform: [txt, tab, exttab, lex, extlex, bpf, bpfs, extbpf, extbpfs, tcf, exttcf, tg, exttg] 
Output format: 'bpf' indicates the BAS Partitur Format (BPF) file with an ORT/KAN tier. The
                tiers contains a table with 3 columns and one line per word in the input. Column 1 is always 'ORT:/KAN:';
                column 2 is an integer starting with 0 denoting the word position within the input; column 3 contains
                for ORT the (possibly normalized) orthographic word, for KAN the canonical pronunciation of the word coded in 
                SAM-PA (or IPA); the latter does not contain blanks. 'bpfs'
                differs from 'bpf' only in that respect, that the phonemes in KAN are separated by blanks. In case of TextGrid
                input, both 'bpf' and 'bpfs' require the additional parameters 'tgrate' and 'tgitem'. Additionally, the content of
                the TextGrid tier 'tgitem' is stored as a word chunk segmentation in the BPF tier TRN. 'extbpf' or
                'extbpfs' extend the BPF output file by the tiers POS (part of speech, STTS tagset), KSS (full phonemic 
                transcript including e.g. lexical accent), TRL (orthographic transcript with punctuation), and MRP (morph
                segmentation and classes). 'txt'
                cause a replacement of the input words by their phonemic transcriptions; single line output without
                punctuation, where phonemes are separated by blanks and words by tabulators. 'tab' returns the grapheme
                phoneme conversion result in form of a table with two columns. The first column comprises the words,
                the second column their blank-separated transcriptions. 'exttab' results in a 5-column table. The
                columns contain from left to right: word, transcription, part of speech, morpheme segmentation, and
                morpheme class segmentation. 'lex' transforms the table to a lexicon, i.e. words are unique and
                sorted. 'extlex' provides the same information as 'exttab' in a unique and sorted manner. For all lex
                and tab outputs columns are separated by ';'. If option 'align' is switched on, the first (word) column is
                letter-segmented. 'tcf' creates either a tcf output file from scratch (in case iform
                is not 'tcf'), or a transcription tier is added to the input tcf file. If a tcf file is generated
                from scratch, it contains the elements 'text', 'tokens', and 'BAS_TRS' for the phonemic transcription. 
                oform 'exttcf' additionally adds the elements 'BAS_POS' (part of speech, STTS tagset), 'BAS_MORPH' (morph 
                segmentation), and 'BAS_MORPHCLASS' (morph classes). 'tg' and 'exttg' produce TextGrid output; 
                for this a TextGrid input (iform 'tg') is required. With 'tg' the tier 'BAS_TRS' (phonemic 
                transcript) is inserted to the TextGrid which runs parallel to the tier
                specified by the parameter 'tgitem'; words are separated by an '#' symbol. 'exttg' adds
                the tiers 'BAS_POS', 'BAS_MORPH', and 'BAS_MORPHCLASS' parallel to 'BAS_TRS'. Their content is
                the same as for oform 'exttcf'. The 'extended' oform versions 'exttab', 'extlex', 'exttcf', and 'exttg' 
                are only available for 
                languages deu|eng-*|aus|nze|use; for the other languages these formats are replaced by the corresponding
                non-extended format. While the output contains punctuation for 'exttab', 'tcf', 'exttcf', and 'exttg'
                for the other formats it is ignored.

Output: A XML response containing the elements "success", "downloadLink", "output"
                and "warning". "success" states if the processing was successful or not, "downloadLink" specifies the
                location where the file containing the phonemic transcription in SAM-PA (segmented in words) can be
                found (the format of the file depends on the option selected in oform), "output" contains the output that
                is mostly useful during debugging errors, and "warnings" contains any warnings that occured during the processing.
                The format of the output file depends on the value of input parameter oform.
----------------------------------------------------------------
----------------------------------------------------------------
runCOALA
------------------
Description: Generates corpus and session CMDIs according to the media-corpus-profile and the
            media-session-profile of the ComponentRegistry by converting five CSV tables to the CMDI format. Use the
            runCOALAGetTemplates WebService to get templates for these tables. The resulting session CMDIs can be used
            as they are, while the corpus CMDI needs to be edited by hand.

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F writtenresources-table=@<filename> -F mediafiles-table=@<filename> -F corpus-title=<value> -F bundles-table=@<filename> -F corpus-name=<value> -F sessions-table=@<filename> -F actors-table=@<filename> 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runCOALA'

Parameters:  writtenresources-table mediafiles-table [corpus-title] bundles-table [corpus-name] sessions-table actors-table

Parameter description: 
writtenresources-table: Assigns the file as the WrittenResources (i.e. Annotations) table.

mediafiles-table: Assigns the file as the MediaFiles table.

corpus-title: The corpus title or long name. Except for your institution you should not use any
                abbreviations. You may use white spaces here.

bundles-table: Assigns the file as the Bundles table.

corpus-name: The short code name of the corpus (no spaces allowed).

sessions-table: Assigns the file as the Sessions table.

actors-table: Assigns the file as the Actors (e.g. Speakers, Signers, ...) table.

Output: An XML response containing the tags "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                the generated zip file (containing CMDI files) can be found, "output" contains the output that is
                mostly useful during debugging errors and "warning" contains any warnings, which occured during the
                processing.
----------------------------------------------------------------
----------------------------------------------------------------
runASR
------------------
Description: Automatic transcription of speech signal using several third party ASR services (experimental). 
            By using this service you indemnify and hold the BAS harmless from any claim arising out of the use 
            of these third party webservices. Note that ASR services support different sets of languages; if you select an 
            ASR service and a language code that are not compatible, an ERROR will be returned. Also note that ASR services 
            have different quota limitations; if the service returns a quota violation ERROR, you might consider trying 
            a different ASR service or contact the BAS for an extended user account. 
            This service is experimental and can be terminated 
            any time without warning. It is restricted for academic use only; therefore this service cannot be called as a RESTful
            service like other BAS services, and the Web API to this service is protected by AAI Shiboleth authentification.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F SIGNAL=@<filename> -F LANGUAGE=deu-DE -F speakMatch= -F OUTFORMAT=bpf -F USEREMAIL= -F ASRType=autoSelect -F diarization=true -F numberSpeakDiar=0 -F USEWORDASTURN=false -F TROSpeakerID=false -F ACCESSCODE= 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runASR'

Parameters:  SIGNAL [LANGUAGE] [speakMatch] [OUTFORMAT] [USEREMAIL] [ASRType] [diarization] [numberSpeakDiar] [USEWORDASTURN] [TROSpeakerID] [ACCESSCODE]

Parameter description: 
SIGNAL: Input signal file that contains the spoken text to be transcribed.
                Accepted file formats are *.wav (WAVE RIFF), *.nis|nist|sph (NIST SPHERE), *.mpeg|mpg (Video, several codecs) and 
                *.mp4 (MPEG4) and all formats supported by the service 'AnnotConv'. File format will be determined by extension only.

LANGUAGE: [afr-ZA, sqi-AL, amh-ET, ara-DZ, ara-BH, ara-EG, ara-IQ, ara-IL, ara-JO, ara-KW, ara-LB, arb, ara, ara-MA, ara-OM, ara-QA, ara-SA, ara-PS, ara-TN, ara-AE, hye-AM, aze-AZ, eus-ES, ben-BD, ben-IN, bul-BG, mya-MM, cat-ES, yue-HK, cmn-CN, cmn-TW, hrv-HR, ces-CZ, dan-DK, nld-BE, nld-NL-GN, nld-NL-OH, nld-NL-PR, nld-NL, eng-AU, eng-CA, eng-GH, eng-GB, eng-IN, eng-IE, eng-KE, eng-NG, eng-NZ, eng-PH, eng-SC, eng-SG, eng-ZA, eng-TZ, eng-US, est-EE, fil-PH, fin-FI, fra-CA, fra-FR, glg-ES, kat-GE, deu-AT, deu-DE, deu-DE-OH, deu-CH, ell-GR, guj-IN, heb-IL, hin-IN, hun-HU, isl-IS, ind-ID, ita-IT, jpn-JP, jav-ID, kan-IN, khm-KH, kor-KR, lao-LA, lav-LV, lit-LT, mkd-MK, mal-IN, msa-MY, mar-IN, mon-MN, nep-NP, nob-NO, fas-IR, pol-PL, por-BR, por-PT, pan-guru-IN, ron-RO, rus-RU, srp-RS, sin-LK, slk-SK, slv-SI, spa-AR, spa-BO, spa-CL, spa-CO, spa-CR, spa-DO, spa-EC, spa-SV, spa-GT, spa-HN, spa-MX, spa-NI, spa-PA, spa-PY, spa-PE, spa-PR, spa-ES, spa-US, spa-UY, spa-VE, sun-ID, swa-KE, swa-TZ, swe-SE, tam-IN, tam-MY, tam-SG, tam-LK, tel-IN, tha-TH, tur-TR, ukr-UA, urd-IN, urd-PK, uzb-UZ, vie-VN, zul-ZA] 

                   Language of the speech to be recognized; we use the RFC5646 sub-structure 'iso639-3 - iso3166-1 [- iso3166-2],
                   e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in 'Oberoesterreich'.
                   Some ASR services distinguish not by region but by language model applied, i.e. the third part of the
                   RFC5646 sub-structure is not a (country-specific) region code but a propriatory code for a language domain,
                   e.g. 'nld-NL-OH' is the Dutch language model optimized for Oral History speech data.
                   Special languages:
                        Thai, Russian and Georgian expect their respective standard alphabets;
                        Japanese allows Kanji or Katakana or a mixture of both, but the tokenized output will contain only the
                        Katakana version of the input; 
                        Swiss German expects input to be transcribed in 'Dieth' (https://en.wikipedia.org/wiki/Swiss_German); 
                        Australian Aboriginal languages (including Kunwinjku, Yolnu Matha) expect so called 'Practical Orthography'
                        (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages); 
                        Persian accepts a romanized version of Farsi developped by Elisa Pellegrino and Hama Asadi (see
                        http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/PersianRomanizationTable.pdf)
                        for details).
                

speakMatch: Option speakMatch: if set to a list of comma separated names (e.g.
		speakMatch='Anton,Berta,Charlie', the corresponding speaker labels found by the 
		service in the order of appearance are replaced by these names (e.g. 'S1' to 'Anton',
		'S2' to 'Berta' etc.). This allows the user to create SD annotation using her self
		defined speaker labels, if the user knows the order of appearance; it is obvious
		that this feature only makes sense in single file processing, since the speaker labels
		and the order of appearance differ from one recording to the next; the suggested mode
		of operation is to run the service in batch mode over all recordings with speakMatch="",
		then inspect manually the resulting annotation and define speaker labels in the order of 
		appearance for each recording, and then run the service in single file mode for each 
		recording again with the corresponding speakMatch list. If the speakMatch option contains 
		a comma separated list of value pairs like 'S1:Anton', only the speaker labels listed 
		on the lefthand side of each pair are patched, e.g. for speakMatch='S3:Charlie,S6:Florian'
		only the third and sixth appearing speaker are renamed to Charlie and Florian respectively.
                

OUTFORMAT: [bpf, exb, csv, TextGrid, emuDB, eaf, tei, txt, native] 
Format of result file: 
                'txt' : simple text file with one line of recognized text. 
                'bpf' : BAS Partitur Format with tiers ORT, TRO, SPK (if the service delivers) and WOR (if the service delivers). 
                'TextGrid' : praat compatible annotation file. 
                'emuDB' : EMU-SDMS annotation file *_annot.json with level ORT (ITEM), SPK and WOR (SEGMENT). 
                'native' : original JSON/XML response file of the service (if provided). 
                'csv' : CSV spread sheet table with flattened hierarchy entries, columns BEGIN and DURATION in sample counts. 
                'eaf' : ELAN compatible XML annotation file. 
                'exb' : EXMARaLDA compatible XML annotation file; 
                'tei' : Iso TEI document. 
                Note that some of these formats will cause an error, if the selected ASR service does not provide any segmental information (such as a word segmentation).
                For a description of BPF see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html.
                for a description of Emu see https://github.com/IPS-LMU/emuR.
                Note 1: using 'emuDB' will first produce only single annotation file *_annot.json;
                in the WebMAUS interface (https://clarin.phonetik.uni-muenchen.de/BASWebServices) you can process
                more than one file and than download a zipped Emu database; in this case don't forget to change the default
                name of the emuDB 'MAUSOUTPUT' using the R function emuR::rename_emuDB().
                Note 2: if you need the same result in more than one format, select 'bpf' to produce a BPF file, and then
                convert this file with the service runAnnotConv ('AnnotConv') into the desired formats.
                Note 3: some format conversions are not loss-less; select 'bpf' to be sure that no information is lost.
                

USEREMAIL: Option USEREMAIL: if a valid email address is provided through this option, the 
                service will send the XML file containing the results of the service run to this address after completion.
                It is recommended to set this option for long recordings (batch size <6, length >1h) since it is often problematic to wait 
                for service completion over an instable internet connection or from a laptop that might go into hibernation.
                The email address provided is not stored on the server. It is sometimes even advisable to kill the browser tab
		after starting the call and wait for the result emails (only for batch size <6!).
                Beware: the download link to your result(s) will be valid for 24h after you receive the email; after that
                all your data will be purged from the server.
                Disclaimer: the usage of this option is at your own risk; the key URL to download your result file will
                be send without encryption in this email; be aware that anybody who can intercept this email will be able 
                to access your result files using this key; the BAS at LMU Munich will not be held responsible for any
                security breach caused by using this email notification option.
                

ASRType: [autoSelect, callAmberscriptASR, callEMLASR, callFraunhoferASR, callGoogleASR, callLSTDutchASR, callLSTEnglishASR, callWatsonASR, callUWEBASR, callWhisperXASR, allServices] 
Name of the ASR service applied. If set to 'autoSelect', the service will select the next available ASR service that supports the LANGUAGE; if set to 'allServices', the service will send the input signal to all ASR services that support LANGUAGE and output the ASR results in simple txt format. Please note that in some cases (see details in service manual) your input signal is send to a third party ASR service which is not a part of BAS. By selecting a third party service you accept the end user license agreement of this service (as posted on the Web API of BAS ASR service manual) and agree that your signals are to send to the selected service. Be advised that some of these services store input signals to improve their ASR performance, and that several restrictions (service dependent quotas) apply to the number and amount of input signals (see the ASR service manual on the BAS Web API for details). Some ASR services only allow asynchroneous processing, which means that the response time can be up to several minutes. If you need service capacity exceding the standard quotas for a specific ASR service, please contact the BAS for special arrangements.
                

diarization: [true, false] 
If set to true (default: false), the ASR service will label each word in the result with a speaker label (BPF tier SPK); speaker labels are 'S1', 'S2', etc. with ascending numbers in the order of appearance in the signal file. If the selected ASR service does not support speaker diarization, a WARNING is issued. Currently the service IBM Watson supports diarization for the languages eng-US, jpn-JP and spa-ES; the service EML supports diarization for all languages; the service Google Cloud supports diarization with presetting the number of speakers for about 20 out of 120 languages. See also option 'numberSpeakDIar'.

numberSpeakDiar: [0.0, 100.0] 
If set to a value greater 1, the speaker diarization is restricted to results with this number of speakers;
                this significantly improves results. Note that not all ASR services offer this option; see service manual for details. 
                If set to 0 or if the services does not offer this option, the service determines the number of speakers automatically.
                

USEWORDASTURN: [true, false] 
If set to true (default: false), and if the selected ASR service delivers a valid word segmentation (tier WOR), this word segmentation is encoded  as a chunk segmentation in the output (tier TRN) instead of the (possible) result of a speaker diarization (default). Both, the speaker diarization (which is basically a turn segmentation) and the word segmentation, when used as a chunk segmentation input to MAUS, might improve the phonetic alignment of MAUS, since they act as fix time anchors for the MAUS segmentation process. In some cases the word segmentation as time anchors yields better results (simply because there are more of them and a gross misalignment of MAUS is less likely); sometimes the chosen ASR service does not deliver a speaker diarization, then this option allows to switch to the word segmentation (which is delivered by all ASR services). 
                

TROSpeakerID: [true, false] 
If set to true (default: false), and if the selected ASR service delivers a valid speaker diarization (tier SPK) and a TRO tier, the service will insert speaker ID labels 'XXX: ' before each word in the TRO tier, that starts a new speaker turn of speaker labelled as 'XXX'. The inserted speaker label 'XXX' is either one of the standardized labels 'S1', 'S2', ... or mapped speaker labels taken from the option 'speakMatch'. The service also checks each preceeding word to a speaker turn change (the last word of the previous turn) and adds a trailing '.', if the word does not has already a trailing final punctuation sign (one of '!?.:...). This option enables pipelines that start with 'ASR' and end with 'SUBTITLE' to create subtitle tracks (e.g. WebVTT) that show the speaker ID at speaker changes, and that start a new subtitle at each speaker turn change.
                

ACCESSCODE: 
                  Exceed quota code (ACCESSCODE): special code a user has acquired to override default quotas. 
                  Not needed for normal operation. 
                

Output: A XML response containing the elements "success", "downloadLink", "output"
                and "warning". "success" states if the processing was successful or not, "downloadLink" specifies the
                location where the file containing the resulting transcription (segmented in words) can be
                found (the format of the file depends on the option selected in parameter OUTFORMAT), "output" contains the output that
                is mostly useful during debugging errors, and "warnings" contains any warnings that occured during the processing.
                
----------------------------------------------------------------
----------------------------------------------------------------
runSubtitle
------------------
Description: This service maps the result of a MAUS process (a word/phone segmentation) 
            or the result of an ASR (word segmentation) to the original transcript and groups the transcript into subtitles. 
            The service can be used to automatically create a subtitle track from a signal (+ text); 
            it is recommended to use the service Pipeline with parameter 
            PIPE=G2P_(CHUNKER)_MAUS_SUBTITLE (with text input) or PIPE=ASR_SUBTITLE (without text input). Alternatively, this service 
            can be used to map a transcript in arbitrary format (e.g. containing non-normalized words or 
            punctuations) to a MAUS segmentation. The latter is useful, if the word normalisation/tokenisation 
            changes the original transcript for the MAUS segmentation, but you need for instance punctuations 
            for your analysis of the MAUS segmentation. If the service reads the original transcript, this 
            transcript file is piped through the service runTextEnhance (webinterface: TextEnhance) before aligning it
            to the result of runMAUS; this ensures that results from a Pipeline run (in which runTextEnhance 
            is always applied to the text input) can be processed using the same text input format (e.g. RTF).
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F transcription=@<filename> -F maxlength=0 -F marker=punct -F bpf=@<filename> -F outformat=bpf+trn 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runSubtitle'

Parameters:  [transcription] [maxlength] [marker] bpf [outformat]

Parameter description: 
transcription: (Non-normalized) transcription of the recording to be segmented into 
                        subtitles (usually this file is the input of the earlier Pipeline G2P_... process). Format 
                        is all formats that can be converted by service TextEnhance; content is non-normalized text. For example, 
                        the transcript could contain numerals or abbreviations or punctuations that 
                        are all retained in the subtitles, while the output of a runMAUS process 
                        contains only the normalized text stripped from punctuations.
                        Note that this input can be ommitted; then the subtitles are derived from the TRO tier
                        of the BPF input, or - if TRO does not exist - from the ORT tier.

maxlength: [0.0, 999.0] 
Maximum subtitle length. If set to 0,
                        subtitles of indefinite length are created, based only
                        on the distance of the split markers. If set to a
                        value greater than 0, subtitles are split whenever a
                        stretch between two neighbouring split markers is
                        longer than that value (in words). Caution: This may
                        lead to subtitle splits in suboptimal locations (e.g.
                        inside syntactic phrases).

left-bracket: 
                        One or more characters which mark comments reaching
                        until the end of the line (default: NONE). E.g. if your 
                        input transcript contains comment lines that begin with ';', 
                        set this option to ';' to avoid that these comments are
                        treated as spoken text. If you want to suppress the masking of 
                        comment lines, set this option to 'NONE' (default). 
                        If you are using comment lines in your input text, you must be absolutely 
                        sure that the comment character appears nowhere in the text except in comment lines!
                        Note 1: the characters '&', '|' and '=' do not work as comment characters. 
                        Note 2: for technical reasons the value for this option cannot be empty.
			Note 3: the default character '#' cannot be combined with other characters, e.g. if 
			you define this option as ';#', the '#' will be ignored.
                        Note 4 (sorry): for the service 'Subtitle' comment lines must be terminated with
                        a so called 'final punctuation sign', i.e. one of '.!?:…'; otherwise, an immediately
                        following speaker marker will not be recognized.
                

marker: [punct, newline, tag] 
Marker used to split transcription into subtitles. If
                        set to 'punct' (default), the transcription is split
                        after 'terminal' punctuation marks (currently [.!?:…]. If set
                        to 'newline', the transcription is split at newlines (\n
                        or \r\n). If set to 'tag', the program expects a special
                        &lt;BREAK&gt; tag inside the transcription.
                

bpf: Phonemic transcription of the recording to be mapped to subtitles 
                        (usually this file is the output of a runMAUS or a runASR process). Format is a BAS Partitur 
                        Format (BPF) file with at least an ORT and a MAU or WOR tier. The ORT tier contains a 
                        table with 3 columns and one line per (tokenized) word in the video. Column 1 is 
                        always 'ORT:'; column 2 is an integer starting with 0 denoting the word position 
                        within the input; column 3 contains the (normalized) orthography of the word coded 
                        in UTF-8. The MAUi/WOR tier is a table with 5 columns containing the segmentation and 
                        labeling of phones/words: column 1 is always 'MAU:/WOR:'; column 2 is the begin of a segment 
                        in samples from the start of the recording (0); column 3 contains the duration 
                        of the segment in samples minus 1; column 4 contains the word number to which this 
                        segment belongs (see tier ORT); column 5 contains the SAMPA/IPA encoding of the 
                        phone or the word label. If the input BPF contains a TRO (tokenized original text) tier,
                        the input of the original transcript can be ommitted. 
                        See http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for a 
                        detailed description of the BPF.

replace-whitespace-char: 
                        The character that whitespace in comments and annotation markers should be
                        substituted by (default: NONE). The BAS WebServices require 
                        that annotation markers or comment lines in the input transcript do not 
                        contain white spaces. This option let you decide which character
                        should be used to replace the white spaces; the most common character used for this 
                        purpose is '_' (this is the default in Pipelines).
                        If set to the string 'NONE' no replacement takes place (default).
                        CAUTION: the characters '&' and '=' do not work as replacements.
                

outformat: [srt, sub, vtt, bpf+trn] 
Output format. 'srt', 'sub' or 'vtt' denote 'SubRip', 'SubViewer' or 
                        'WebVTT' subtitle format respectively. 'bpf+trn' (default) denotes a BAS Partitur Format
                        file (BPF, *.par, copied from input) with an added TRO tier that maps the 
                        original transcript to the tokenized and word-normalized ORT tier, and an 
                        added TRN tier (existing TRN are over-written!) that corresponds to subtitles. 
                        If output format is 'vtt' and a subtitle starts with a speaker marker of the form 
                        '<...>', a 'v ' is inserted before the '...'.
                        See http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html
                        for details about the BPF format.
                

brackets: 
                        One or more pairs of characters which bracket
                        annotation markers in the input transcript. E.g. if your input 
                        transcript contains markers like '{Lachen}' and '[noise]' that should be passed as
                        markers and not as spoken text, set this option to '{}[]'.
                        Note that blanks replacement within such markers (see next option 'replace-whitespace-char')
                        only takes place in markern/comments that are defined here.
                

Output: An XML response containing the tags "success", "downloadLink", "output" and 
                        "warning". "success" states whether the processing was successful or not, "downloadLink" 
                        specifies the location where the output file is provided; depending on parameter 
                        'outformat' this can be BPF file (*.par), a SubRip subtitle format (*.srt), or a SubViewer 
                        subtitle format (*.sub). The BPF contains the content of the input BPF (option "bpf") with 
                        appended TRO and TRN tier (existing TRO/TRN tiers in the BPF input are over-written). 
                        The TRO tier contains the mapping from the ORT tier to the input transcription; the TRN tier 
                        contains the subtitle grouping.
                
----------------------------------------------------------------
----------------------------------------------------------------
runSpeakDiar
------------------
Description: This services reads a media file (sound, video) and performs a speaker diarization (SD) based on the pyannote 2 python library.
		    Website: https://github.com/pyannote; Paper: https://dx.doi.org/10.1109/ICASSP40776.2020.9052974. 
		    The service is a wrapper around the pyannote2.0 library [1][2], which has proven to be the most reliable open source diarization model at the time of testing (2023).
		    The library applies pretrained models for voicing segmentation (trained on dihard3: https://dihardchallenge.github.io/dihard3/index) and overlap detection to pre-segment the data.
		    The voiced segments are subsequently converted into speaker-identifying embeddings using the public ECAPA-TDNN architecture from speechbrain [3] (trained on voxceleb 1+2: https://www.robots.ox.ac.uk/~vgg/data/).
		    These embeddings are then clustered using Hierarchical agglomerative clustering to find speaker segments likely belonging to the same person.
		    Finally the embeddings are mapped back into the time domain which yields a final diarization output.
		    More details can be found in the pyannote pipeline’s technical report here: https://huggingface.co/pyannote/speaker-diarization
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F SIGNAL=@<filename> -F speakMatch= -F speakNumber=0 -F OUTFORMAT=bpf -F SAMPLERATE=1 -F TEXT=@<filename> -F minSpeakNumber=0 -F maxSpeakNumber=0 -F allowOverlaps=false 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runSpeakDiar'

Parameters:  SIGNAL [speakMatch] [speakNumber] [OUTFORMAT] [SAMPLERATE] [TEXT] [minSpeakNumber] [maxSpeakNumber] [allowOverlaps]

Parameter description: 
SIGNAL: Required input SIGNAL: sound or video file containing the speech signal to 
                be speaker diarized. Although the mimetype of this input file is restricted to RIFF AUDIO 
		audio/x-wav (extension wav), all media formats that are supported by BAS WebService 
		AudioEnhance are accepted.

speakMatch: Option speakMatch: if set to a list of comma separated names (e.g.
		speakMatch='Anton,Berta,Charlie', the corresponding speaker labels found by the 
		service in the order of appearance are replaced by these names (e.g. 'S1' to 'Anton',
		'S2' to 'Berta' etc.). This allows the user to create SD annotation using her self
		defined speaker labels, if the user knows the order of appearance; it is obvious
		that this feature only makes sense in single file processing, since the speaker labels
		and the order of appearance differ from one recording to the next; the suggested mode
		of operation is to run the service in batch mode over all recordings with speakMatch="",
		then inspect manually the resulting annotation and define speaker labels in the order of 
		appearance for each recording, and then run the service in single file mode for each 
		recording again with the corresponding speakMatch list. If the speakMatch option contains 
		a comma separated list of value pairs like 'S1:Anton', only the speaker labels listed 
		on the lefthand side of each pair are patched, e.g. for speakMatch='S3:Charlie,S6:Florian'
		only the third and sixth appearing speaker are renamed to Charlie and Florian respectively.
                

speakNumber: [0.0, 999999.0] 
Option speakNumber restricts the number of detected speakers to the 
		given number. If set to 0 (default), the SD method determines the number automatically.
	        

OUTFORMAT: [bpf, exb, csv, TextGrid, emuDB, eaf, tei] 
Option 'Output format' (OUTFORMAT): Defines the possible output formats: TextGrid - a praat compatible
                TextGrid file; bpf - a (input) BPF file with new (or replaced) tier(s) SPD (and SPK if BPF was input); csv - a spreadsheet
                (CSV table) that contains the most important information; 
                emuDB - an Emu compatible *_annot.json file;
                eaf - an ELAN compatible annotation file; exb - an EXMARaLDA compatible annotation file;
                tei - Iso TEI document (XML).
                For a description of BPF see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html.
                for a description of Emu see https://github.com/IPS-LMU/emuR.
                Note 1: using 'emuDB' will first produce only single annotation file *_annot.json;
                in the WebMAUS interface (https://clarin.phonetik.uni-muenchen.de/BASWebServices) you can process
                more than one file and than download a zipped Emu database; in this case don't forget to change the default
                name of the emuDB 'MAUSOUTPUT' using the R function emuR::rename_emuDB().
                Note 2: if you need the same result in more than one format, select 'bpf' to produce a BPF file, and then
                convert this file with the service runAnnotConv ('AnnotConv') into the desired formats.
                Note 3: some format conversions are not loss-less; select 'bpf' to be sure that no information is lost.
                

SAMPLERATE: [0.0, 999999.0] 
Option SAMPLERATE of signal file: if the sample rate cannot be determined 
                automatically from SIGNAL, you can provide the 
		sampling rate via this option. Usually you can leave it to the default value of 1.
	        

TEXT: Optional BPF input: BAS Partitur Format (BPF) file (*.par or *.bpf) to which
		the SD result is appended to and copied to output (possibly converted to another format). 
		If the BPF contains a word segmentation (tier ORT/MAU), the service matches the SD result
		against the word segmentation and creates a word-wise SD labelling (SPK tier) based on 
		maximum overlap. 
		See http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for detailed 
		description of the BPF.

minSpeakNumber: [0.0, 999999.0] 
Option minSpeakNumber defines a hard lower bound of the number of detected speakers. If set to 0 (default), no lower bound.
	        

maxSpeakNumber: [0.0, 999999.0] 
Option maxSpeakNumber defines a hard upper bound of the number of detected speakers. If set to 0 (default), no upper bound.
	        

allowOverlaps: [true, false] 
Option allowOverlaps: If set to true, the un-altered output of PyAnnote is
                returned in the SPD tier (note that overlaps cannot be handled by most annotation formats; only use if you really need
                to detect overlaps!); if set to false (default), overlaps, missing silence intervals etc. are resolved
                in the output tier SPD, making this output compatible with all annotation formats. The postprocessing works as follows:
                1. all silence intervals are removed. 2. all speaker segments that are 100% within another (larger) speaker segment are 
                removed. 3. If an overlap occurs the earlier segment(s) are truncated to the start of the new segment. 4. all remaining 
                gaps in the segmentation are filled with silence intervals.
                

Output: A XML response containing the elements "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                the result file can be found which contains the speaker diarization result.
                The format of the annotation file depends on the option selected in 
                OUTFORMAT. "output" contains the output that is mostly useful during debugging errors and "warning" 
                lists warnings, if any occured during the processing. 
                
----------------------------------------------------------------
----------------------------------------------------------------
runMAUSGetInventar
------------------
Description: Returns the available phonemic input inventar (in SAMPA) for a given language.

Example curl call is:
curl -v -X GET -H 'content-type: application/x-www-form-urlencoded' 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSGetInventar?LANGUAGE=deu-DE'

Parameters:  [LANGUAGE]

Parameter description: 
LANGUAGE: [aus-AU, afr-ZA, sqi-AL, eus-ES, eus-FR, cat-ES, nld-BE, nld-NL, nor-NO, eng-US, eng-AU, eng-GB, eng-SC, eng-NZ, ekk-EE, fin-FI, fra-FR, kat-GE, deu-DE, gsw-CH, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, hun-HU, isl-IS, ita-IT, jpn-JP, gup-AU, sampa, ltz-LU, mlt-MT, nan-TW, fas-IR, pol-PL, por-PT, ron-RO, rus-RU, spa-ES, swe-SE, tha-TH, guf-AU] 
Language of the phoneme symbol set; we use the RFC5646 sub-structure 'iso639-3 - iso3166-1
                [ - iso3166-2], e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in
                'Oberoesterreich'.

Output: List of accepted input phonemic SAM-PA symbols for the selected language; one symbol per
                line; this can be used by calling applications to pre-test the transcription input to the runMAUS
                service for faulty symbols.
----------------------------------------------------------------
----------------------------------------------------------------
runMAUS
------------------
Description: Segments a media file into phonetic and word segments given a tokenized phonemic
            transcription as input. This service allows the
            usage of all possible options of the MAUS program. 
            The service creates a stochastic, language specific pronunciation model derived from the canonical input transcript
            and then combines this model with a phonetic Hidden Markov Model trained on the language to decode the most likely segmentation and 
            labelling.
            See the section Input for a detailed description of all options or use the operation 'runMAUSGetHelp' to 
            download a current version of the MAUS documentation.
            Note that this service does not process text files (*.txt) as an input, but rather 
            BAS Partitur Format (BPF, *.par, see https://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html#Partitur for details) 
            or CSV tables (*.csv). To process text input use either the service runMAUSBasic, or - in case you require 
            some options that are only available for runMAUS - use the operation 'Pipeline' with the PIPE=G2P_MAUS.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F SIGNAL=@<filename> -F LANGUAGE=deu-DE -F MODUS=default -F INSKANTEXTGRID=true -F RELAXMINDUR=false -F OUTFORMAT=TextGrid -F TARGETRATE=100000 -F ENDWORD=999999 -F RELAXMINDURTHREE=false -F STARTWORD=0 -F INSYMBOL=sampa -F PRESEG=false -F USETRN=false -F BPF=@<filename> -F MAUSSHIFT=default -F INSPROB=0.0 -F INSORTTEXTGRID=true -F OUTSYMBOL=sampa -F RULESET=@<filename> -F MINPAUSLEN=5 -F WEIGHT=default -F NOINITIALFINALSILENCE=false -F ADDSEGPROB=false 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUS'

Parameters:  SIGNAL [LANGUAGE] [MODUS] [INSKANTEXTGRID] [RELAXMINDUR] [OUTFORMAT] [TARGETRATE] [ENDWORD] [RELAXMINDURTHREE] [STARTWORD] [INSYMBOL] [PRESEG] [USETRN] BPF [MAUSSHIFT] [INSPROB] [INSORTTEXTGRID] [OUTSYMBOL] [RULESET] [MINPAUSLEN] [WEIGHT] [NOINITIALFINALSILENCE] [ADDSEGPROB]

Parameter description: 
SIGNAL: media file containing the speech signal to be segmented; any
                sampling rate; optimal results if leading and trailing silence intervals are truncated before
                processing. Although the mimetype of this input file is restricted to
                audio/x-wav (wav|WAV), the service will also process *.nis|nist|sph (NIST SPHERE), *.al|dea (ALAW),
                *.mpeg|mpg (Video, several codecs) and *.mp4 (MPEG4). File format will be determined by extension only.
                

LANGUAGE: [aus-AU, afr-ZA, sqi-AL, eus-ES, eus-FR, cat-ES, nld-BE, nld-NL, eng-US, eng-AU, eng-GB, eng-SC, eng-NZ, ekk-EE, fin-FI, fra-FR, kat-GE, deu-DE, gsw-CH, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, hun-HU, isl-IS, ita-IT, jpn-JP, sampa, ltz-LU, mlt-MT, nan-TW, nor-NO, fas-IR, pol-PL, por-PT, ron-RO, rus-RU, spa-ES, swe-SE, tha-TH] 
Option Language (LANGUAGE): Language of the speech to be processed; defines the possible phoneme
                symbol set in MAUS input and the pronunciation modelling module. RFC5646 sub-structure 'iso639-3 -
                iso3166-1 [- iso3166-2], e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in
                'Oberoesterreich'. The language code 'sampa' (not RCFC5646) or 'und' denotes a language independent
                variant of MAUS for which the SAM-PA symbols in the input BPF must be blank separated (e.g. /h OY t @/).
                

MODUS: [default, standard, align] 
Option MODUS: Operation modus of MAUS: default is to use the language dependent default modus; the 
                two possible modi are: 'standard' which is the segmentation and
                labelling using the MAUS technique as described in Schiel ICPhS 1999, and 'align', a forced
                alignment is performed on the input SAM-PA string defined in the KAN tier of the BPF (the same effect
                as the deprecated former option CANONLY=true).

INSKANTEXTGRID: [true, false] 
OPTION KAN tier in TextGrid (INSKANTEXTGRID): Switch to create an additional tier in the TextGrid output file
                with a word segmentation labelled with the canonic phonemic transcript (taken from the input KAN
                tier).

RELAXMINDUR: [true, false] 
Option Relax Min Duration (RELAXMINDUR) changes the default minimum duration of 3 statesfor consonants 
                and short/lax vowels and of 4 states for tense/long vowels and diphthongs to 1 and 2 states respectively. 
                This is not optimal for general segmentation because MAUS will start to insert many very short 
                vowels/glottal stops where they are not appropriate. But for some special investigations 
                (e.g. the duration of /t/) it alleviates the ceiling problem at 30msec duration (with standard frame rate of 10msec per state).
                

OUTFORMAT: [bpf, exb, csv, TextGrid, emuDB, eaf, tei, mau, par] 
Option 'Output format' (OUTFORMAT): Defines the possible output formats: TextGrid - a praat compatible
                TextGrid file; bpf - the input BPF file with a new (or replaced) tier MAU; csv - a spreadsheet
                (CSV table) that contains word and phone segmentation; mau - just the BPF tier MAU (phonetic segmentation); 
                emuDB - an Emu compatible *_annot.json file; 
                eaf - an ELAN compatible annotation file; exb - an EXMARaLDA compatible annotation file; 
                tei - Iso TEI document (XML).
                For a description of BPF see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html.
                for a description of Emu see https://github.com/IPS-LMU/emuR.
                Note 1: using 'emuDB' will first produce only single annotation file *_annot.json;
                in the WebMAUS interface (https://clarin.phonetik.uni-muenchen.de/BASWebServices) you can process 
                more than one file and than download a zipped Emu database; in this case don't forget to change the default
                name of the emuDB 'MAUSOUTPUT' using the R function emuR::rename_emuDB().
                Note 2: if you need the same result in more than one format, select 'bpf' to produce a BPF file, and then 
                convert this file with the service runAnnotConv ('AnnotConv') into the desired formats. 
                Note 3: some format conversions are not loss-less; select 'bpf' to be sure that no information is lost.
                

TARGETRATE: [100000, 20000, 10000] 
Option Output frame rate (TARGETRATE): the resolution of segment boundaries in output measured in 100nsec units (default 100000 = 10msec). Decreasing this value (min is 10000) increases computation time, does not increase segmental accuracy in average, but allows output segment boundaries to assume more possible values (default segment boundaries are quantizised in 10msec steps). This is useful, if MAUS results are analysed for duration of phones or syllables.
                

ENDWORD: [0.0, 999999.0] 
Option End with word (ENDWORD): If set to a value n<999999, this option causes maus to end the
                segmentation with the word number n (word numbering in BPF starts with 0). This is useful if the input
                signal file is just a segment within a longer transcript. See also option STARTWORD.

RELAXMINDURTHREE: [true, false] 
Alternative option to Relax Min Duration (RELAXMINDUR): changes the minimum duration for all models to 3 states 
                (= 30msec with standard frame rate)to 30msec.
                This can be useful when comparing the duration of different phone groups.
                

STARTWORD: [0.0, 999999.0] 
Option Start with word (STARTWORD): If set to a value n>0, this option causes maus to start the
                segmentation with the word number n (word numbering in BPF starts with 0). This is useful if the input
                signal file is just a segment within a longer transcript. See also option ENDWORD.

INFORMAT: Deprecated option INFORMAT: Input format is now detected from input file extension.
                Defines the possible input formats: bpf - a BPF file with (minimum) tier
                KAN; bpf-sampa - BPF file with KAN tier with blank separated SAM-PA symbols, switches to language
                independent SAM-PA mode processing; for a description of BPF see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html

INSYMBOL: [sampa, ipa] 
Option Input Encoding (INSYMBOL): Defines the encoding of phonetic symbols in input. If set to 'sampa'
                (default), phonetic symbols are encoded in X-SAMPA (with some coding differences in Norwegian/Icelandic);
                use service runMAUSGetInventar with option LANGUAGE=sampa to get a list of symbols and
                their mapping to IPA). If set to 'ipa', the service expects blank-separated UTF-8 IPA. 
                

PRESEG: [true, false] 
Option Pre-segmentation (PRESEG): If set to true, a pre-segmentation using the wav2trn tool is done by the
                webservice on-the-fly; this is useful, if the input
                signal has leading and/or trailing silence. If this option is set in combination with USETRN=true
                and the input BPF contains a chunk segmentation (tier TRN), then the        
                presegmentation is carried out for every single chunk.

USETRN: [true, false, force] 
Option Chunk segmentation (USETRN): If set to true, the service searches the input BPF for a TRN tier
                (turn/chunk segmentation, see http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatsdeu.html#TRN). The
                synopsis for a TRN entry is: 'TRN: (start-sample) (duration-sample) (word-link-list) (label)', e.g.
                'TRN: 23654 56432 0,1,2,3,4,5,6 sentence1' (the speech within the recording 'sentence1' starts with
                sample 23654, last for 56432 samples and covers the words 0-6). If only one TRN entry is found, the
                segmentation is restricted within a time range given by this TRN tier entry; this is useful, if there
                exists a reliable pre-segmentation of the recorded utterance, i.e. the start and end of speech within
                the recording is known. If more than one TRN entry is found, the webservice performs an segmentation
                for each 'chunk' defined by a TRN entry and aggregates all individual results into a single results
                file; this is useful if the input consists of long recordings, for which a manual chunk segmentation is
                available. If USETRN is set to 'force' (deprecacted since maus 4.11; use PRESEG=true instead!), 
                a pre-segmentation using the wav2trn tool is done by the
                webservice on-the-fly; this is useful, if the input BPF does not contain a TRN entry and the input
                signal has leading and/or trailing silence.

BPF: Phonemic transcription of the utterance to be segmented. Format is either 
                a BAS Partitur Format (BPF, *.par) file with a KAN tier or a spreadsheet CSV file. The KAN tier contains 
                a table with 3 columns and one line per word in the input. Column 1 is always 'KAN:'; column 2 
                is an integer starting with 0 denoting the word position (tokenization) within the input; column 3 contains the 
                canonical pronunciation of the word coded in SAM-PA (or IPA). The *.csv file contains two columns separated 
                by ';', one word in each line, the UTF-8 encoded orthography in the 1st, the canonical 
                pronunciation in the 2nd colum (SAMPA or IPA).
                Note that the pronunciation string must contain phoneme-separating blanks for the language independent mode 
                (LANGUAGE = 'sampa' or 'und'), e.g /h OY t @/x); for languages that are official SAMPA these are optional (e.g. /hOYt@/ is possible). See
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for detailed description of the
                BPF.

MAUSSHIFT: Option Segment shift (MAUSSHIFT): If set to n, this option causes the calculated MAUS segment boundaries
                to be shifted by n msec (default: 0) into the future. Most likely this systematic shift is caused by a
                boundary bias in the training material's segmentation.
                The default should work for most cases.

INSPROB: Option Phon insertion prob (INSPROB): The option INSPROB influences the probability of deletion of segments. It
                is a constant factor (a constant value added to the log likelihood score) after each segment.
                Therefore, a higher value of INSPROB will cause the probability of segmentations with more segments go
                up, thus decreasing the probability of deletions (and increasing the probability of insertions, which
                are rarely modelled in the rule sets). This parameter has been evaluated on parts of the German
                Verbmobil data set (27425 segments) which were segmented and labelled manually (MAUS DEV set) and found
                to have its optimum at 0.0 (which is nice). Therefore we set the default value of INSPROB to 0.0.
                INSPROB was also tested against the MAUS TEST set to confirm the value of 0.0. It had an optimum at 0.0
                as well. Note that this might NOT be the optimal value for other MAUS tasks.

INSORTTEXTGRID: [true, false] 
Option ORT tier in TextGrid (INSORTTEXTGRID): Switch to create an additional tier ORT in the TextGrid output file
                with a word segmentation labelled with the orthographic transcript (taken from the input ORT tier);
                this option is only effective, if the input BPF contains an additional ORT tier.

OUTSYMBOL: [sampa, ipa, manner, place] 
Option Output Encoding (OUTSYMBOL): Defines the encoding of phonetic symbols in output. If set to 'sampa'
                (default), phonetic symbols in output are encoded in X-SAMPA (with some minor differences in languages
                Norwegian/Icelandic in which the retroflex consonants are encoded as 'rX' instead of X-SAMPA 'X_r');
                use service runMAUSGetInventar with option LANGUAGE=sampa to get a list of symbols and
                their mapping to IPA. If set to 'ipa', the service produces UTF-8 IPA output. If set to 'manner', the
                service produces IPA manner of articulation for each segment; possible values are: silence, vowel,
                diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective. If set to
                'place', the service produces IPA place of articulation for each segment; possible values are: silence,
                labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central,
                back.

RULESET: MAUS rule set file; UTF-8 encoded; one rule per line; there are two different file types defined by the extension: 1. Phonological rule set without statistical information '*.nrul', synopsis is: 'leftContext-match-rightContext>leftContext-replacement-rightContext', e.g. 't,s-e:-n>t,s-@-n'. 2. Rule set with statistical information '*.rul', synopsis is: 'leftContext,match,rightContext>leftContext,replacement,rightContext ln(P(replacement|match) 0.0000', e.g. 'P9,n,@,n,#>P9,# -3.761200 0.000000'; 'P(replacement|match)' is the conditional probability that 'match' is being replaced by 'replacement'; the sum over all conditional probabilities with the same condition 'match' must be less than 1; the difference between the sum and 1 is the conditional probability 'P(match|match)', i.e. no for no change. 'leftContext/rightContext/match/replacememt' = comma separated lists of SAMPA symbols or empty lists (for *.rul the leftContext/rightContext must be exactly one symbol!); special SAMPA symbols in contexts are: '#' = word boundary between words, and '<' = utterance begin (may be used instead of a phonemic symbol); digits in SAMPA symbols must be preceded by 'P' (e.g. '2:' -> 'P2:'); all used SAMPA symbols must be defined in the language specific SAMPA set (see service runMAUSGetInventar). Examples for '*.rul' : 'P9,n,@,n,#>P9,# = 'the word final syllable /n@n/ is deleted, if preceded by /9/', '#,k,u:>#,g,u:' = 'word intial /k/ is replaced by /g/ if followed by the vowel /u:/'. Examples for '*.nrul' : '-->-N,k-' = 'insert /Nk/ at arbitrary positions', '#-?,E,s-#>#-s-#' = 'delete /?E/ in word /?Es/', 'aI-C-s,t,#>aI-k-s,t,#' = 'replace /C/ in word final syllable /aICst/ by /k/'.

MINPAUSLEN: [1.0, 999.0] 
Option Inter-word silence (MINPAUSLEN): Controls the behaviour of optional inter-word silence. If set to 1,
                maus will detect all inter-word silence intervals that can be found (minimum length for a silence
                interval is then 10 msec = 1 frame). If set to values n>1, the minimum length for an inter-word
                silence interval to be detected is set to n*10 msec. For example MINPAUSLEN of 5 will cause MAUS to
                suppress inter-word silence intervals up to a length of 40msec. Since 40 msec seems to be the border of
                perceivable silence, we set this option default to 5. With other words: inter-word silences smaller
                than 50msec are not segmented but rather distributed equally to the adjacent segments. If one of the
                adjacent segments happens to be a plosive then the deleted silence interval is added totally to the
                plosive; if both adjacent segments are plosives, the interval is equally spread as with non-plosive
                adjacent segments.

WEIGHT: The option Pron model weight (WEIGHT) weights the influence of the statistical pronunciation model against the
                acoustical scores. More precisely WEIGHT is multiplied to the pronunciation model score (log
                likelihood) before adding the score to the acoustical score within the search. Since the pronunciation
                model in most cases favors the canonical pronunciation, increasing WEIGHT will at some point cause MAUS
                to choose always the canonical pronunciation; lower values of WEIGHT will favor less probable paths be
                selected according to acoustic evidence. If the acoustic quality of the signal is very good and the
                HMMs of the language are well trained, it makes sense to lower WEIGHT. For most languages this option
                is default to 1.0. In an evaluation on parts of the German Verbmobil data set (27425 segments) which
                were segmented and labelled manually (MAUS DEV set) WEIGHT was optimized to 7.0. Note that this might
                NOT be the optimal value for other languages. For instance Italian shows best results with WEIGHT=1.0,
                Estonian with WEIGHT=2.5. If set to default, a language specific optimal value is chosen automatically.

NOINITIALFINALSILENCE: [true, false] 
Option No silence model (NOINITIALFINALSILENCE): 
                Switch to suppress the automatic modeling of an optional leading/trailing silence interval. This is
                useful, if for instance the signal is known to start with a stop and no leading silence, and the silence model would 
                'capture' the silence interval from the plosive.

ADDSEGPROB: [true, false] 
Option Add Viterbi likelihoods (ADDSEGPROB) causes that the frame-normalized natural-log total 
                Viterbi likelihood of an aligned segment is appended to the segment label in the 
                output annotation (the MAU tier). This might be used as a 'quasi quality measure' 
                on how good the acoustic signal in the aligned segment has been modelled by the 
                combined acoustical and pronunciation model of MAUS. Note that the values are not 
                probabilities but likelihood densities, and therefore are not comparable for 
                different signal segments; they are, however, comparable for the same signal segment. 
                Warning: this option breaks the BPF standard for the MAU tier and must not be 
                used, if the resulting MAU tier should be further processed, e.g. in a pipe).
                Implemented only for output phoneme symbol set SAMPA (default).
		

Output: A XML response containing the tags "success", "downloadLink", "output" and "warning.
                success states if the processing was successful or not, downloadLink specifies the location where
                the result file can be found (the format of the file depends on the option selected in OUTFORMAT),
                output contains the output that is mostly useful during debugging errors and warnings if any warnings
                occured during the processing. 
                
----------------------------------------------------------------
----------------------------------------------------------------
runMAUSBasic
------------------
Description: Segments a media file into phonetic and word segments given the orthographic transcription as input (text file).
            The result is stored in a three-layer annotation file (word segmentation with orthographic labels, word segmentation with
            canonical pronunciation labels in SAM-PA, phonemic segmentation with SAM-PA labels).
            This is a simple version of a G2P_MAUS pipeline service which applies only default options; see operation
            'runMAUS' ('WebMAUS General' service) for the full MAUS service with all options or the operation 'runPipeline' ('Pipeline' service).
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F SIGNAL=@<filename> -F LANGUAGE=deu-DE -F OUTFORMAT=TextGrid -F TEXT=@<filename> 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSBasic'

Parameters:  SIGNAL [LANGUAGE] [OUTFORMAT] TEXT

Parameter description: 
SIGNAL: file containing the speech signal to be segmented; PCM 16 bit resolution; mono; any
                sampling rate; optimal results if leading and trailing silence intervals are truncated before
                processing. Although the mimetype of this input file is restricted to
                audio/x-wav (wav|WAV), the service will also process *.nis|nist|sph (NIST SPHERE), *.al|dea (ALAW),
                *.mpeg|mpg (Video, several codecs) and *.mp4 (MPEG4). File format will be determined by extension only.
                

LANGUAGE: [aus-AU, afr-ZA, sqi-AL, eus-ES, eus-FR, cat-ES, nld-BE, nld-NL, eng-AU, eng-US, eng-GB, eng-SC, eng-NZ, ekk-EE, fin-FI, fra-FR, kat-GE, deu-AT, deu-CH, deu-DE, gsw-CH, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, hun-HU, isl-IS, ita-IT, jpn-JP, gup-AU, ltz-LU, mlt-MT, nor-NO, fas-IR, pol-PL, ron-RO, rus-RU, spa-ES, swe-SE, tha-TH, guf-AU] 
Language of the speech to be processed; we use the RFC5646 sub-structure 'iso639-3 -
                iso3166-1 [ - iso3166-2], e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in
                'Oberoesterreich'; defines the possible orthographic text language in the input, the text-to-phoneme
                tranformation and some language specific transformations within the MAUS process. The code 'gsw-CH' (=
                Swiss German) denotes orthographic text input in Swiss German 'Dieth' encoding.

INSKANTEXTGRID: Switch to create an additional tier in the TextGrid output file with a word segmentation
                labelled with the canonic phonemic transcript (taken from the input KAN tier). This option can not be
                set in this service.

RELAXMINDUR: Option Relax Min Duration (RELAXMINDUR) changes the default minimum duration of 3 states for consonants 
                and short/lax vowels and of 4 states for tense/long vowels and diphthongs to 1 and 2 states respectively. 
                This is not optimal for general segmentation because MAUS will start to insert many very short 
                vowels/glottal stops where they are not appropriate. But for some special investigations 
                (e.g. the duration of /t/) it alleviates the ceiling problem at 30msec duration (at a standard frame rate of 10msec per state). 
                This option can not be set in this service.
                

OUTFORMAT: [par, exb, csv, TextGrid, emuDB, eaf, tei, bpf, mau] 
Option 'Output format' (OUTFORMAT): Defines the possible output formats: TextGrid - a praat compatible
                TextGrid file; bpf - a BPF file with tiers ORT (words), KAN (pronunciation) and MAU (phonetic segments); csv - a spreadsheet
                (CSV table) with word and phone segmentation; 
                emuDB - an Emu compatible *_annot.json file; 
                eaf - an ELAN compatible annotation file; exb - an EXMARaLDA compatible annotation file; 
                tei - Iso TEI document (XML).
                For a description of BPF see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html.
                for a description of Emu see https://github.com/IPS-LMU/emuR.
                Note 1: using 'emuDB' will first produce only single annotation file *_annot.json;
                in the WebMAUS interface (https://clarin.phonetik.uni-muenchen.de/BASWebServices) you can process 
                more than one file and than download a zipped Emu database; in this case don't forget to change the default
                name of the emuDB 'MAUSOUTPUT' using the R function emuR::rename_emuDB().
                Note 2: if you need the same result in more than one format, select 'bpf' to produce a BPF file, and then 
                convert this file with the service runAnnotConv ('AnnotConv') into the desired formats. 
                Note 3: some format conversions are not loss-less; select 'bpf' to be sure that no information is lost.
                

PRESEG: Option PRESEG: If set to true, a pre-segmentation using the wav2trn tool is done by the
                webservice on-the-fly; this is useful, if the input
                signal has leading and/or trailing silence. If this option is set in combination with USETRN=true
                and the input BPF contains a chunk segmentation (tier TRN), then the 
                presegmentation is carried out for every single chunk. This option can not be
                set in this service.

USETRN: If set to true, the service searches the input BPF for a TRN tier (turn/chunk
                segmentation, see http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatsdeu.html#TRN). The synopsis
                for a TRN entry is: 'TRN: (start-sample) (duration-sample) (word-link-list) (label)', e.g. 'TRN: 23654
                56432 0,1,2,3,4,5,6 sentence1' (the speech within the recording 'sentence1' starts with sample 23654,
                last for 56432 samples and covers the words 0-6). If only one TRN entry is found, the segmentation is
                restricted within a time range given by this TRN tier entry; this is useful, if there exists a reliable
                pre-segmentation of the recorded utterance, i.e. the start and end of speech within the recording is
                known. If more than one TRN entry is found, the webservice performs an segmentation for each 'chunk'
                defined by a TRN entry and aggregates all individual results into a single results file; this is useful
                if the input consists of long recordings, for which a manual chunk segmentation is available. If USETRN
                is set to 'force', a pre-segmentation using the wav2trn tool is done by the webservice on-the-fly; this
                is useful, if the input BPF does not contain a TRN entry and the input signal has leading and/or
                trailing silence. This option can not be set in this service.

TARGETRATE: Option TARGETRATE: the resolution of segment boundaries in output measured in 100nsec units (default 100000 = 10msec). Decreasing this value (min is 10000) increases computation time, does not increase segmental accuracy in average, but allows output segment boundaries to assume more possible values (default segment boundaries are quantizised in 10msec steps). This is useful, if MAUS results are analysed for duration of phones or syllables. This option can not be
                set in this service. 
                

TEXT: orthographic text of the utterance to be segmented; words are white space separated;
                encoding is utf-8; punctuations are ignored

INSORTTEXTGRID: Switch to create an additional tier ORT in the TextGrid output file with a word
                segmentation labelled with the orthographic transcript (taken from the input ORT tier); this option is
                only effective, if the input BPF contains an additional ORT tier. This option can not be set in this
                service.

NOINITIALFINALSILENCE: NOINITIALFINALSILENCE: 
                Switch to suppress the automatic modeling of an optional leading/trailing silence interval. This is
                useful, if for instance the signal is known to start with a stop and no leading silence, and the silence model would 
                'capture' the silence interval from the plosive. This option can not be set in this service.

Output: A XML response containing the tags "success", "downloadLink", "output" and "warning.
                success states if the processing was successful or not, downloadLink specifies the location where the
                Praat TextGrid file can be found, output contains the output that is mostly useful during debugging
                errors and warnings if any warnings occured during the processing. The Praat TextGrid file containing
                three tiers: orthographic transcription (segmented in words), canonical phonemic transcription in
                SAM-PA (segmented in words), phonemic segmentation by MAUS in SAM-PA
----------------------------------------------------------------
----------------------------------------------------------------
runAnnotConv
------------------
Description: This service is a general purpose annotation converter from BAS Partitur Format (BPF) to several standards. 
              The services reads an annotation file of format INPFORMAT and converts it into the 
              annotation format given in option OUTFORMAT. Most conversions require at least one annotation layer with timing information.
              Details about the BAS Partitur Format (BPF) can be found in 
              https://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html#Partitur.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F INPFORMAT=bpf -F outFormat=TextGrid -F INP=@<filename> 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runAnnotConv'

Parameters:  [INPFORMAT] [outFormat] INP

Parameter description: 
INPFORMAT: [bpf] 
Option INPFORMAT: the annotation format of the input file.
                

outFormat: [exb, csv, TextGrid, emuDB, eaf, tei] 
Option outFormat: the annotation format of the output file. Note that some 
                annotation formats may contain only a subset of the information that is contained in the input.
                For example, if the input BPF contained the tiers ORT,KAN,MAU,GES and outFormat is set 
                to 'TextGrid' only the tiers ORT,KAN and MAU are transformed into the output TextGrid without 
                warning. This might be important, if you use this converter within a pipeline that produces more than 
                the basic time-alignment.  
                

INP: The input annotation file to be converted; the format must match the option INPFORMAT.
                

Output: A XML response containing the elements "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                the output annotation file can be found, "output" contains the output that is mostly useful during 
                debugging errors and "warning" lists warnings, if any occured during processing. 
                
----------------------------------------------------------------
----------------------------------------------------------------
runMAUSGetHelp
------------------
Description: Returns the help of the MAUS tool on the server which describes the available parameters in
            more detail.

Example curl call is:
curl -v -X GET -H 'content-type: application/x-www-form-urlencoded' 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runMAUSGetHelp'


Parameter description: 
Output: Help message of the actual MAUS tool.
----------------------------------------------------------------
----------------------------------------------------------------
runAudioEnhance
------------------
Description: This services reads a media file and performs several signal processing operations
              mostly based on the SoX ('Sound Exchange') and N-HANS projects. Without any options set, the service
              produces a RIFF WAVE audio file optimized for processing in the BAS WebServices. 
              Details about the 'Sound Exchange' project (SoX) see https://www.openhub.net/p/sox.
              Details about the 'N-HANS' project see https://github.com/N-HANS/N-HANS.
              Depending on input and given options the service
              extracts sound track from video input,
              converts non-RIFF sound formats into RIFF,
              merge/re-arrange multi-channel files,
              re-sample to given sampling rate,
              (spectral) filters signal for constant background noise,
              applies high-, low-, band pass- and band reject filter,
              applies speech rate manipulation without changing the length (speed),
              manipulate pitch without changing the speech rate,
              removes complex noise while preserving a target noise/voice (N-HANS),
              separates a target speaker from an interference speaker/speaker group (N-HANS).
              In the current version the input audio format is not retained; the output audio format 
              is always RIFF WAVE.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F SIGNAL=@<filename> -F NHANS=none -F MONO=true -F PITCH=0 -F NOISE=0 -F NOISEPROFILE=0 -F LOWF=0 -F neg=@<filename> -F RESAMPLE=0 -F pos=@<filename> -F CHANNELSELECT= -F NORM=true -F TEMPO=1.0 -F HIGHF=0 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runAudioEnhance'

Parameters:  SIGNAL [NHANS] [MONO] [PITCH] [NOISE] [NOISEPROFILE] [LOWF] [neg] [RESAMPLE] [pos] [CHANNELSELECT] [NORM] [TEMPO] [HIGHF]

Parameter description: 
SIGNAL: The input media file to be processed; the format is recognized by the file's 
                extension; supported formats are: wav,nis,sph,mp3,mpeg,mp4,avi,flv
                (although the mimetype of this input file is restricted to RIFF AUDIO
                audio/x-wav). (extension wav), most pipes will also process NIST/SPHERE (nis|sph) and
                several kinds of video formats.)
                

NHANS: [none, denoiser, separator] 
Option NHANS: the N-HANS audio enhancement mode (default: 'none') applied to the 
                result of the SoX pipeline. 
                'denoiser' : the noise as represented in the sample recording  
                uploaded in the mandatory option file 'neg' is removed from the signal; if another voice or noise sample is uploaded in 
                option file 'pos' (optional), this noise/voice is being preserved in the signal together with the main voice. 
                'separator' : an interference speaker or speaker group as represented in the sample recording 
                uploaded in the mandatory option file 'neg' is removed from the signal while the voice of a target speaker as uploaded in 
                the mandatory option file 'pos' is being preserved in the signal.
                Both sample signals, 'neg' and 'pos', are applied to all processed input signals; do not upload more than 2sec of clean signal,
                and make sure that the relevant signal is present within the very first second; 'clean signal' means that the sample should
                not contain any traces of the main voice or of the other noise sample. 
                

MONO: [true, false] 
Option MONO: if true (selected) input channels are merged. Note that most operations (e.g. filtering, normalization) are performed on the individual channel before the merge. Note that most operations (e.g. filtering, normalization) are performed on the individual channel before the merge.
                

PITCH: [-1000.0, 1000.0] 
Option PITCH: pitch shift in 100th of a semi-tone without changing speech rate.
                E.g. PITCH = -100 shift fundamental frequency by one semi-tone down.  
                

NOISE: [0.0, 100.0] 
Option NOISE: if set to a value between 1...100, a noise profile is calculated 
                from the leading and/or trailing parts of the input signal, and then the signal is 
                noise reduced with a strength proportional to the NOISE value 
                (using SoX spectral noise reduction effect 'noisered'). 
                The noise reduction is applied before
                any other processing/merging in all input channels.
                If NOISE=0, no noise reduction takes place.
                

NOISEPROFILE: [-1000000.0, 1000000.0] 
Option NOISEPROFILE: if set to 0 (default), the noise profile is calculated from the leading and trailing
                portion of the recording (estimated by a silence detector); if set to a positive value, the noise profile is calculated 
                from the leading NOISEPROFILE samples; if set to a negative Value, the noise profile is calculated
                from the trailing NOISEPROFILE samples. This is useful, if the recording contains loud noise at the begin/end of the recording
                that woud not be selected by the silence detector (because of too much energy).
                

LOWF: [0.0, 30000.0] 
Option LOWF: lower filter edge in Hz. If set >0Hz and HIGHF is 0Hz, a high pass filter 
                with LOWF Hz is applied; if set >0Hz and HIGHF is set higher than LOWF, a band
                pass between LOWF and HIGHF is applied; if set >0Hz and HIGHF is set higher than 0Hz but lower 
                than LOWF, a reject band pass between HIGHF and LOWF is applied. 
                E.g. HIGHF = 3000 LOWF = 300 is telephone band; HIGHF = 45 LOWF = 55 filters out a 50Hz hum.
                

neg: 
                Option neg : N-HANS sample recording (RIFF WAVE *.wav) of the noise to be removed from signal (mode 'denoiser') or 
                the speaker/speaker group to be removed from signal (mode 'separator').
                The 'neg' sample is applied to all processed input signals; do not upload more than 2sec of clean signal,
                and make sure that the relevant signal is present within the very first second; 'clean signal' means that the sample should
                not contain any traces of the main voice or of the 'pos' noise sample. 
                The upload of the 'neg' sample is mandatory for both N-HANS modi (see option 'NHANS').
                

RESAMPLE: [0.0, 96000.0] 
Option RESAMPLE: re-sample signal to this value in Hz; RESAMPLE=0 : no re-sampling.
                

pos: 
                Option pos : N-HANS sample recording (RIFF WAVE *.wav) of the noise to be preserved in the signal (mode 'denoiser') or 
                the target speaker to be preserved in the signal (mode 'separator').
                The 'pos' sample is applied to all processed input signals; do not upload more than 2sec of clean signal,
                and make sure that the relevant signal is present within the very first second; 'clean signal' means that the sample should
                not contain any traces of the main voice (mode 'denoiser') nor of the 'pos' noise sample (modes 'denoiser' and 'separator'). 
                The upload of the 'pos' sample is mandatory for N-HANS mode 'separator' and optional for mode 'denoiser' (see option 'NHANS').
                

CHANNELSELECT: Option CHANNELSELECT: list of comma-separated channel numbers that are selected
                for further processing from the input media file.
                Examples: 
                MONO=true,CHANNELSELECT="" : merge multi-channel files into one channel, 
                MONO=true,CHANNELSELECT="2,3,4" : merge only selected channels into one channel,
                MONO=false, CHANNELSELECT="3,4,1,2" : select and re-arrange channels,
                MONO=false, CHANNELSELECT="" : do nothing.
                Note that channels are numbered starting with 1 = left channel in stereo, 2 = right channel, ...
                By reversing the order of channel numbers in CHANNELSELECT you can swap channels, e.g.
                CHANNELSELECT="2,1" MONO=false will swap left and right channel of a stereo signal.
                

NORM: [true, false] 
Option NORM: if true (selected) each input channel is amplitude normalised to -3dB before any merge.
                

TEMPO: [0.25, 4.0] 
Option TEMPO: factor of speech rate change; >1 speeds up, <1 slows down.
                E.g. TEMPO = 1.5 increase speech rate by 50% (signal gets shorter).
                

HIGHF: [0.0, 30000.0] 
Option HIGHF: upper filter edge in Hz. If set >0Hz and LOWF is 0Hz, a low pass filter 
                with HIGHF Hz is applied; if set >0Hz and LOWF is set lower than HIGHF, a band
                pass between LOWF and HIGHF is applied; if set >0Hz and LOWF is set higher than 0Hz but higher 
                than HIGHF, a reject band pass between HIGHF and LOWF is applied. 
                E.g. HIGHF = 3000 LOWF = 300 is telephone band; HIGHF = 45 LOWF = 55 filters out a 50Hz hum.
                

Output: A XML response containing the elements "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                the output annotation file can be found, "output" contains the output that is mostly useful during 
                debugging errors and "warning" lists warnings, if any occured during processing. 
                
----------------------------------------------------------------
----------------------------------------------------------------
runGetVersion
------------------
Description: Returns the version number of the different underlying tools. If no option
            is specified it returns the number of the services.

Example curl call is:
curl -v -X GET -H 'content-type: application/x-www-form-urlencoded' 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runGetVersion?service=services'

Parameters:  [service]

Parameter description: 
service: [runAnonymizer, runAnnotConv, runASR, runAudioEnhance, runChannelSeparator, runChunker, runChunkPreparation, runCOALA, runEMUMagic, runFormantAnalysis, runG2P, runMAUS, runMINNI, runPho2Syl, runPipeline, runPipelineWithASR, runSpeakDiar, runSubtitle, runTextEnhance, runVoiceActivityDetection, services] 
 Name of the service to get the version.

Output: Version number of the tool requested.
----------------------------------------------------------------
----------------------------------------------------------------
runPipeline
------------------
Description: This is a service that combines two or more BAS webservices into a processing chain (pipeline) without Automatic Speech Recognition (ASR). To run a pipeline with ASR use the service 'Pipeline with ASR' (runPipelineWithASR). Since not every BAS webservice can be combined with another, the service only offers pipelines that make sense for the user. All pipelines executed by this service can also be executed by calling two or more BAS webservices after another and passing the output of one service to the next. The benefit, however, is that the user data (which can be substantially large) will be up- and down-loaded only once, and of course that the user does not have to formulate several BAS webservice calls (with matching parameters). The parameter PIPE defines which processing pipeline will be executed; depending on the value of PIPE the service accepts parameters for the BAS webservices which are involved in the pipeline, and which make sense in the context of the pipeline. Other parameters will be set automatically depending on the value of PIPE (e.g. the MAUS parameter USETRN will be set to 'true' in the case of a pipeline where the runChunkPreparation service passes a BPF file to the runMAUS service containing a chunk segmentation in the TRN tier). Since this service basically comprise of all BAS web services, the number of possible parameters is necessarily huge. To make the selection easier we group the parameters into MANDATORY (that have to be set for every pipeline), optional parameters that are shared by more than one service, and then by PIPELINE ELEMENT (e.g. ASR, MAUS, in alphabetical order). In most cases it is sufficient to set the MANDATORY parameters, and the Pipeline service will then set the element specific parameters automatically. The service will perform a pre-check on all set parameters to detect conflicts and then terminate with an informative message; but there are still many cases where the pipeline will start working and then terminate with an error caused by a service later down the pipe. Starting with version 6.0 the service will deliver a ZIP archive instead of the output of the last service in PIPE, if the option 'KEEP' ('Keep everything') is enabled; this ZIP will contain input(s), all intermediary results, end result and a protocol of the pipeline process.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F com=yes -F INSKANTEXTGRID=true -F USETEXTENHANCE=true -F TARGETRATE=100000 -F TEXT=@<filename> -F NOISE=0 -F PIPE= -F aligner=hirschberg -F NOISEPROFILE=0 -F neg=@<filename> -F speakMatch= -F speakNumber=0 -F ASIGNAL=brownNoise -F NORM=true -F mauschunking=false -F minSpeakNumber=0 -F INSORTTEXTGRID=true -F WEIGHT=default -F minanchorlength=3 -F LANGUAGE=deu-DE -F NHANS=none -F USEAUDIOENHANCE=true -F maxlength=0 -F KEEP=false -F LEFT_BRACKET=# -F nrm=no -F LOWF=0 -F WHITESPACE_REPLACEMENT=_ -F CHANNELSELECT= -F marker=punct -F USEREMAIL= -F boost=true -F except=@<filename> -F MINPAUSLEN=5 -F forcechunking=false -F NOINITIALFINALSILENCE=false -F InputTierName=unknown -F BRACKETS=<> -F OUTFORMAT=TextGrid -F syl=no -F ENDWORD=999999 -F wsync=yes -F UTTERANCELEVEL=false -F featset=standard -F pos=@<filename> -F APHONE= -F INSPROB=0.0 -F OUTSYMBOL=x-sampa -F RULESET=@<filename> -F maxSpeakNumber=0 -F allowOverlaps=false -F minchunkduration=15 -F SIGNAL=@<filename> -F stress=no -F imap=@<filename> -F MODUS=default -F RELAXMINDUR=false -F ATERMS=@<filename> -F RELAXMINDURTHREE=false -F STARTWORD=0 -F INSYMBOL=sampa -F PRESEG=false -F AWORD=ANONYMIZED -F USETRN=false -F MAUSSHIFT=default -F HIGHF=0 -F silenceonly=0 -F boost_minanchorlength=4 -F ADDSEGPROB=false 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runPipeline'

Parameters:  [com] [INSKANTEXTGRID] [USETEXTENHANCE] [TARGETRATE] TEXT [NOISE] [PIPE] [aligner] [NOISEPROFILE] [neg] [speakMatch] [speakNumber] [ASIGNAL] [NORM] [mauschunking] [minSpeakNumber] [INSORTTEXTGRID] [WEIGHT] [minanchorlength] [LANGUAGE] [NHANS] [USEAUDIOENHANCE] [maxlength] [KEEP] [LEFT_BRACKET] [nrm] [LOWF] [WHITESPACE_REPLACEMENT] [CHANNELSELECT] [marker] [USEREMAIL] [boost] [except] [MINPAUSLEN] [forcechunking] [NOINITIALFINALSILENCE] [InputTierName] [BRACKETS] [OUTFORMAT] [syl] [ENDWORD] [wsync] [UTTERANCELEVEL] [featset] [pos] [APHONE] [INSPROB] [OUTSYMBOL] [RULESET] [maxSpeakNumber] [allowOverlaps] [minchunkduration] SIGNAL [stress] [imap] [MODUS] [RELAXMINDUR] [ATERMS] [RELAXMINDURTHREE] [STARTWORD] [INSYMBOL] [PRESEG] [AWORD] [USETRN] [MAUSSHIFT] [HIGHF] [silenceonly] [boost_minanchorlength] [ADDSEGPROB]

Parameter description: 
com: [yes, no] 
Option com (Keep Annotation): yes/no decision whether <*> strings in text inputs should be treated as annotation
                markers (yes) or as spoken words (no). If set to 'yes', then strings of this type are considered as annotation markers that are not
                processed as spoken words but passed on to the output.
                The <*> markers will appear in the ORT
                and KAN tier with a word index on their own. WebMAUS makes use of two special markers < usb > (e.g.
                non-understandable word or other human noises) and < nib > (non-human noise). 
                All other markers <*> are modelled as silence. Markers must be separated from word tokens
                by blanks; they do not need to be blank-separated from non-word tokens as punctuation. Note that the default service 'TEXTENHANCE'
                that is called by any pipeline that reads input text will replace white space characters (such as blanks) within the <*> by 
                the character given in option 'White space replacement'.

INSKANTEXTGRID: [true, false] 
OPTION INSKANTEXTGRID: Switch to create an additional tier in the TextGrid output file
                with a word segmentation labelled with the canonic phonemic transcript (taken from the input KAN
                tier).

USETEXTENHANCE: [true, false] 
Switch on the input text pre-processing 'textEnhance' (true).
                If the PIPE starts with G2P, the input text is first normalized by 'textEnhance'.
                Different TXT formats are mapped to simple UTF-8 Unix style TXT format, and textmarkers 
                are normalized to be conform with BAS WebServices.
                

TARGETRATE: [100000, 20000, 10000] 
Option TARGETRATE: the resolution of segment boundaries in output measured in 100nsec units (default 100000 = 10msec). Decreasing this value (min is 10000) increases computation time, does not increase segmental accuracy in average, but allows output segment boundaries to assume more possible values (default segment boundaries are quantizised in 10msec steps). This is useful, if MAUS results are analysed for duration of phones or syllables.
                

TEXT: Mandatory parameter TEXT: The textual input to the pipeline, usually some form of text or transcript. Depending on parameter PIPE this can be a text document (all formats supported by service runTextEnhance), a comma separated spreadsheet (csv), a praat TextGrid (TextGrid), an ELAN EAF (eaf), or a BAS Partitur Format (par) file. See
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for detailed description of the
                BPF. Note that PIPEs starting with service ASR or MINNI do not require this parameter.
                  Special languages for text input:
                        Thai, Russian and Georgian expect their respective standard alphabets;
                        Japanese allows Kanji or Katakana or a mixture of both, but the tokenized output will contain only the
                        Katakana version of the input;
                        Swiss German expects input to be transcribed in 'Dieth' (https://en.wikipedia.org/wiki/Swiss_German);
                        Australian Aboriginal languages (including Kunwinjku, Yolnu Matha) expect so called 'Practical Orthography'
                        (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages);
                        Persian accepts a romanized version of Farsi developped by Elisa Pellegrino and Hama Asadi (see
                        http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/PersianRomanizationTable.pdf)
                        for details).
                

NOISE: [0.0, 100.0] 
Option NOISE: if set to a value between 1...100, a noise profile is calculated 
                from the leading and/or trailing parts of the input signal, and then the signal is 
                noise reduced with a strength proportional to the NOISE value 
                (using SoX spectral noise reduction effect 'noisered'). 
                The noise reduction is applied before
                any other processing/merging in all input channels.
                If NOISE=0, no noise reduction takes place.
                

PIPE: [G2P_CHUNKER, CHUNKER_MAUS, CHUNKER_MAUS_SD, CHUNKER_MAUS_PHO2SYL, CHUNKER_MAUS_PHO2SYL_SD, CHUNKER_MAUS_SUBTITLE, CHUNKER_MAUS_SUBTITLE_SD, CHUNKER_MAUS_SUBTITLE_PHO2SYL, CHUNKER_MAUS_SUBTITLE_PHO2SYL_SD, CHUNKPREP_G2P_MAUS, CHUNKPREP_G2P_MAUS_SD, CHUNKPREP_G2P_MAUS_PHO2SYL, CHUNKPREP_G2P_MAUS_PHO2SYL_SD, CHUNKPREP_G2P_MAUS_SUBTITLE, CHUNKPREP_G2P_MAUS_SUBTITLE_SD, CHUNKPREP_G2P_MAUS_SUBTITLE_PHO2SYL, CHUNKPREP_G2P_MAUS_SUBTITLE_PHO2SYL_SD, G2P_CHUNKER_MAUS, G2P_CHUNKER_MAUS_SD, G2P_CHUNKER_MAUS_PHO2SYL, G2P_CHUNKER_MAUS_PHO2SYL_SD, G2P_CHUNKER_MAUS_SUBTITLE, G2P_CHUNKER_MAUS_SUBTITLE_SD, G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL, G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL_SD, G2P_MAUS, G2P_MAUS_SD, G2P_MAUS_PHO2SYL, G2P_MAUS_PHO2SYL_SD, G2P_MAUS_SUBTITLE, G2P_MAUS_SUBTITLE_SD, G2P_MAUS_SUBTITLE_PHO2SYL, G2P_MAUS_SUBTITLE_PHO2SYL_SD, MAUS_PHO2SYL, MAUS_PHO2SYL_SD, MAUS_SUBTITLE, MAUS_SUBTITLE_SD, MAUS_SUBTITLE_PHO2SYL, MAUS_SUBTITLE_PHO2SYL_SD, CHUNKER_MAUS_ANONYMIZER, CHUNKER_MAUS_ANONYMIZER_SD, CHUNKER_MAUS_PHO2SYL_ANONYMIZER, CHUNKER_MAUS_PHO2SYL_ANONYMIZER_SD, CHUNKER_MAUS_ANONYMIZER_SUBTITLE, CHUNKER_MAUS_ANONYMIZER_SUBTITLE_SD, CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, CHUNKPREP_G2P_MAUS_ANONYMIZER, CHUNKPREP_G2P_MAUS_ANONYMIZER_SD, CHUNKPREP_G2P_MAUS_PHO2SYL_ANONYMIZER, CHUNKPREP_G2P_MAUS_PHO2SYL_ANONYMIZER_SD, CHUNKPREP_G2P_MAUS_ANONYMIZER_SUBTITLE, CHUNKPREP_G2P_MAUS_ANONYMIZER_SUBTITLE_SD, CHUNKPREP_G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, CHUNKPREP_G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, G2P_CHUNKER_MAUS_ANONYMIZER, G2P_CHUNKER_MAUS_ANONYMIZER_SD, G2P_CHUNKER_MAUS_PHO2SYL_ANONYMIZER, G2P_CHUNKER_MAUS_PHO2SYL_ANONYMIZER_SD, G2P_CHUNKER_MAUS_ANONYMIZER_SUBTITLE, G2P_CHUNKER_MAUS_ANONYMIZER_SUBTITLE_SD, G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, G2P_CHUNKER_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, G2P_MAUS_ANONYMIZER, G2P_MAUS_ANONYMIZER_SD, G2P_MAUS_PHO2SYL_ANONYMIZER, G2P_MAUS_PHO2SYL_ANONYMIZER_SD, G2P_MAUS_ANONYMIZER_SUBTITLE, G2P_MAUS_ANONYMIZER_SUBTITLE_SD, G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, G2P_MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD, MAUS_ANONYMIZER, MAUS_ANONYMIZER_SD, MAUS_PHO2SYL_ANONYMIZER, MAUS_PHO2SYL_ANONYMIZER_SD, MAUS_ANONYMIZER_SUBTITLE, MAUS_ANONYMIZER_SUBTITLE_SD, MAUS_SUBTITLE_PHO2SYL_ANONYMIZER, MAUS_SUBTITLE_PHO2SYL_ANONYMIZER_SD] 
Parameter PIPE: The type of pipeline to process. Values of parameter PIPE have the general form SERVICE_SERVICE[_SERVICE ...], where SERVICE is one of G2P, MAUS, CHUNKER, CHUNKPREP, PHO2SYL, SUBTITLE, ANONYMIZER, SD (for pipelines executing ASR or MINNI service see service 'Pipeline with ASR' (runPipelineWithASR)). For example PIPE=G2P_CHUNKER_MAUS_PHO2SYL denotes a pipe that runs over these 4 services. For all pipelines in this service both, SIGNAL and TEXT inputs are necessary; the last SERVICE in PIPE determines which output the pipeline can produce. Therefore it is quite possible to call a pipe with impossible input/output configuration which will cause an ERROR. Every media file uploaded will first be passed through the service 'AudioEnhance' to normalized the media file to a RIFF WAVE format file; every text input is first run through the service 'TextEnhance' to normalize the text format; for both these obligatory services exist options as for the other pipeline SERVICES. Special pipe '..._SD' : the final speaker diarization module (SD) does not actual read any annotations from the previous services;
     it rather runs the speaker diarization in parallel on the signal input and then merges the speaker segmentation
     and laelling with whatever the rest of the pipe has produced, e.g. it merges speaker segments and word segments
     to produce a (symbolic) speaker labelling of the word segments.
                

aligner: [hirschberg, fast] 
Symbolic aligner to be used. The "fast" aligner performs approximate alignment by splitting the alignment matrix into "windows" of size 5000*5000. The "hirschberg" aligner performs optimal matching. On recordings below the 1 hour mark, the choice of aligner does not make a big difference in runtime. On longer recordings, you can improve runtime by selecting the "fast" aligner. Note however that this choice increases the probability of errors on recordings with untranscribed stretches (such as long pauses, musical interludes, untranscribed speech). Therefore, the "hirschberg" aligner should be used on this kind of material.

NOISEPROFILE: [-1000000.0, 1000000.0] 
Option NOISEPROFILE: if set to 0 (default), the noise profile is calculated from the leading and trailing
                portion of the recording (estimated by a silence detector); if set to a positive value, the noise profile is calculated 
                from the leading NOISEPROFILE samples; if set to a negative value, the noise profile is calculated
                from the trailing NOISEPROFILE samples. This is useful, if the recording contains loud noise at the begin/end of the recording
                that would not be selected by the silence detector (because of too much energy).
                

neg: 
                Option neg : N-HANS sample recording (RIFF WAVE *.wav) of the noise to be removed from signal (mode 'denoiser') or 
                the speaker/speaker group to be removed from signal (mode 'separator').
                The 'neg' sample is applied to all processed input signals; do not upload more than 2sec of clean signal,
                and make sure that the relevant signal is present within the very first second; 'clean signal' means that the sample should
                not contain any traces of the main voice or of the 'pos' noise sample. 
                The upload of the 'neg' sample is mandatory for both N-HANS modi (see option 'NHANS').
                

speakMatch: Option speakMatch: if set to a list of comma separated names (e.g.
		speakMatch='Anton,Berta,Charlie', the corresponding speaker labels found by the 
		speaker diarization in the order of appearance are replaced by these names (e.g. 'S1' to 'Anton',
		'S2' to 'Berta' etc.). This allows the user to create SD annotation using her self
		defined speaker labels, if the user knows the order of appearance; it is obvious
		that this feature only makes sense in single file processing, since the speaker labels
		and the order of appearance differ from one recording to the next; the suggested mode
		of operation is to run the service in batch mode over all recordings with speakMatch="",
		then inspect manually the resulting annotation and define speaker labels in the order of 
		appearance for each recording, and then run the service in single file mode for each 
		recording again with the corresponding speakMatch list. If the speakMatch option contains 
		a comma separated list of value pairs like 'S1:Anton', only the speaker labels listed 
		on the lefthand side of each pair are patched, e.g. for speakMatch='S3:Charlie,S6:Florian'
		only the third and sixth appearing speaker are renamed to Charlie and Florian respectively.
                

speakNumber: [0.0, 999999.0] 
Option speakNumber restricts the number of detected speakers by the speaker diarization to the 
		given number. If set to 0 (default), the SD method determines the number automatically.
	        

ASIGNAL: [brownNoise, beep, silence] 
Option ASIGNAL: the type of signal to mask anonymized terms in the signal.
                'brownNoise' is brown noise; 'beep' is a 500Hz sinus; 'silence' is total silence (zero signal); masking signals have an amplitude of -10dB of the maximum amplitude
                and are faded in and out with a very short sinoid function.
                

NORM: [true, false] 
Option NORM: if true (selected) each input channel is amplitude normalised to -3dB before any merge.
                

mauschunking: [true, false] 
If this parameter is set to true, the recognition module will model words as MAUS graphs as opposed to canonical chains of phonemes. This will slow down the recognition engine, but it may help with non-canonical speech (e.g., accents or dialects).

minSpeakNumber: [0.0, 999999.0] 
Option minSpeakNumber defines a hard lower bound of the number of detected speakers. If set to 0 (default), no lower bound.
	        

INSORTTEXTGRID: [true, false] 
Option INSORTTEXTGRID: Switch to create an additional tier ORT in the TextGrid output file
                with a word segmentation labelled with the orthographic transcript (taken from the input ORT tier);
                this option is only effective, if the input BPF contains an additional ORT tier.

WEIGHT: 
                The option WEIGHT weights the influence of the statistical pronunciation model against the
                acoustical scores. More precisely WEIGHT is multiplied to the pronunciation model score (log
                likelihood) before adding the score to the acoustical score within the search. Since the pronunciation
                model in most cases favors the canonical pronunciation, increasing WEIGHT will at some point cause MAUS
                to choose always the canonical pronunciation; lower values of WEIGHT will favor less probable paths be
                selected according to acoustic evidence. If the acoustic quality of the signal is very good and the
                HMMs of the language are well trained, it makes sense to lower WEIGHT. For most languages this option
                is default to 1.0. In an evaluation on parts of the German Verbmobil data set (27425 segments) which
                were segmented and labelled manually (MAUS DEV set) WEIGHT was optimized to 7.0. Note that this might
                NOT be the optimal value for other languages. For instance Italian shows best results with WEIGHT=1.0,
                Estonian with WEIGHT=2.5. If set to default, a language specific optimal value is chosen automatically.
                

minanchorlength: [2.0, 8.0] 
The chunker performs speech recognition and symbolic alignment to find regions of correctly aligned words (so-called 'anchors'). Setting this parameter to a high value (e.g. 4-5) means that the chunker finds chunk boundaries with higher certainty. However, the total number of discovered chunk boundaries may be reduced as a consequence. A low value (e.g. 2) is likely to lead to a more fine-grained chunking result, but with lower confidence for individual chunk boundaries.

LANGUAGE: [cat, deu, eng, fin, hat, hun, ita, mlt, nld, nze, pol, aus-AU, afr-ZA, sqi-AL, arb, eus-ES, eus-FR, cat-ES, nld-NL-GN, nld-NL, nld-NL-OH, nld-NL-PR, eng-US, eng-AU, eng-GB, eng-GB-OH, eng-GB-OHFAST, eng-GB-LE, eng-SC, eng-NZ, ekk-EE, kat-GE, fin-FI, fra-FR, deu-DE, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, gsw-CH, hat-HT, hun-HU, isl-IS, ita-IT, jpn-JP, gup-AU, sampa, ltz-LU, mlt-MT, nor-NO, fas-IR, pol-PL, ron-RO, rus-RU, slk-SK, spa-ES, swe-SE, tha-TH, guf-AU] 

                Language: RCFC5646 locale code of the processed speech; defines the phoneme set of input and the orthographic system of input text (if any); 
                we use the RFC5646 sub-structure 'iso639-3 - iso3166-1 [ - iso3166-2], e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in
                'Oberoesterreich'; the code 'sampa' ('Language independent') allows the user to upload a customized
                mapping from orthographic to phonologic form (see option 'imap'). Special languages: 'gsw-CH' denotes
                text written in Swiss German 'Dieth' transcription (https://en.wikipedia.org/wiki/Swiss_German); 'gsw-CH-*' are localized varieties in larger Swiss cities;
                'jpn-JA' (Japanese) accepts Kanji or Katakana or a mixture of both, but the tokenized output will contain only the Katakana version of the input;
                'aus-AU' (Australian Aboriginal languages, including Kunwinjku, Yolnu Matha) accept so called 'Modern Practical Orthography'
                (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages);
                'fas-IR' (Persian) accepts a romanized version of Farsi developped by Elisa Pellegrino and Hama Asadi (see
                http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/PersianRomanizationTable.pdf for details);
                'arb' is a macro language covering all Arabic varieties; the input must be encoded in a broad phonetic romanization developped
                by Jalal Tamimi and colleagues (see http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/TamimiRomanization.pdf for details).
                The language code is passed to all services of the pipeline, thus influencing the way these services will 
                process the speech. If one member of the PIPE does not support the language, the service will try to determine another 
                suitable language (WARNING is issued) or, if that is not possible, an ERROR is returned. Note 
                that some services will support more languages than offered in the pipeline service, but we restrict the 
                pipeline languages to a reasonable core set that is supported by most services.
                

NHANS: [none, denoiser, separator] 
Option NHANS: the N-HANS audio enhancement mode (default: 'none') applied to the 
                result of the SoX pipeline. 
                'denoiser' : the noise as represented in the sample recording  
                uploaded in the mandatory option file 'neg' is removed from the signal; if another voice or noise sample is uploaded in 
                option file 'pos' (optional), this noise/voice is being preserved in the signal together with the main voice. 
                'separator' : an interference speaker or speaker group as represented in the sample recording 
                uploaded in the mandatory option file 'neg' is removed from the signal while the voice of a target speaker as uploaded in 
                the mandatory option file 'pos' is being preserved in the signal.
                Both sample signals, 'neg' and 'pos', are applied to all processed input signals; do not upload more than 2sec of clean signal,
                and make sure that the relevant signal is present within the very first second; 'clean signal' means that the sample should
                not contain any traces of the main voice or of the other noise sample. 
                

USEAUDIOENHANCE: [true, false] 
Switch on the signal normalization 'AudioEnhance' (true).
                

maxlength: [0.0, 999.0] 
Maximum subtitle length. If set to 0,
                        subtitles of indefinite length are created, based only
                        on the distance of the split markers. If set to a
                        value greater than 0, subtitles are split whenever a
                        stretch between two neighbouring split markers is
                        longer than that value (in words). Caution: This may
                        lead to subtitle splits in suboptimal locations (e.g.
                        inside syntactic phrases).

KEEP: [true, false] 
Keep everything (KEEP): If set to true (default: false), the service will return a ZIP archive instead of the output of the last service in PIPE. The ZIP is named as the output file name (as defined in OUT) with extension zip and contains the following files: input(s) including optional files (e.g. RULESET), all intermediary results of the PIPE, the result of the pipeline, and a protocol listing all options; all stored files in the ZIP start with the file name body of the SIGNAL input followed by the marker '_LABEL', which indicates from which part of the pipe the file is produced, and the appropriate file type extension; 'LABEL' is one of INPUT, AUDIOENHANCE (which marks the pre-processed media file), TEXTENHANCE (which marks the pre-processed text input file if applicable), ASR, CHUNKER, CHUNKPREP, G2P, MAUS, PHO2SYL, ANONYMIZER, SUBTITLE and README (which marks the protocol file). The protocol file contains a simple list of 'option = value' pairs. The result file(s) of the pipeline have no '_LABEL' marker. The KEEP option is useful for documenting scientific pipeline runs, and for retrieving results that are produced by the PIPE but are overwritten/not passed on by later services (e.g. an anonymized video or CHUNKER output).

LEFT_BRACKET: One or more characters which mark comments reaching
                        until the end of the line (default: #). E.g. if your 
                        input text contains comment lines that begin with ';', 
                        set this option to ';' to avoid that these comments are
                        treated as spoken text. If you want to suppress the default
                        '#' comment character, set this option to 'NONE'.
                        If you are using comment lines in your input text, you must be absolutely 
                        sure that the comment character appears nowhere in the text except in comment lines!
                        Note 1: the characters '&', '|' and '=' do not work as comment characters.
                        Note 2: for technical reasons the value for this option cannot be empty.
                        Note 3: the default character '#' cannot be combined with other characters, e.g. if
                        you define this option as ';#', the '#' will be ignored.
			Note 4 (sorry): for the service 'Subtitle' comment lines must be terminated with
			a so called 'final punctuation sign', i.e. one of '.!?:…'; otherwise, an immediately
			following speaker marker will not be recognized.
                

nrm: [yes, no] 
Text normalization. Currently available for German and English only.
                Detects and expands 22 non-standard word types. All output file types supported but
                not available for the following tokenized input types: bpf, TextGrid, and tcf. If
                switched off, only number expansion is carried out.

LOWF: [0.0, 30000.0] 
Option LOWF: lower filter edge in Hz. If set >0Hz and HIGHF is 0Hz, a high pass filter 
                with LOWF Hz is applied; if set >0Hz and HIGHF is set higher than LOWF, a band
                pass between LOWF and HIGHF is applied; if set >0Hz and HIGHF is set higher than 0Hz but lower 
                than LOWF, a reject band pass between HIGHF and LOWF is applied. 
                E.g. HIGHF = 3000 LOWF = 300 is telephone band; HIGHF = 45 LOWF = 55 filters out a 50Hz hum.
                

WHITESPACE_REPLACEMENT: The character that whitespace in comments should be
                        substituted by (default: '_'). The BAS WebServices require 
                        that annotation markers or comment lines in input texts do not 
                        contain white spaces. This option let you decide which character
                        should be used to replace the white spaces. 
                        If set to the string 'NONE' no replacement takes place.
                        CAUTION: the characters '&' and '=' do not work as replacements.
                

CHANNELSELECT: Option CHANNELSELECT: list of comma-separated channel numbers that are selected
                for further processing from the input media file.
                Examples: 
                MONO=true,CHANNELSELECT="" : merge multi-channel files into one channel, 
                MONO=true,CHANNELSELECT="2,3,4" : merge only selected channels into one channel,
                MONO=false, CHANNELSELECT="3,4,1,2" : select and re-arrange channels,
                MONO=false, CHANNELSELECT="" : do nothing.
                Note that channels are numbered starting with 1 = left channel in stereo, 2 = right channel, ...
                By reversing the order of channel numbers in CHANNELSELECT you can swap channels, e.g.
                CHANNELSELECT="2,1" MONO=false will swap left and right channel of a stereo signal.
                

marker: [punct, newline, tag] 
Marker used to split transcription into subtitles. If
                        set to 'punct' (default), the transcription is split
                        after 'terminal' punctuation marks (currently [.!?:…]. If set
                        to 'newline', the transcription is split at newlines (\n
                        or \r\n). If set to 'tag', the program expects a special
                        < BREAK > tag inside the transcription (without the
                        blanks between the brackets and BREAK!).

USEREMAIL: Option USEREMAIL: if a valid email address is provided through this option, the 
                service will send the XML file containing the results of the service run to this address after completion.
                It is recommended to set this option for long recordings (batch size <6, length >1h) since it is often problematic to wait 
                for service completion over an instable internet connection or from a laptop that might go into hibernation.
                The email address provided is not stored on the server. It is sometimes even advisable to kill the browser tab
                after starting the call and wait for the result emails (only for batch size <6!).
                Beware: the download link to your result(s) will be valid for 24h after you receive the email; after that
                all your data will be purged from the server.
                Disclaimer: the usage of this option is at your own risk; the key URL to download your result file will
                be send without encryption in this email; be aware that anybody who can intercept this email will be able 
                to access your result files using this key; the BAS at LMU Munich will not be held responsible for any
                security breach caused by using this email notification option.
                

boost: [true, false] 
If set to true (the default), the chunker will start by running a so-called boost phase over the recording. This boost phase uses a phoneme-based decoder instead of speech recognition. Usually, the boost option reduces processing time. On noisy input or faulty transcriptions, the boost option can lead to an increase in errors. In this case (or if a previous run with boost set to 'true' has led to chunking errors), set this option to 'false'.

except: Exception dictionary file overwriting the standard G2P output. Format: 2 semicolon-separated columns: word;transcript. Phonemes in transcript must be blank-separated. Example: sagt;z ' a x t. Note that the transcript must not contain phonemic symbols that are unknown to other services in the pipeline for the selected language; the service 'WebMAUS General' provides a list of all known symbols of a language

MINPAUSLEN: [0.0, 999.0] 
Option MINPAUSLEN: Controls the behaviour of optional inter-word silence. If set to 1,
                maus will detect all inter-word silence intervals that can be found (minimum length for a silence
                interval is then 10 msec = 1 frame). If set to values n>1, the minimum length for an inter-word
                silence interval to be detected is set to n*10 msec. For example MINPAUSLEN of 5 will cause MAUS to
                suppress inter-word silence intervals up to a length of 40msec. Since 40 msec seems to be the border of
                perceivable silence, we set this option default to 5. With other words: inter-word silences smaller
                than 50msec are not segmented but rather distributed equally to the adjacent segments. If one of the
                adjacent segments happens to be a plosive then the deleted silence interval is added totally to the
                plosive; if both adjacent segments are plosives, the interval is equally spread as with non-plosive
                adjacent segments.

forcechunking: [true, false, rescue] 
If this parameter is set to true, the chunker will run in the experimental 'forced chunking' mode (chunker option 'force'). While forced chunking is much more likely to return a fine-grained chunk segmentation, it is also more prone to chunking errors. As a compromise, you can also set this parameter to 'rescue'. In this case, the forced chunking algorithm is only invoked when the original algorithm has returned chunks that are too long for MAUS.

NOINITIALFINALSILENCE: [true, false] 
Option NOINITIALFINALSILENCE: 
                Switch to suppress the automatic modeling of an optional leading/trailing silence interval. This is
                useful, if for instance the signal is known to start with a stop and no leading silence, and the silence model would 
                'capture' the silence interval from the plosive.

InputTierName: Option InputTierName: Only needed, if TEXT is in TextGrid/ELAN format. Name of the annotation tier, that contains
                the input words/chunks.

BRACKETS: One or more pairs of characters which bracket
                        annotation markers in the input. E.g. if your input 
                        text contains markers '{Lachen}' and '[noise]' that should be passed as
                        markers and not as spoken text, set this option to '{}[]'.
                        Note that blanks replacement within such markers (see option 'WHITESPACE_REPLACEMENT')
                        only takes place in markern/comments that are defined here.
                

OUTFORMAT: [bpf, exb, csv, TextGrid, emuDB, eaf, tei, srt, sub, vtt, par] 
Option OUTFORMAT: the output format of the pipe. Note that this depends on the selected PIPE,
                more precisely, whether the last service in the pipeline supports the format; if not, an ERROR is returned. 
                Possible (selectable) formats are: 
                'TextGrid' - a praat compatible TextGrid file; 
                'bpf' - a BPF file (if the input (TEXT) is also a BPF file, the input is usually 
                copied to the output with new (or replaced) tiers); 
                'csv' - a spread sheet (CSV table) containing the most prominent tiers of the annotation; 
                'emuDB' - an Emu compatible *_annot.json file; 
                'eaf' - an ELAN compatible annotation file; 
                'exb' - an EXMARaLDA compatible annotation file; 
                'tei' - an Iso TEI document; 
                'srt' - a SubRip subtitle format file; 
                'sub' - a SubViewer subtitle format file;
                'vtt' - a WebVTT suntitle format file.
                If output format is 'vtt' and a subtitle starts with a speaker marker of the form
                '<...>', a 'v ' is inserted before the '...'.
                For a description of BPF see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html.
                For a description of Emu see https://github.com/IPS-LMU/emuR.
                Note 1: using 'emuDB' will first produce only single annotation file *_annot.json;
                in the WebMAUS interface (https://clarin.phonetik.uni-muenchen.de/BASWebServices) you can process
                more than one file and than download a zipped Emu database; in this case don't forget to change the default
                name of the emuDB 'MAUSOUTPUT' using the R function emuR::rename_emuDB().
                Note 2: if you need the same result in more than one format, select 'bpf' to produce a BPF file, and then
                convert this file with the service runAnnotConv ('AnnotConv') into the desired formats.
                Note 3: some format conversions are not loss-less; select 'bpf' to be sure that no information is lost.
                

syl: [yes, no] 
Switches syllabification of the pronunciation in the KAN tier produced 
		by module G2P on; the syllable boundary marker is '.'. This option only makes sense in 
		languages in which the module G2P produces a different syllabification than the module 
		PHO2SYL (e.g. tha-TH). Otherwise use a pipe that ends with the module PHO2SYL which 
		will create tiers MAS (phonetic syllable) and KAS (phonologic syllable).
                WARNING: syl=yes causes G2P to switch off MAUS embedded mode; this might change the 
		output for some languages because the output phoneme inventar is then SAMPA and not the 
		SAMPA variant used by MAUS. Subsequent modules like MAUS might report an ERROR then.
                

ENDWORD: [0.0, 999999.0] 
Option ENDWORD: If set to a value n<999999, this option causes maus to end the
                segmentation with the word number n (word numbering in BPF starts with 0). This is useful if the input
                signal file is just a segment within a longer transcript. See also option STARTWORD.

wsync: [yes, no] 
Yes/no decision, whether each word boundary is considered as syllable boundary. Only
                relevant for phonetic transcription input from MAU, PHO, or SAP tiers (for input from the KAN tier
                this option is always set to 'yes'). If set to 'yes',
                each syllable is assigned to exactly one word index. If set to 'no', syllables can be part of more than
                one word.

UTTERANCELEVEL: [true, false] 
Switch on utterance level modelling (true); only for PIPEs with text input.
                Every TEXT input line is modelled as an utterance in an additional annotation layer ('TRL') between recording (bundle) and 
                words (ORT). This is usefull if the recording contains several sentences/utterances and you need hierarchical access 
                to these in the resulting annotation structure. For example, in EMU-SDMS output the default hierarchy bundle->ORT->MAU
                is then changed to  bundle->TRL->ORT->MAU. Note 1 : does not have any effect in CSV output. Note 2 : the use of this option 
                causes the ORT tier to contain the raw word tokens instead of the (default) word-normalized word tokens (e.g. '5,' (raw token) 
                vs. 'five' (word-normalized). 
                

featset: [standard, extended] 
Feature set used for grapheme-phoneme conversion. The standard set is the
                default and comprises a letter window centered on the grapheme to be converted. The extended set
                additionally includes part of speech and morphological analyses. The extended set is currently
                available for German and British English only. For connected text the extended feature set generally
                generally yields better performance. However, if the input comprises a high amount of proper names
                provoking erroneous part of speech tagging and morphologic analyses, than the standard feature set is
                more robust.

pos: 
                Option pos : N-HANS sample recording (RIFF WAVE *.wav) of the noise to be preserved in the signal (mode 'denoiser') or 
                the target speaker to be preserved in the signal (mode 'separator').
                The 'pos' sample is applied to all processed input signals; do not upload more than 2sec of clean signal,
                and make sure that the relevant signal is present within the very first second; 'clean signal' means that the sample should
                not contain any traces of the main voice (mode 'denoiser') nor of the 'pos' noise sample (modes 'denoiser' and 'separator'). 
                The upload of the 'pos' sample is mandatory for N-HANS mode 'separator' and optional for mode 'denoiser' (see option 'NHANS').
                

APHONE: Option APHONE: the string used to mask phonetic/phonologic labels for anonymized terms.
                If not set, the service will use the label 'nib' for masking encodings in SAMPA, and the label
                '(.)' for encodings in IPA. If set to another label, this label is used to mask in all encodings.
                

INSPROB: Option INSPROB: The option INSPROB influences the probability of deletion of segments. It
                is a constant factor (a constant value added to the log likelihood score) after each segment.
                Therefore, a higher value of INSPROB will cause the probability of segmentations with more segments go
                up, thus decreasing the probability of deletions (and increasing the probability of insertions, which
                are rarely modelled in the rule sets). This parameter has been evaluated on parts of the German
                Verbmobil data set (27425 segments) which were segmented and labelled manually (MAUS DEV set) and found
                to have its optimum at 0.0 (which is nice). Therefore we set the default value of INSPROB to 0.0.
                INSPROB was also tested against the MAUS TEST set to confirm the value of 0.0. It had an optimum at 0.0
                as well. Note that this might NOT be the optimal value for other MAUS tasks.

OUTSYMBOL: [x-sampa, ipa, manner, place] 
Option Output Encoding (OUTSYMBOL): Defines the encoding of phonetic symbols in output. If set to 'x-sampa'
                (default), phonetic symbols in output are encoded in X-SAMPA (with some minor differences in languages
                Norwegian/Icelandic in which the retroflex consonants are encoded as 'rX' instead of X-SAMPA 'X_r');
                use service runMAUSGetInventar with option LANGUAGE=sampa to get a list of symbols and
                their mapping to IPA. If set to 'ipa', the service produces UTF-8 IPA output in annotation tiers
                MAU (MAUS last module in PIPE) or in KAS/MAS (PHO2SYL last module in PIPE). 
                Just for pipes with MAUS as the last module: if set to 'manner', the
                service produces Manner of articulation for each segment; possible values are: silence, vowel,
                diphthong, plosive, nasal, fricative, affricate, approximant, lateral-approximant, ejective; if set to
                'place', the service produces Place of articulation for each segment; possible values are: silence,
                labial, dental, alveolar, post-alveolar, palatal, velar, uvular, glottal, front, central,
                back.

RULESET: MAUS rule set file; UTF-8 encoded; one rule per line; there are two different file types defined by the extension: 1. Phonological rule set without statistical information '*.nrul', synopsis is: 'leftContext-match-rightContext>leftContext-replacement-rightContext', e.g. 't,s-e:-n>t,s-@-n'. 2. Rule set with statistical information '*.rul', synopsis is: 'leftContext,match,rightContext>leftContext,replacement,rightContext ln(P(replacement|match) 0.0000', e.g. 'P9,n,@,n,#>P9,# -3.761200 0.000000'; 'P(replacement|match)' is the conditional probability that 'match' is being replaced by 'replacement'; the sum over all conditional probabilities with the same condition 'match' must be less than 1; the difference between the sum and 1 is the conditional probability 'P(match|match)', i.e. no for no change. 'leftContext/rightContext/match/replacememt' = comma separated lists of SAMPA symbols or empty lists (for *.rul the leftContext/rightContext must be exactly one symbol!); special SAMPA symbols in contexts are: '#' = word boundary between words, and '<' = utterance begin (may be used instead of a phonemic symbol); digits in SAMPA symbols must be preceded by 'P' (e.g. '2:' -> 'P2:'); all used SAMPA symbols must be defined in the language specific SAMPA set (see service runMAUSGetInventar). Examples for '*.rul' : 'P9,n,@,n,#>P9,# = 'the word final syllable /n@n/ is deleted, if preceded by /9/', '#,k,u:>#,g,u:' = 'word intial /k/ is replaced by /g/ if followed by the vowel /u:/'. Examples for '*.nrul' : '-->-N,k-' = 'insert /Nk/ at arbitrary positions', '#-?,E,s-#>#-s-#' = 'delete /?E/ in word /?Es/', 'aI-C-s,t,#>aI-k-s,t,#' = 'replace /C/ in word final syllable /aICst/ by /k/'.

maxSpeakNumber: [0.0, 999999.0] 
Option maxSpeakNumber defines a hard upper bound of the number of detected speakers. If set to 0 (default), no upper bound.
	        

allowOverlaps: [true, false] 
Option allowOverlaps: If set to true, the un-altered output of PyAnnote is
                returned in the SPD tier (note that overlaps cannot be handled by most annotation formats; only use if you really need
                to detect overlaps!); if set to false (default), overlaps, missing silence intervals etc. are resolved
                in the output tier SPD, making this output compatible with all annotation formats. The postprocessing works as follows:
                1. all silence intervals are removed. 2. all speaker segments that are 100% within another (larger) speaker segment are 
                removed. 3. If an overlap occurs the earlier segment(s) are truncated to the start of the new segment. 4. all remaining 
                gaps in the segmentation are filled with silence intervals.
                

minchunkduration: [0.0, 999999.0] 
Lower bound for output chunk duration in seconds. Note that the chunker does not guarantee an upper bound on chunk duration.

SIGNAL: Mandatory parameter SIGNAL: mono sound file or video file 
                containing the speech signal to be processed; PCM 16 bit resolution; any
                sampling rate. Although the mimetype of this input file is restricted to RIFF AUDIO 
                audio/x-wav (extension wav), most pipes will also process NIST/SPHERE (nis|sph) and 
                video (mp4|mpeg|mpg|avi|flv).

stress: [yes, no] 
yes/no decision whether or not word stress is to be added to the canonical
                transcription (KAN tier). Stress is marked by a single apostroph (') that is inserted before the syllable nucleus 
                into the transcription.

imap: Customized mapping table from orthography to phonology. If pointing to a valid mapping table, the pipeline service will automatically set the LANGUAGE option for service G2P to 'und' (undefined) while leaving the commandline option LANGUAGE for the remaining services unchanged (most likely 'sampa'). This mapping table is used then to translate the input text into phonological symbols. See https://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/readme_g2p_mappingTable.txt for details about the format of the mapping table.

MODUS: [default, standard, align] 
Option MODUS: Operation modus of MAUS: default is to use the language dependent default modus; the 
                two possible modi are: 'standard' which is the segmentation and
                labelling using the MAUS technique as described in Schiel ICPhS 1999, and 'align', a forced
                alignment is performed on the input SAM-PA string defined in the KAN tier of the BPF (the same effect
                as the deprecated former option CANONLY=true).

RELAXMINDUR: [true, false] 
Option Relax Min Duration (RELAXMINDUR) changes the default minimum duration of 30msec for consonants 
                and short/lax vowels and of 40msec for tense/long vowels and diphthongs to 10 and 20msec respectively. 
                This is not optimal for general segmentation because MAUS will start to insert many very short 
                vowels/glottal stops where they are not appropriate. But for some special investigations 
                (e.g. the duration of /t/) it alleviates the ceiling problem at 30msec duration.
                

ATERMS: Option ATERMS: file encoded in UTF-8 containing the terms that are to 
                be anonymized by the service. One term per line; terms may contain blanks, in which case only 
                consecutive occurances of the words within the term are anonymized.
                

RELAXMINDURTHREE: [true, false] 
Alternative option to Relax Min Duration (RELAXMINDUR): changes the minimum duration for all models to 3 states 
                (= 30msec with standard frame rate)to 30msec.
                This can be useful when comparing the duration of different phone groups.
                

STARTWORD: [0.0, 999999.0] 
Option STARTWORD: If set to a value n>0, this option causes maus to start the
                segmentation with the word number n (word numbering in BPF starts with 0). This is useful if the input
                signal file is just a segment within a longer transcript. See also option ENDWORD.

INSYMBOL: [sampa, ipa] 
Option INSYMBOL: Defines the encoding of phonetic symbols in input. If set to 'sampa'
                (default), phonetic symbols are encoded in X-SAMPA (with some coding differences in Norwegian/Icelandic);
                use service runMAUSGetInventar with option LANGUAGE=sampa to get a list of symbols and
                their mapping to IPA). If set to 'ipa', the service expects blank-separated UTF-8 IPA. 
                

PRESEG: [true, false] 
Option PRESEG: If set to true, a pre-segmentation using the wav2trn tool is done by the
                webservice on-the-fly; this is useful, if the input
                signal (or processed chunks within the signal) has leading and/or trailing silence. 
                

AWORD: Option AWORD: the string used to mask word labels for anonymized terms.
                

USETRN: [true, false, force] 
Option USETRN: If the pipe produces/processes a chunk segmentation (CHUNKER/CHUNKPREP), 
                this option is set automatically. If set to true, MAUS searches the input BPF for a TRN tier
                (turn/chunk segmentation, see http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatsdeu.html#TRN). The
                synopsis for a TRN entry is: 'TRN: (start-sample) (duration-sample) (word-link-list) (label)', e.g.
                'TRN: 23654 56432 0,1,2,3,4,5,6 sentence1' (the speech within the recording 'sentence1' starts with
                sample 23654, last for 56432 samples and covers the words 0-6). If only one TRN entry is found, the
                segmentation is restricted within a time range given by this TRN tier entry; this is useful, if there
                exists a reliable pre-segmentation of the recorded utterance, i.e. the start and end of speech within
                the recording is known. If more than one TRN entry is found, the webservice performs an segmentation
                for each 'chunk' defined by a TRN entry and aggregates all individual results into a single results
                file; this is useful if the input consists of long recordings, for which a manual chunk segmentation is
                available. If USETRN is set to 'force' (deprecated since maus 4.11; use PRESEG=true instead!, 
                a pre-segmentation using the wav2trn tool is done by the
                webservice on-the-fly; this is useful, if the input BPF does not contain a TRN entry and the input
                signal has leading and/or trailing silence.

MAUSSHIFT: Option MAUSSHIFT: If set to n, this option causes the calculated MAUS segment boundaries
                to be shifted by n msec (default: 0) into the future. Most likely this systematic shift is caused by a
                boundary bias in the training material's segmentation.
                The default should work for most cases.

HIGHF: [0.0, 30000.0] 
Option HIGHF: upper filter edge in Hz. If set >0Hz and LOWF is 0Hz, a low pass filter 
                with HIGHF Hz is applied; if set >0Hz and LOWF is set lower than HIGHF, a band
                pass between LOWF and HIGHF is applied; if set >0Hz and LOWF is set higher than 0Hz but higher 
                than HIGHF, a reject band pass between HIGHF and LOWF is applied. 
                E.g. HIGHF = 3000 LOWF = 300 is telephone band; HIGHF = 45 LOWF = 55 filters out a 50Hz hum.
                

silenceonly: [0.0, 999999.0] 
If set to a value greater than 0, the chunker will only place chunk boundaries in regions where it has detected a silent interval of at least that duration (in ms). Else, silent intervals are prioritized, but not to the exclusion of word boundaries without silence. On speech that has few silent pauses (spontaneous speech or speech with background noise), setting this parameter to a number greater than 0 is likely to hinder the discovery of chunk boundaries. On careful and noise-free speech (e.g. audio books) on the other hand, setting this parameter to a sensible value (e.g. 200) may reduce chunkin errors.

boost_minanchorlength: [2.0, 8.0] 
If you are using the boost phase, you can set its minimum anchor length independently of the general minimum anchor length. Setting this parameter to a low value (e.g. 2-3) means that the boost phase has a greater chance of finding preliminary chunk boundaries, which is essential for speeding up the chunking process. On the other hand, high values (e.g. 5-6) lead to more conservative and more reliable chunking decisions. If boost is set to false, this option is ignored.

ADDSEGPROB: [true, false] 
Option Add Viterbi likelihoods (ADDSEGPROB) causes that the frame-normalized natural-log total 
                Viterbi likelihood of an aligned segment is appended to the segment label in the 
                output annotation (the MAU tier). This might be used as a 'quasi quality measure' 
                on how good the acoustic signal in the aligned segment has been modelled by the 
                combined acoustical and pronunciation model of MAUS. Note that the values are not 
                probabilities but likelihood densities, and therefore are not comparable for 
                different signal segments; they are, however, comparable for the same signal segment. 
                Warning: this option breaks the BPF standard for the MAU tier and must not be 
                used, if the resulting MAU tier should be further processed, e.g. in a pipe).
                Implemented only for output phoneme symbol set SAMPA (default).
		

Output: A XML response containing the elements "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                the output file of the pipeline can be found (the format of the file depends on the option selected in OUTFORMAT),
                "output" contains the output that is mostly useful during debugging errors and "warning" lists warnings, if any 
                occured during the processing. Depending on input parameter OUTFORMAT the output file in "downloadlink" can be
                of several different file formats; see mandatory parameter OUTFORMAT for details.
                
----------------------------------------------------------------
----------------------------------------------------------------
runTextEnhance
------------------
Description: This service reads an arbitrary encoded text file and returns a normalized UTF-8
            UNIX style text file that is suitable for processing within the BAS WebServices.
            It also allows to map bracketted markers (e.g. '{Laughing loud}') in the input text to a form that is 
            recognized (and passed through) by the BAS WebServices (e.g. '<{Laughing_loud}>').

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F left-bracket=# -F replace-whitespace-char= -F infile=@<filename> -F brackets= 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runTextEnhance'

Parameters:  [left-bracket] [replace-whitespace-char] infile [brackets]

Parameter description: 
left-bracket: One or more characters which mark comments reaching
                        until the end of the line (default: '#'). E.g. if your 
                        input text contains comment lines that begin with ';', 
                        set this option to ';' to avoid that these comments are
                        treated as spoken text. If you want to suppress the default
                        '#' comment character, set this option to 'NONE'.
                        If you are using comment lines in your input text, you must be absolutely 
                        sure that the comment character appears nowhere in the text except in comment lines!
                        Note 1: the characters '&', '|' and '=' do not work as comment characters.
                        Note 2: for technical reasons the value for this option cannot be empty.
                        Note 3: the default character '#' cannot be combined with other characters, e.g. if
                        you define this option as ';#', the '#' will be ignored.
                        Note 4 (sorry): for the service 'Subtitle' comment lines must be terminated with
                        a so called 'final punctuation sign', i.e. one of '.!?:…'; otherwise, an immediately
                        following speaker marker will not be recognized.
                

replace-whitespace-char: The character that whitespace in comments should be
                        substituted by (default: none). The BAS WebService G2P requires 
                        that annotation markers or comment lines in input texts do not 
                        contain white spaces. This option let you decide which character
                        should be used to replace the white spaces. For further processing 
                        in G2P it is recommended to set this option to '_'.
                        If set to the string 'NONE' no replacement takes place (default).
                        CAUTION: the characters '&' and '=' do not work as replacements.
                

infile: Input text file. Most common formats and encodings will be recognized automatically.
                        

brackets: One or more pairs of characters which bracket
                        annotation markers in the input. E.g. if your input 
                        text contains markers '{Lachen}' and '[noise]' that should be passed as
                        markers and not as spoken text, set this option to '{}[]'.
                        Note that blanks replacement within such markers (see next option 'replace-whitespace-char')
                        only takes place in markern/comments that are defined here.
                

Output: An XML response containing the tags "success", "downloadLink", "output" and 
                        "warning". "success" states whether the processing was successful or not, "downloadLink" 
                        specifies the location where the output file is provided; depending on parameter 
                        'outformat' this can be BPF file (*.par), a SubRip subtitle format (*.srt), or a SubViewer 
                        subtitle format (*.sub). The BPF contains the content of the input BPF (option "bpf") with 
                        appended TRO and TRN tier (existing TRO/TRN tiers in the BPF input are over-written). 
                        The TRO tier contains the mapping from the ORT tier to the input transcription; the TRN tier 
                        contains the subtitle grouping.
                
----------------------------------------------------------------
----------------------------------------------------------------
runVoiceActivityDetection
------------------
Description: This service automatically segments the input signal into speech and silence intervals. The result is a simple annotation file with one segmentation layer (called 'VAD'). The algorithm is a keras based DNN to classify each signal frame (10msec) into speech or silence, followed by a smoothing stage that accumulates silence/speech frames into more realistic stretches of silence (labelled wuth 'p:') and speech (labelled with 'speech'); see options 'aggressivity', 'minSilence' and 'minVoice' to influence this process. Output labels can be augmented by a confidence measure about the decision (option 'showConfidence').

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F input=@<filename> -F minVoice=100 -F minSilence=100 -F aggressivity=50 -F showConfidence=false -F outformat=bpf 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runVoiceActivityDetection'

Parameters:  input [minVoice] [minSilence] [aggressivity] [showConfidence] [outformat]

Parameter description: 
input: Input file containing the speech signal. All media formats that AudioEnhance supports.

minVoice: [0.0, 9999.0] 

                        Minimum length of a speech interval. Shorter detected
                        speech intervals are removed. If set to a large value,
                        the output will contain one single silence segment.
                

minSilence: [1.0, 999999.0] 

                   Minimum length of a silence interval. Shorter detected
                        silence intervals are removed. If set to a large
                        value, the output will contain one single speech
                        segment.
                

aggressivity: [1.0, 999999.0] 

                   Aggressivity in percent of classification (1-99).
                        Higher values make the resulting classification more
                        prone to classifying silence. Smaller values make it
                        more prone to classifying voice. Technically, this
                        value is the threshold for the output probabilities of the
                        DNN to decide for silence (DNN probability higher than this
                        value) or for speech (DNN probability lower) regarding a 
                        single speech frame. Do not change this from the default of 
                        50 (prob. is 0.5), if you are also calculating confidence measures,
                        because other values than 50 will grossly distort the confidence 
                        measures.
                

showConfidence: [true, false] 
If set to true, the labels are augmented by a ';' followed by a confidence
                   measure that expresses the mean confidence of the classification. Technically, the 
                   confidence measure is the distance of the frame-wise DNN output probability to the 
                   threshold (see option 'aggessivity') averaged over all frames of the segment. 
                

outformat: [bpf, csv, eaf, emuDB, exb, TextGrid, tei] 
Output format: default is BAS Partitur Format; the tier name is VAD (see
                        http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for details
			about the BPF format); other supported annotation formats are: all formats
			supported by BAS WebService 'AnnotConv'.
			Silence intervals are labelled
                        with 'p:', speech intervals with 'speech'.

Output: An XML response containing the tags 'success', 'downloadLink', 'output' and 
                        'warning'. 'success' states whether the processing was successful or not, 'downloadLink' 
                        specifies the location where the output file is provided; depending on parameter 
			'outformat' this can be BPF file (*.par) or any other format supported by 
			BAS WebService AnnotConv.
			The output annotation contains the content of the input BPF (option 'bpf') 
			together with the appended VAD tier.
                
----------------------------------------------------------------
----------------------------------------------------------------
runASRGetQuota
------------------
Description: Returns a XML element 'basQuota' with four sub-elements: 'ASRType' : the value of parameter 'ASRType (see below); 'secsAvailable' : the still available ASR quota for the ASR service in secs (0 if monthly quotas has been expired, 999999 if umlimited); 'monthlyQuota' : the monthly quota in secs (999999 if umlimited quota); 'error' : the error message of the backend script (if empty, no error occurred).

Example curl call is:
curl -v -X GET -H 'content-type: application/x-www-form-urlencoded' 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runASRGetQuota?ASRType='

Parameters:  [ASRType]

Parameter description: 
ASRType: The name of the ASR service (name of backend script, e.g. 'callGoogleASR') for which the quotas are requested. Values are: callAmberscriptASR, callEMLASR, callFraunhoferASR, callGoogleASR, callLSTDutchASR, callLSTEnglishASR, callWatsonASR.
                

Output: The number of sec free quota time as string.
                
----------------------------------------------------------------
----------------------------------------------------------------
getLoadIndicatorXML
------------------
Description: Returns an indicator how high the server load is - 0 (for low load, i.e., less than 50 percent load), 
            1 (for middle load, i.e., between 50 and 100 percent load), and 2 (for high load, i.e., more than 100 percent load).
            Additionally the last 20 values of the un-normalized server load in 20sec steps is returned in a list.
            

Example curl call is:
curl -v -X GET -H 'content-type: multipart/form-data' 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/getLoadIndicatorXML'


Parameter description: 
Output: XML snippet that characterises the load of the server.
----------------------------------------------------------------
----------------------------------------------------------------
runPho2Syl
------------------
Description: Syllabification of canonical (phonological) or spontanous (phonetic) speech transcriptions.

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F wsync=no -F lng=deu-DE -F tier=MAU -F rate=0 -F outsym=sampa -F i=@<filename> -F oform=bpf 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runPho2Syl'

Parameters:  [wsync] [lng] [tier] [rate] [outsym] i [oform]

Parameter description: 
wsync: [yes, no] 
Yes/no decision, whether each word boundary is considered as syllable boundary. Only
                relevant for phonetic transcription input from MAU, PHO, or SAP tiers (for input from the KAN tier
                this option is always set to 'yes'). If set to 'yes',
                each syllable is assigned to exactly one word index. If set to 'no', syllables can be part of more than
                one word.

lng: [aus-AU, afr-ZA, sqi-AL, eus-ES, eus-FR, cat-ES, cze-CZ, nld-NL, eng-AU, eng-GB, eng-NZ, eng-US, ekk-EE, fin-FI, fra-FR, kat-GE, deu-DE, gsw-CH, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, hat-HT, hun-HU, isl-IS, ita-IT, jpn-JP, gup-AU, ltz-LU, mlt-MT, nor-NO, fas-IR, pol-PL, ron-RO, rus-RU, slk-SK, spa-ES, swe-SE, tha-TH, guf-AU, und] 
RCFC5646 locale language code of the speech to be syllabified; defines the possible SAMPA
                phoneme symbol set in input; we use the RFC5646 sub-structure 'iso639-3 - iso3166-1 [ - iso3166-2],
                e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in 'Oberoesterreich';
                alternatively, Iso 639-3 char language code is supported; non-standard language codes:. 'nze' stands for New Zealand English,
                'use' for American English. 'und' (undefined) can be used to syllabify X-SAMPA input independent of 
                a language (experimental).

tier: [KAN, MAU, PHO, SAP] 
Name of tier in the input BPF file, whose content is to be syllabified. Currently only
                the BPF tiers MAU, PHO, SAP (producing a new tier MAS), and KAN (producing a new tier KAS)
                are supported; phonemic encodimng must be (X-)SAMPA. Tier KAN contains the canonical phonological transcript usually created by
                the service G2P. Tier MAU contains the phontic transcript (and segmentation) generated by service MAUS. 
                Tiers PHO and SAP contain different phonetic segmentation formats based on extended (X-)SAMPA,
                e.g. with symbols indicating morpheme boundaries and deviations from the canonical
                pronunciation. The Non-SAMPA content of PHO and SAP is removed before syllabification.

rate: [0.0, 999999.0] 
Option rate: Only needed for TextGrid output format, if the input BPF does not contain this information (header entry 'SAM:'). Sample rate is needed to convert sample values
                from BAS Partitur Format to seconds in TextGrid.

outsym: [sampa, ipa] 
Ouput phoneme symbol inventory. Default is X-SAMPA ('sampa') compatible to the input encoding; alternative is IPA ('ipa') encoded in UTF-8 (not BPF conform!).

i: Input BAS Partitur Format file (BPF) containing the tier (specified by
                the input parameter 'tier') to be syllabified. See http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for a detailed
                description of BPF.

oform: [bpf, exb, csv, tg, emuDB, eaf, tei] 
Output format: 'bpf' is BAS Partitur Format (see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for a detailed 
                description of BPF); 'tg' is praat TextGrid (see http://www.praat.org);
                'csv' is a spreadsheet table; 'emuDB' is an annotation file of the EMU-SDMS
		(see https://ips-lmu.github.io/EMU.html); eaf is a ELAN EAF; tei is a Iso TEI document.
	        

Output: A BAS Partiture or TextGrid file additionally containing the syllabified output. If the
                input parameter 'tier' is set to MAU, SAP, or PHO, a tier MAS is generated that contains for each
                syllable analogously to tier MAU the time onset, duration, word index, and syllable string information. If
                'tier' is set to KAN, a tier KAS is generated that contains for each word analogously to KAN the
                canonical transcription with phonemes segmented by blanks and syllables separated by
                dots.
----------------------------------------------------------------
----------------------------------------------------------------
runFormantAnalysis
------------------
Description: This services reads pair(s) of signal + annotation files of a single speaker, creates an EMU database with a phonetic segmentation, and performs an automatic formant analysis on selected vowels.  The formant analysis is performed only in the webinterface version of this service; therefore this service has no public REST interface.
              SIGNAL must be one of wav,nis,nist,sph,mp4,mpeg,mpg;
              TEXT must be one of txt,par,TextGrid,eaf,csv,_annot.json;
              OUTFORMAT is accepted by the script but ignored (for comptibility with 
              Web API: the Web API needs this to allow the assemblence of a emuDB)
              The service calls runPipeline with 
              PIPE=G2P_MAUS_PHO2SYL, if both, SIGNAL and TEXT, are given, or
              PIPE=G2P_CHUNKER_MAUS_PHO2SYL, if the number of words in TEXT exceeds 3000 words, or
              PIPE=CHUNKPREP_G2P_MAUS_PHO2SYL, if TEXT is of type TextGrid|EAF|CSVi, or
              simply assembles the input *_annot.json files in an emuDB (without checking the structure!).
              runPipeline is called with the following options (other than defaults):
              OUTFORMAT=emuDB
              diarization=true
              [InputTierName=InputTierName]
              The service accepts options for the formant analysis which are ignored by the RESTful service but passed on to the bpf-to-emuDB converter of the web-interface.
              This service only makes sense when called from the BAS WebService Web API; therefore this service cannot be called as a RESTful service like other BAS services.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F SIGNAL=@<filename> -F LANGUAGE=deu-DE -F imap=@<filename> -F gender=unknown -F TEXT=@<filename> -F emuRDBname=FORMANTANALYSISOUTPUT -F sounds=a:,e:,i:,o: -F computeERatio=false -F midpoint=false -F InputTierName=unknown -F outlierMetric=euclid -F outlierThreshold=250 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runFormantAnalysis'

Parameters:  SIGNAL [LANGUAGE] [imap] [gender] TEXT [emuRDBname] [sounds] [computeERatio] [midpoint] [InputTierName] [outlierMetric] [outlierThreshold]

Parameter description: 
SIGNAL: Mandatory parameter SIGNAL: mono sound file or video file containing the speech signal to be processed; PCM 16 bit resolution; any
                sampling rate. Although the mimetype of this input file is restricted to RIFF AUDIO 
                audio/x-wav (extension wav), NIST/SPHERE (nis|nist|sph), video (mp4|mpeg|mpg).

LANGUAGE: [aus-AU, afr-ZA, sqi-AL, arb, eus-ES, eus-FR, cat-ES, nld-NL, nld-NL-GN, nld-NL-OH, nld-NL-PR, eng-US, eng-AU, eng-GB, eng-GB-OH, eng-GB-OHFAST, eng-GB-LE, eng-SC, eng-NZ, ekk-EE, fin-FI, fra-FR, kat-GE, deu-AT, deu-CH, deu-DE, deu-DE-OH, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, gsw-CH, hun-HU, isl-IS, ita-IT, jpn-JP, gup-AU, sampa, ltz-LU, mlt-MT, nor-NO, fas-IR, pol-PL, ron-RO, rus-RU, spa-ES, swe-SE, tha-TH, guf-AU] 

                Language: RCFC5646 locale code of the processed speech; defines the phoneme set of input and the orthographic system of input text (if any);
                we use the RFC5646 sub-structure 'iso639-3 - iso3166-1 [ - iso3166-2], e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in
                'Oberoesterreich'; the code 'sampa' ('Language independent') allows the user to upload a customized
                mapping from orthographic to phonologic form (see option 'imap'). Special languages: 'gsw-CH' denotes
                text written in Swiss German 'Dieth' transcription (https://en.wikipedia.org/wiki/Swiss_German); 'gsw-CH-*' are localized varieties in larger Swiss cities;
                'jpn-JA' (Japanese) accepts Kanji or Katakana or a mixture of both, but the tokenized output will contain only the Katakana version of the input;
                'aus-AU' (Australian Aboriginal languages, including Kunwinjku, Yolnu Matha) accept so called 'Modern Practical Orthography'
                (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages);
                'fas-IR' (Persian) accepts a romanized version of Farsi developped by Elisa Pellegrino and Hama Asadi (see
                http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/PersianRomanizationTable.pdf for details);
                'arb' is a macro language covering all Arabic varieties; the input must be encoded in a broad phonetic romanization developped
                by Jalal Tamimi and colleagues (see http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/TamimiRomanization.pdf for details).
                The language code is passed to all services of the pipeline, thus influencing the way these services will
                process the speech. If one member of the PIPE does not support the language, the service will try to determine another 
                suitable language (WARNING is issued) or, if that is not possible, an ERROR is returned. 
                If the service does currently not support the language, an ERROR is returned. 
                

imap: Only needed if the option 'Language' is set to 'Independent' (undefined); then you must provide a G2P mapping table from orthography to phonology through this option. This mapping table is used to translate the input text into phonological symbols. See https://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/readme_g2p_mappingTable.txt for details about the format of the mapping table.

gender: [unknown, female, male] 
Gender of speaker; if set to 'unknown' the service will determine the gender from the fundamental frequency.

OUTFORMAT: Option OUTFORMAT: the output format: always 'emuDB (= a *_annot.json file).
                This parameter is only here to signal the Web API that there is emuDB output; it cannot be changed via this API.
                

TEXT: Optional parameter TEXT: The (optional) textual input corrsponding to SIGNAL; usually some form of text or transcript. If this option is omitted, the service will apply automatic transcription using the runASR service. The input can be a plain text (txt), a praat TextGrid (TextGrid), or a BAS Partitur Format (par) file. See
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for detailed description of the
                BPF.

emuRDBname: This option sets the data base name of the output EMU database.
                

sounds: The list of extracted and analysed vowels; use SAMPA symbols as defined for the selected language and separate each symbol by a comma; for a list of valid SAMPA symbols in the chosen language click on the button 'Show inventory' left of the option 'Language'.

computeERatio: [true, false] 
If set, for each sound token the eRatio to all other vowel group centers is calculated and output in a separate table. Beware: if the number of sound classes is more than 4 (see option 'List of sounds') this can take a very long time and produce a very large table.

midpoint: [true, false] 
If set, the formant value of a sound token will be taken from the exact midpoint of each token track, not averaged from the 50% midpoint section of the track.

InputTierName: Only needed, if TEXT is in TextGrid|EAF format: the name of the annotation tier, that contains
                the input chunk segmentation.

outlierMetric: [euclid, mahalanobis] 
The metric, euclid or mahalanobis, used to calculated outlier distances from the vowel group center.

outlierThreshold: The threshold for removing sound tokens as outliers: if the distance of the token in formant space is larger than this value in Hz, the token is treated as a outlier and not shown in output plots/tables that have a 'NoOutliers' in the file name.

Output: A XML response containing the elements "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                result file (emuDB annotation) can be found (the format of the file depends on the option selected in OUTFORMAT),
                "output" contains the output that is mostly useful during debugging errors and "warning" lists warnings, if any 
                occured during the processing. 
                
----------------------------------------------------------------
----------------------------------------------------------------
runChannelSeparator
------------------
Description: This services reads a RIFF WAVE audio file with 2 or more channels that each contains the recording
              of a speaker and performs a version of Volker Dellwo's 'Frankensteins Channel Segregator', and then returns the 
              resulting audio file with the same number of channels that are now completely separated from each other.
              Complete separation here means that only one channel (the channel of the dominant speaker) contains a signal at 
              every time point while all other channels are muted. If this is done without errors, the channels then contain
              only cross talk of other speakers when speakers overlap, but the usual crosstalk when more than one speaker
              are being recorded in the same acoustic environment and only one speaker speaks is gone. This can have a positive effect on automatic
              speech recognition.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F WINLEN=16 -F NONORM=false -F WAV=@<filename> 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runChannelSeparator'

Parameters:  [WINLEN] [NONORM] WAV

Parameter description: 
WINLEN: [1.0, 200.0] 
Length of window in msec in which the short time energy is calculated (frame length) for channel comparison.
                

NONORM: [true, false] 
Switches off channel normalisation. if set true, the default channel normalisation is switched off. The channel normalisation
                tries to make the channels comparable with each other, even if one channel has a much lower recording gain than the others. This improves the 
                agorithm's ability to decide in each time point which speaker is the one who is currently speaking.
                

WAV: The input RIFF WAVE audio file to be processed. Only multi-channel audio files are accepted;
                each found channel is assumend to contain the microphone signal assigned to one speaker.
                

Output: A XML response containing the elements "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                the output annotation file can be found, "output" contains the output that is mostly useful during 
                debugging errors and "warning" lists warnings, if any occured during processing. 
                
----------------------------------------------------------------
----------------------------------------------------------------
runEMUMagic
------------------
Description: This services reads either single signal file(s) or a pair(s) of signal + annotation and creates a emuDB with the best achievable annotation.
              SIGNAL can be one of wav,nis,nist,sph,mp4,mpeg,mpg;
              TEXT can be one of txt,par,TextGrid,eaf,csv;
              Output is always a single _annot.json file with the same base name as
              SIGNAL or as defined in OUT.
              OUTFORMAT is accepted by the script but ignored (for comptibility with 
              Web API: the Web API needs this to allow the assemblence of a emuDB)
              The only option is LANGUAGE and imap.
              The script calls runPipeline with 
              PIPE=G2P_MAUS_PHO2SYL, if both, SIGNAL and TEXT, are given, or
              PIPE=ASR_G2P_MAUS_PHO2SYL, if only SIGNAL is given, or
              PIPE=[ASR_]G2P_CHUNKER_MAUS_PHO2SYL, if the number of words in TEXT or the ASR result exceeds 3000 words, or
              PIPE=CHUNKPREP_G2P_MAUS_PHO2SYL, if TEXT is of type TextGrid|EAF|CSV.
              runPipeline is called with the following options (other than defaults):
              OUTFORMAT=emuDB
              ASRType=autoSelect
              diarization=true
              [InputTierName=InputTierName]
              This service is restricted for academic use only; therefore this service cannot be called as a RESTful
            service like other BAS services, and the Web API to this service is protected by AAI Shiboleth authentification.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F SIGNAL=@<filename> -F LANGUAGE=deu-DE -F imap=@<filename> -F TEXT=@<filename> -F emuRDBname=MAUSOUTPUT -F InputTierName=unknown 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runEMUMagic'

Parameters:  SIGNAL [LANGUAGE] [imap] [TEXT] [emuRDBname] [InputTierName]

Parameter description: 
SIGNAL: Mandatory parameter SIGNAL: mono sound file or video file containing the speech signal to be processed; PCM 16 bit resolution; any
                sampling rate. Although the mimetype of this input file is restricted to RIFF AUDIO 
                audio/x-wav (extension wav), NIST/SPHERE (nis|nist|sph), video (mp4|mpeg|mpg).

LANGUAGE: [aus-AU, afr-ZA, sqi-AL, arb, eus-ES, eus-FR, cat-ES, nld-NL, nld-NL-GN, nld-NL-OH, nld-NL-PR, eng-US, eng-AU, eng-GB, eng-GB-OH, eng-GB-OHFAST, eng-GB-LE, eng-SC, eng-NZ, ekk-EE, fin-FI, fra-FR, kat-GE, deu-AT, deu-CH, deu-DE, deu-DE-OH, gsw-CH-BE, gsw-CH-BS, gsw-CH-GR, gsw-CH-SG, gsw-CH-ZH, gsw-CH, hun-HU, isl-IS, ita-IT, jpn-JP, gup-AU, sampa, ltz-LU, mlt-MT, nor-NO, fas-IR, pol-PL, ron-RO, rus-RU, spa-ES, swe-SE, tha-TH, guf-AU] 

                Language: RCFC5646 locale code of the processed speech; defines the phoneme set of input and the orthographic system of input text (if any);
                we use the RFC5646 sub-structure 'iso639-3 - iso3166-1 [ - iso3166-2], e.g. 'eng-US' for American English, 'deu-AT-1' for Austrian German spoken in
                'Oberoesterreich'; the code 'sampa' ('Language independent') allows the user to upload a customized
                mapping from orthographic to phonologic form (see option 'imap'). Special languages: 'gsw-CH' denotes
                text written in Swiss German 'Dieth' transcription (https://en.wikipedia.org/wiki/Swiss_German); 'gsw-CH-*' are localized varieties in larger Swiss cities;
                'jpn-JA' (Japanese) accepts Kanji or Katakana or a mixture of both, but the tokenized output will contain only the Katakana version of the input;
                'aus-AU' (Australian Aboriginal languages, including Kunwinjku, Yolnu Matha) accept so called 'Modern Practical Orthography'
                (https://en.wikipedia.org/wiki/Transcription_of_Australian_Aboriginal_languages);
                'fas-IR' (Persian) accepts a romanized version of Farsi developped by Elisa Pellegrino and Hama Asadi (see
                http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/PersianRomanizationTable.pdf for details);
                'arb' is a macro language covering all Arabic varieties; the input must be encoded in a broad phonetic romanization developped
                by Jalal Tamimi and colleagues (see http://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/TamimiRomanization.pdf for details).
                The language code is passed to all services of the pipeline, thus influencing the way these services will
                process the speech. If one member of the PIPE does not support the language, the service will try to determine another 
                suitable language (WARNING is issued) or, if that is not possible, an ERROR is returned. 
                

imap: Customized mapping table from orthography to phonology. If the option 'Language' is set to 'Independent' (undefined), you must provide a G2P mapping table through this option. This mapping table is then used to translate the input text into phonological symbols. See https://www.bas.uni-muenchen.de/Bas/BASWebServices/DOCS/readme_g2p_mappingTable.txt for details about the format of the mapping table.

OUTFORMAT: Option OUTFORMAT: the output format: always 'emuDB (= a *_annot.json file).
                This parameter is only here to signal the Web API that there is emuDB output; it cannot be changed via this API.
                

TEXT: Optional parameter TEXT: The (optional) textual input corrsponding to SIGNAL; usually some form of text or transcript. If this option is omitted, the service will apply automatic transcription using the runASR service. The input can be a plain text (txt), a praat TextGrid (TextGrid), or a BAS Partitur Format (par) file. See
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for detailed description of the
                BPF.

emuRDBname: This option sets the data base name in the output EMU database.
                

InputTierName: Only needed, if TEXT is in TextGrid|EAF format: the name of the annotation tier, that contains
                the input chunk segmentation.

Output: A XML response containing the elements "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                the result emuDB annotation file can be found,
                "output" contains the output that is mostly useful during debugging errors and "warning" lists warnings, if any 
                occured during the processing. 
                
----------------------------------------------------------------
----------------------------------------------------------------
runAnonymizer
------------------
Description: This services reads a signal file (sound, video) + BAS Partitur Format annotation + a 
              list of terms to be anonymized in both inputs, masks all occurances in the signal and in the 
              annotation, and returns the two anonymized files in a ZIP archive; or just the anonymized 
              annotation in a ZIP file, if ANNOTONLY=true.
              SIGNAL can be one of wav,nis,nist,sph,mp4,mpeg,mpg,avi,fvl or can be omitted (if ANNOTONLY=true);
              BPF must be a BPF file *.par with at least the ORT tier and one of MAU,SAP,PHO tiers (see
              https://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html#Partitur for details regarding the BPF);
              Output is either a ZIP (default) containing the masked (by noise or a beep or silence, see ASIGNAL) signal 
              (sound files keep the same properties as input, while video input is re-coded into MP4 with h264 and 
              aac encoding), and the input annotation (in a format given by the option OUTFORMAT) with all word 
              label occurances replaced by the string given in option AWORD (default: 'ANONYMIZED') and all 
              phonetic label occurances replaced by the string 
              given in APHONE (default: '<nib>' for SAMPA, '(.)' for IPA encodings); or the output ZIP contains
              just the anonymized annotation file (ANNOTONLY=true).
              The (required) input list of terms to be anonymized (ATERMS) must be encoded in UTF-8 and have one term per line; 
              terms may contain blanks, in which case only consecutive occurances of the words within the term are 
              anonymized.
            

Example curl call is:
curl -v -X POST -H 'content-type: multipart/form-data' -F SIGNAL=@<filename> -F ASIGNAL=brownNoise -F OUTFORMAT=bpf -F rate=1 -F AWORD=ANONYMIZED -F ANNOTONLY=false -F BPF=@<filename> -F APHONE= -F ATERMS=@<filename> 'https://clarin.phonetik.uni-muenchen.de/BASWebServices/services/runAnonymizer'

Parameters:  SIGNAL [ASIGNAL] [OUTFORMAT] [rate] [AWORD] [ANNOTONLY] BPF [APHONE] [ATERMS]

Parameter description: 
SIGNAL: Optional input SIGNAL: sound or video file containing the speech signal to 
                be anonymized. Although the mimetype of this input file is restricted to RIFF AUDIO 
                audio/x-wav (extension wav), NIST/SPHERE (nis|nist|sph) and video (mp4|mpeg|mpg|avi|fvl)
                are accepted.

EQUALNAMES: Option EQUALNAMES: If set to true, the output annotation file BPF has the same basename 
                as the SIGNAL file; this is useful e.g. in a pipe

ASIGNAL: [brownNoise, beep, silence] 
Option ASIGNAL: the type of signal to mask anonymized terms in the signal.
                'brownNoise' is brown noise; 'beep' is a 500Hz sinus; 'silence' is total silence (zero signal); masking signals have an amplitude of -10dB of the maximum amplitude
                and are faded in and out with a very short sinoid function. 
                

OUTFORMAT: [bpf, exb, csv, TextGrid, emuDB, eaf, tei, par] 
Option 'Output format' (OUTFORMAT): the output format of the anonymized input BPF. 
                TextGrid - a praat compatible TextGrid file; 
                bpf - the input BPF file with now anonymized tiers; csv - a spreadsheet
                (CSV table) that contains most input tiers in table form;
                emuDB - an Emu compatible *_annot.json file;
                eaf - an ELAN compatible annotation file; exb - an EXMARaLDA compatible annotation file;
                tei - Iso TEI document (XML).
                For a description of BPF see
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html.
                for a description of Emu see https://github.com/IPS-LMU/emuR.
                Note 1: using 'emuDB' will first produce only single annotation file *_annot.json;
                in the WebMAUS interface (https://clarin.phonetik.uni-muenchen.de/BASWebServices) you can process
                more than one file and than download a zipped Emu database; in this case don't forget to change the default
                name of the emuDB 'MAUSOUTPUT' using the R function emuR::rename_emuDB().
                Note 2: if you need the same result in more than one format, select 'bpf' to produce a BPF file, and then
                convert this file with the service runAnnotConv ('AnnotConv') into the desired formats.
                Note 3: some format conversions are not loss-less; select 'bpf' to be sure that no information is lost.
                

rate: [0.0, 999999.0] 
Option sample rate of signal file: if the sample rate cannot be determined 
                automatically from SIGNAL and is not given in the input BPF either, you can provide the 
                sampling rate via this option. Usually you can leave it to the default value of 1.

AWORD: Option AWORD: the string used to mask word labels for anonymized terms.
                

ANNOTONLY: [true, false] 
Option ANNOTONLY: If set to true, only the input BPF is anonymized and produced;
                the SIGNAL will not be used (can be omitted). The ouput ZIP file then contains only the 
                anonymized annotation file.

BPF: Mandatory input BPF: BAS Partitur Format (BPF) file (*.par or *.bpf) that 
                contains at least the ORT tier and one of MAU,PHO,SAP or IPA, which shall be anonymized. See
                http://www.bas.uni-muenchen.de/forschung/Bas/BasFormatseng.html for detailed description of the
                BPF.

APHONE: Option APHONE: the string used to mask phonetic/phonologic labels for anonymized terms.
                If not set, the service will use the label 'nib' for masking encodings in SAMPA, and the label
                '(.)' for encodings in IPA. If set to another label, this label is used to mask in all encodings.
                

ATERMS: Mandatory Option ATERMS: file encoded in UTF-8 containing the terms that are to 
                be anonymized by the service. One term per line; terms may contain blanks, in which case only 
                consecutive occurances of the words within the term are anonymized.
                

Output: A XML response containing the elements "success", "downloadLink", "output" and "warning".
                "success" states if the processing was successful or not, "downloadLink" specifies the location where
                the output ZIP file can be found which contains the anonymized copy of the input signal and the 
                anonymized annotation file (the format of the annotation file depends on the option selected in 
                OUTFORMAT), "output" contains the output that is mostly useful during debugging errors and "warning" 
                lists warnings, if any occured during the processing. 
                
----------------------------------------------------------------
----------------------------------------------------------------