SAPI tags supported in S.410 TTS Player

SAPI Syntax

Text-to-speech escape sequences follow these general rules of syntax:

All tags begin and end with a backslash character (\).
The backslash character is not allowed within a tag.
An odd number of backslash characters in tagged text produces undefined behavior in the player/recorder resource.
Escape sequences are case-insensitive. For example, \vce\ is the same as \VCE\.
Escape sequences are white-space-dependent. For example, \Rst\ is not the same as \ Rst \.

To include a backslash character in tagged text, but outside a tag, use a double backslash (\\).
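The syntax rules above can be illustrated with a small tokenizer. The following Python sketch is not part of S.410: the function name tokenize and the decision to raise an error on an unterminated tag are assumptions made for illustration, since the specification leaves an odd number of backslashes undefined.

```python
def tokenize(tagged_text, delimiter="\\"):
    """Split tagged text into ("text", s) and ("tag", s) tokens.

    Hypothetical helper: a doubled delimiter outside a tag is read as one
    literal delimiter character; a single delimiter opens or closes a tag.
    """
    tokens = []
    buf = []
    in_tag = False
    i = 0
    n = len(tagged_text)
    while i < n:
        ch = tagged_text[i]
        if ch != delimiter:
            buf.append(ch)
            i += 1
            continue
        # Doubled delimiter outside a tag: literal delimiter in the text.
        if not in_tag and i + 1 < n and tagged_text[i + 1] == delimiter:
            buf.append(delimiter)
            i += 2
            continue
        # Single delimiter: flush what was collected and toggle tag mode.
        if buf:
            tokens.append(("tag" if in_tag else "text", "".join(buf)))
            buf = []
        in_tag = not in_tag
        i += 1
    if in_tag:
        # The specification calls this case undefined; raising is our choice.
        raise ValueError("unterminated tag (odd number of delimiters)")
    if buf:
        tokens.append(("text", "".join(buf)))
    return tokens


sample = r"Hello \EMP\world, path C:\\temp \PAU=500\ done"
tokens = tokenize(sample)
```

Tag bodies are returned unmodified, so a caller can compare tag names case-insensitively while still treating \ Rst \ as different from \Rst\.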

Escape sequences that set TTS rendering attributes are expected to be persistent unless the resource rendering TTS data is deconfigured from a group. For example, if the \ctx="e-mail"\ tag is passed to a TTS Player resource that supports that escape sequence, the engine stays in the "e-mail" context until another sequence changes the context.

Escape sequences unknown to a resource attempting to decode SAPI-encoded data are expected to be ignored.

If the syntax of an escape sequence is in error (for example, missing a "\" character), the enclosed text is ignored. Although a coder implementation is expected to take reasonable measures to recover from text errors, it is the responsibility of the creator of the MDO to ensure that its contents are encoded properly. To limit the effects of erroneous syntax, it may be useful to include some maximum range over which an escape sequence is valid, say up to the end of a sentence or paragraph.
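The persistence and error-handling expectations above can be sketched as a small state tracker. This is an illustrative Python sketch, not an S.410 interface: apply_tag, DEFAULTS, and the use of None for an engine-chosen default are all assumptions. It takes a tag body (the text between the backslashes) and returns the resulting attribute state, leaving the state untouched for unknown or malformed sequences.

```python
import re

# Only the VOL default (65535) is stated in this document. None stands for
# "whatever the engine's own default is" and is an assumption of this sketch.
DEFAULTS = {"PIT": None, "SPD": None, "VOL": 65535}
NUMERIC_TAGS = ("PIT", "SPD", "VOL")


def apply_tag(state, tag_body):
    """Return the attribute state after one tag body, e.g. "SPD=90"."""
    if tag_body.upper() == "RST":
        # \RST\ restores every attribute to its default.
        return dict(DEFAULTS)
    name, _, value = tag_body.partition("=")
    name = name.upper()  # tag names are case-insensitive
    if name in NUMERIC_TAGS and re.fullmatch(r"[0-9]+", value):
        updated = dict(state)
        updated[name] = int(value)  # persists until changed or reset
        return updated
    # Unknown tags and malformed values are ignored: state is unchanged.
    return state


state = dict(DEFAULTS)
for body in ["SPD=90", "xyz=1", "vol=abc", "Vol=32768"]:
    state = apply_tag(state, body)
```

After the loop, the speed and volume settings persist, while the unknown tag and the non-numeric volume have no effect. Because white space is significant, a body such as " Rst " does not match RST and is ignored like any other unknown sequence.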

SAPI Escape Sequences for Minimal S.410 Conformance

The codes in this section are required for minimal compliance with S.100. This section uses the following typographic conventions:

Example	Description
Chr	Bold type indicates speech-inflection keywords.
string	Italic indicates placeholders for information you supply, such as a character or context string.
[option]	Square brackets indicate items that are optional.
[option...]	Three dots (an ellipsis) following an item indicate that more items having the same form may appear.
"C"	Quotation marks are required to delimit strings.

Required SAPI tags

Sequence Description

\COM=string\
string Text of the comment.
Example:
\COM="This is a comment."\
Embeds a comment in the text. Comments are not translated into speech.

\DLM="<symbol>"\
<symbol> A text character.
Example:
\DLM="<Esc>"\
Replaces the escape sequence character with "<symbol>". All subsequent occurrences of <symbol> are treated as the escape sequence character.

\EMP\
Example:
"... the \EMP\truth, the \EMP\whole truth, and nothing \EMP\but the truth."
Emphasizes the next word to be spoken.

\Eng:<VendorID>:<command>\
\Eng:<command>\
VendorID Naming scope identifier
command Vendor-specific command
Examples:
\Eng:DLGC:beep\
\Eng:SUNW:DTMF=9,555-1212\
\Eng:CGRM:playwav "\\windows\\sounds\\tada.wav"\
Embeds a vendor- or engine-specific command that affects only a specific TTS Player resource implementation. Subsequent \Eng commands for a specified engine can omit the <VendorID>:, until an engine-specific escape sequence for another engine is specified.
The engine-specific commands may include any optional parameters as demonstrated in the examples.

\PAU=number\
number Number of milliseconds to pause.
Example:
\PAU=1000\
Pauses speech for the specified number of milliseconds.

\PIT=number\
number Pitch, in hertz. The actual pitch fluctuates above and below this baseline.
Example:
\PIT=120\
Sets the baseline pitch of the text-to-speech mode to the specified value in hertz.

\PRN=pronunciation[=part_of_speech]\
pronunciation Phonetic equivalent of the text to be spoken.
part_of_speech A string from the list defined for the "PRT" escape sequence.
Example:
\PRN=tomato=tomaato\ \PRN=resume=rezumay=N\
Indicates how to pronounce text by passing the phonetic equivalent to the Player resource.

\RST\
Example:
\RST\
Resets all escape sequences to the Player resource's default settings.

\SPD=number\
number Baseline average talking speed, in words per minute.
Example:
\SPD=90\
Sets the baseline average talking speed of the text-to-speech coder to the specified number of words per minute.

\VOL=number\
number Baseline speaking volume. The volume level is a linear range from 0 for absolute silence to 65535 for maximum volume. The default is 65535.
Example:
\VOL=32768\
Sets the baseline speaking volume for the text-to-speech coder.
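As an illustration of how an application might compose text using the required sequences, the following Python sketch builds tagged strings. The helper names (escape_text, tag, volume, speed, pause) are hypothetical and not part of S.410. The only range check applied is the 0 to 65535 volume range given above; a space is placed between consecutive tags so that the closing backslash of one tag and the opening backslash of the next are not mistaken for a double backslash.

```python
def escape_text(text):
    """Double any backslash so it is spoken text rather than a tag delimiter."""
    return text.replace("\\", "\\\\")


def tag(name, value=None):
    """Format one escape sequence, with or without a value."""
    if value is None:
        return "\\" + name + "\\"
    return "\\" + name + "=" + str(value) + "\\"


def volume(level):
    # Range taken from the VOL description: 0 (silence) to 65535 (maximum).
    if not 0 <= level <= 65535:
        raise ValueError("volume must be between 0 and 65535")
    return tag("VOL", level)


def speed(words_per_minute):
    return tag("SPD", words_per_minute)


def pause(milliseconds):
    if milliseconds < 0:
        raise ValueError("pause length cannot be negative")
    return tag("PAU", milliseconds)


prompt = (
    volume(32768) + " " + speed(90)
    + " Please hold." + pause(1000)
    + " Thank " + tag("EMP") + "you."
)
print(prompt)
```

Text supplied by a user or read from a file would be passed through escape_text before being concatenated with tags.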

Optional SAPI Escape Sequences

Many TTS engines support SAPI tags that, while not within the scope of minimal compliance for S.410, demonstrate features that distinguish them in the marketplace. An application developer may select a Player resource via the attribute a_SapiOptions, using symbols corresponding to these tags. The tags and their corresponding symbols are given in the table below.

Tag	Symbol
`Chr`	`ESymbol.Container_SAPIChr`
`Ctx`	`ESymbol.Container_SAPICtx`
`Mrk`	`ESymbol.Container_SAPIMrk`
`Pro`	`ESymbol.Container_SAPIPro`
`Prt`	`ESymbol.Container_SAPIPrt`
`Vce`	`ESymbol.Container_SAPIVce`
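An application could use the table above to work out which optional symbols a given piece of tagged text depends on. The Python sketch below is an assumption-laden illustration: the symbol names are handled as plain strings, the function name required_option_symbols is hypothetical, and how the resulting symbols are actually supplied to a_SapiOptions is not shown because it is not described here.

```python
import re

# Tag-to-symbol pairs copied from the table above, kept as plain strings.
OPTIONAL_TAG_SYMBOLS = {
    "CHR": "ESymbol.Container_SAPIChr",
    "CTX": "ESymbol.Container_SAPICtx",
    "MRK": "ESymbol.Container_SAPIMrk",
    "PRO": "ESymbol.Container_SAPIPro",
    "PRT": "ESymbol.Container_SAPIPrt",
    "VCE": "ESymbol.Container_SAPIVce",
}

# Matches one backslash-delimited span; a doubled backslash yields an
# empty body, which never matches a tag name.
TAG_PATTERN = re.compile(r"\\([^\\]*)\\")


def required_option_symbols(tagged_text):
    """Return the sorted symbol names for the optional tags used in the text."""
    found = set()
    for body in TAG_PATTERN.findall(tagged_text):
        name = body.partition("=")[0].upper()  # tag names are case-insensitive
        if name in OPTIONAL_TAG_SYMBOLS:
            found.add(OPTIONAL_TAG_SYMBOLS[name])
    return sorted(found)


text = r'\CTX="Address"\123 Main St. \Mrk=7\ \PAU=500\ done'
symbols = required_option_symbols(text)
```

Required tags such as \PAU\ contribute nothing to the result, since they are covered by minimal conformance.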

The meanings of the tags are described in the following tables.

\CHR=string[[,string]...]\
string String that specifies the characteristics of the voice.
Example:
\CHR="Angry","Loud"\
Sets the character of the voice. Although less specific than setting the inflection, stress, attack, and whispering qualities individually, it is easier to use and allows the engine more flexibility and intelligence in its response.
Some commonly available characteristics are: Normal, Angry, Business, Calm, Depressed, Excited, Falsetto, Happy, Loud, Monotone, Perky, Quiet, Sarcastic, Scared, Shout, Tense, Whisper.

\CTX=string\
string String that specifies the context.
Example:
\CTX="Address"\
Sets the context for the text that follows, which determines how symbols are spoken. The string can be one of the following: Address, C, Document, E-Mail, Numbers, Spreadsheet, Unknown, Normal.

\MRK=number\
number Number of the bookmark.
Example:
\MRK=75000\
Indicates a bookmark in the text.
When the TTS Player resource encounters this escape sequence, it notifies the application by generating the Player.ev_Marker event. The bookmark number is carried in the event as Player.ev_Mark.
Note: Bookmark number zero (\Mrk=0\) is reserved; a Player.ev_Marker event is not sent for bookmark number zero.

\PRO=number\
number Setting number to 1 activates prosodic rules (the default). Setting number to 0 deactivates prosodic rules.
Example:
\PRO=0\
Activates or deactivates prosodic rules, which affect pitch, speaking rate, and volume of words independently of control tags embedded in the text.

\PRT=string\
string Indicates the part of speech.
Example:
\PRT="Abbr"\
Indicates the part of speech of the next word.
string can be one of: "Abbr" (Abbreviation), "Adj" (Adjective), "Adv" (Adverb), "Card" (Cardinal number), "Conj" (Conjunction), "Cont" (Contraction), "Det" (Determiner), "Interj" (Interjection), "N" (Noun), "Ord" (Ordinal number), "Prep" (Preposition), "Pron" (Pronoun), "Prop" (Proper noun), "Punct" (Punctuation), "Quant" (Quantifier), "V" (Verb).

\VCE=charact=value[[,charact=value]...]\
charact One of the defined characteristics: Accent, Dialect, Gender, Speaker, Age, Style.
value String that specifies value or type for the given characteristic.
Example:
\VCE=Gender="Female",Age="Adolescent"\
Instructs the engine to change its speaking voice to one that has the specified characteristics.
Accent: speak the given language with this accent.
For example: Language="English", Accent="French".
Dialect: speak in the given dialect.
Gender: value is one of "Male", "Female", or "Neutral".
Speaker: name of the voice or "NULL".
Age: one of "Baby" (about 1 yr), "Toddler" (about 3 yrs), "Child" (about 5 yrs), "Adolescent" (about 14 yrs), "Adult" (between 20 and 60 yrs) or "Elderly" (over 60 yrs).
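The bookmark behavior described for \MRK\ can be sketched from the application's side: given tagged text, which bookmark numbers should produce a Player.ev_Marker event? The Python function below is a hypothetical illustration, not part of S.410. It assumes that bookmark numbers are unsigned decimal integers, and it skips bookmark zero because that number is reserved and produces no event.

```python
import re

# Matches one backslash-delimited span; doubled backslashes give an empty body.
TAG_PATTERN = re.compile(r"\\([^\\]*)\\")


def expected_marker_numbers(tagged_text):
    """List bookmark numbers, in order, for which a marker event is expected."""
    numbers = []
    for body in TAG_PATTERN.findall(tagged_text):
        name, _, value = body.partition("=")
        if name.upper() != "MRK":
            continue
        if not re.fullmatch(r"[0-9]+", value):
            continue  # malformed bookmark: ignored, like any erroneous sequence
        number = int(value)
        if number != 0:  # bookmark zero is reserved and sends no event
            numbers.append(number)
    return numbers


text = r"\MRK=1\Intro. \Mrk=0\ Body. \mrk=75000\ End. \MRK=x\ Done."
markers = expected_marker_numbers(text)
```

An application could compare this list against the Player.ev_Mark values it receives to track how far playback has progressed.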
