VoiceXML 2.1 Development Guide Home  |  Frameset Home

  M: Text-To-Speech Guide  |  TOC  |  Rhetorical TTS Guide: SSML  

Prophecy TTS Guide: SSML

Now that we've covered language availability, let's see what we can do with those languages.  Speech Synthesis Markup Language, or SSML, lets us control how TTS output is read back.  This tutorial focuses on using Prophecy TTS with the following elements:

  <say-as>
  <emphasis>
  <sub>
  <prosody>
  <break>
  <phoneme>
  <sentence>
  <paragraph>

Interested?  Below you will find a few examples of how SSML can make your text-to-speech more distinctive.


<say-as> element


Date

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xmlns:voxeo="http://community.voxeo.com/xmlns/vxml">
<var name="date1" expr="'20011215'"/>
<var name="date2" expr="'200212??'"/>
<var name="date3" expr="'2003????'"/>
<var name="date4" expr="'????12??'"/>
<var name="date5" expr="'????1215'"/>
<var name="date6" expr="'??????15'"/>
<var name="date7" expr="'2004??15'"/>
<form id="F1">
  <block>
    <prompt xml:lang="en-us">
      <say-as interpret-as="vxml:date"><value expr="date1"/></say-as><break/>
      <say-as interpret-as="vxml:date"><value expr="date2"/></say-as><break/>
      <say-as interpret-as="vxml:date"><value expr="date3"/></say-as><break/>
      <say-as interpret-as="vxml:date"><value expr="date4"/></say-as><break/>
      <say-as interpret-as="vxml:date"><value expr="date5"/></say-as><break/>
      <say-as interpret-as="vxml:date"><value expr="date6"/></say-as><break/>
      <say-as interpret-as="vxml:date"><value expr="date7"/></say-as><break/>

    </prompt>
  </block>
</form>
</vxml>
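
The date strings above use the fixed-length yyyymmdd layout of the VoiceXML date type, with question marks standing in for any field that is unknown.  If your dates come from application data rather than literals, a small ECMAScript helper can build the string for you.  The sketch below is only an illustration: the names pad and toVxmlDate are our own and are not part of Prophecy or the VoiceXML specification.

```javascript
// Hypothetical helper: left-pad a number with zeros to a fixed width.
function pad(value, width) {
  var text = String(value);
  while (text.length < width) {
    text = '0' + text;
  }
  return text;
}

// Hypothetical helper: build a yyyymmdd string for vxml:date.
// Pass null (or leave out) any field that is unknown; it is filled
// with question marks, as in the date2 through date7 examples.
function toVxmlDate(year, month, day) {
  var yyyy = (year == null) ? '????' : pad(year, 4);
  var mm = (month == null) ? '??' : pad(month, 2);
  var dd = (day == null) ? '??' : pad(day, 2);
  return yyyy + mm + dd;
}

var full = toVxmlDate(2001, 12, 15);     // same value as date1
var monthDay = toVxmlDate(null, 12, 15); // same value as date5
```

Placed inside a <script> element, a helper like this could feed a variable directly, for example <var name="date1" expr="toVxmlDate(2001, 12, 15)"/>.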



Digit

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xmlns:voxeo="http://community.voxeo.com/xmlns/vxml">
<var name="digit1" expr="'8546'"/>
<var name="digit2" expr="'0563'"/>
<var name="digit3" expr="'238.2'"/>
<var name="digit4" expr="'-49.12'"/>
<form id="F1">
  <block>
    <prompt xml:lang="en-us">
      <say-as interpret-as="vxml:digits"><value expr="digit1"/></say-as><break/>
      <say-as interpret-as="vxml:digits"><value expr="digit2"/></say-as><break/>
      <say-as interpret-as="vxml:digits"><value expr="digit3"/></say-as><break/>
      <say-as interpret-as="vxml:digits"><value expr="digit4"/></say-as><break/>

    </prompt>
  </block>
</form>
</vxml>



Currency

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xmlns:voxeo="http://community.voxeo.com/xmlns/vxml">
<!-- US Dollars -->
<var name="currency1" expr="'USD20.54'"/>
<!-- Canadian Dollars -->
<var name="currency2" expr="'CAD12.76'"/>
<!-- Japanese Yen -->
<var name="currency3" expr="'JPY17.20'"/>
<!-- British Pounds -->
<var name="currency4" expr="'GBP47.17'"/>
<!-- Euros -->
<var name="currency5" expr="'EUR52.20'"/>
<!-- Russian Rubles-->
<var name="currency6" expr="'RUB78.45'"/>
<form id="F1">
  <block>
    <prompt xml:lang="en-us">
      <say-as interpret-as="vxml:currency"><value expr="currency1"/></say-as><break/>
      <say-as interpret-as="vxml:currency"><value expr="currency2"/></say-as><break/>
      <say-as interpret-as="vxml:currency"><value expr="currency3"/></say-as><break/>
      <say-as interpret-as="vxml:currency"><value expr="currency4"/></say-as><break/>
      <say-as interpret-as="vxml:currency"><value expr="currency5"/></say-as><break/>
      <say-as interpret-as="vxml:currency"><value expr="currency6"/></say-as><break/>

    </prompt>
  </block>
</form>
</vxml>


Number

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xmlns:voxeo="http://community.voxeo.com/xmlns/vxml">
<var name="number1" expr="0"/>
<var name="number2" expr="243"/>
<var name="number3" expr="8721"/>
<var name="number4" expr="8123.2"/>
<var name="number5" expr="-321.12"/>
<form id="F1">
  <block>
    <prompt xml:lang="en-us">
      <say-as interpret-as="vxml:number"><value expr="number1"/></say-as><break/>
      <say-as interpret-as="vxml:number"><value expr="number2"/></say-as><break/>
      <say-as interpret-as="vxml:number"><value expr="number3"/></say-as><break/>
      <say-as interpret-as="vxml:number"><value expr="number4"/></say-as><break/>
      <say-as interpret-as="vxml:number"><value expr="number5"/></say-as><break/>

    </prompt>
  </block>
</form>
</vxml>



Phone

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xmlns:voxeo="http://community.voxeo.com/xmlns/vxml">
<!-- NOTE: The use of a plus "+" is not permitted in the telephone number. -->
<var name="tele1" expr="'8004441234'"/>
<var name="tele2" expr="'3003334321x824'"/>
<form id="F1">
  <block>
    <prompt xml:lang="en-us">
      <say-as interpret-as="vxml:phone"><value expr="tele1"/></say-as><break/>
      <say-as interpret-as="vxml:phone"><value expr="tele2"/></say-as><break/>

    </prompt>
  </block>
</form>
</vxml>
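
Since a plus sign is not permitted, phone numbers that arrive in a human-friendly format need cleaning up before they reach vxml:phone.  Here is a minimal sketch of one way to do that in ECMAScript, assuming the target form is the one used by tele1 and tele2 above (digits, optionally followed by "x" and an extension).  The function name toVxmlPhone is hypothetical.

```javascript
// Hypothetical helper: reduce a human-formatted phone number to the
// digits-plus-optional-"x" form used by the tele1/tele2 examples.
function toVxmlPhone(raw) {
  // Split at extension markers ("x", "ext" or "ext."); only the text
  // after the first marker is used as the extension.
  var parts = String(raw).toLowerCase().split(/\s*(?:ext\.?|x)\s*/);
  // Keep digits only; this drops "+", spaces, dashes and parentheses.
  var main = parts[0].replace(/\D/g, '');
  var extension = parts.length > 1 ? parts[1].replace(/\D/g, '') : '';
  return extension ? main + 'x' + extension : main;
}

var plain = toVxmlPhone('(800) 444-1234');          // same value as tele1
var withExt = toVxmlPhone('300-333-4321 ext. 824'); // same value as tele2
```

Note that the helper only removes the plus sign; any country code digits that follow it are kept.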



Time

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xmlns:voxeo="http://community.voxeo.com/xmlns/vxml">
<var name="time1" expr="'0113a'"/>
<var name="time2" expr="'1013p'"/>
<var name="time3" expr="'1157h'"/>
<form id="F1">
  <block>
    <prompt xml:lang="en-us">
      <say-as interpret-as="vxml:time"><value expr="time1"/></say-as><break/>
      <say-as interpret-as="vxml:time"><value expr="time2"/></say-as><break/>
      <say-as interpret-as="vxml:time"><value expr="time3"/></say-as><break/>

    </prompt>
  </block>
</form>
</vxml>
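
The time strings above follow the five-character hhmmx layout of the VoiceXML time type, where the last character is "a" for AM, "p" for PM, "h" for a 24-hour clock time, or "?" when the time is ambiguous.  A minimal sketch of a helper that builds such strings is shown below; the name toVxmlTime is hypothetical, and defaulting a missing suffix to "?" is our own choice.

```javascript
// Hypothetical helper: build an hhmmx string for vxml:time.
// suffix is 'a' (AM), 'p' (PM), 'h' (24-hour clock) or '?' (ambiguous);
// when it is left out, '?' is used.
function toVxmlTime(hours, minutes, suffix) {
  // Two-digit, zero-padded rendering of a number from 0 to 59.
  function two(n) {
    return (n < 10 ? '0' : '') + n;
  }
  return two(hours) + two(minutes) + (suffix || '?');
}

var morning = toVxmlTime(1, 13, 'a');  // same value as time1
var evening = toVxmlTime(10, 13, 'p'); // same value as time2
```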



Boolean

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml" xmlns:voxeo="http://community.voxeo.com/xmlns/vxml">
<form id="F1">
  <field name="F_1" type="boolean">
    <prompt xml:lang="en-us">
      Do you like Voice X M L?
    </prompt>
    <filled>
      <prompt>
        You said <say-as interpret-as="vxml:boolean"><value expr="F_1"/></say-as>

      </prompt>
    </filled>
  </field>
</form>
</vxml>


By now, you should be a say-as expert.  Still craving SSML knowledge?  Fear not, for we have more SSML goodies for you below.

<emphasis> element


<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
  <block>
    <prompt xml:lang="en-us">
      That is a <emphasis> big </emphasis> cat!
      That is a <emphasis level="strong"> huge </emphasis> dog!

    </prompt>
  </block>
</form>
</vxml>



<sub> element


<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
  <block>
    <prompt xml:lang="en-us">
      this message brought to you by the
        <sub alias="World Wide Web Consortium">W3C</sub>
    </prompt>
  </block>
</form>
</vxml>



<prosody> element


<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
  <block>
    <prompt xml:lang="en-us">
    <!-- Note: The "duration" attribute currently does not work with Prophecy TTS -->
      <prosody duration="8000ms">
        Testing with long duration
      </prosody>

      <prosody duration="50ms">
        Testing with short duration
      </prosody>

      <prosody rate="slow">
        Testing with slow rate
      </prosody>

      <prosody rate="fast">
        Testing with fast rate
      </prosody>

      <prosody volume="soft">
        Testing with low volume
      </prosody>

      <prosody volume="loud">
        Testing with high volume
      </prosody>

      <prosody pitch="low">
        Testing with low pitch
      </prosody>

      <prosody pitch="high">
        Testing with high pitch
      </prosody>

    </prompt>
  </block>
</form>
</vxml>



<break> element


<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
  <block>
    <prompt xml:lang="en-us">
      We will now test a break in our T T S output
      starting right here <break strength="weak"/> and then we
      will have another right <break time="3000ms"/> here.
    </prompt>
  </block>
</form>
</vxml>



<phoneme> element


<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
  <block>
    <prompt xml:lang="en-us" bargein="false">
      Playing back tomato phoneme example.
      <break/>
      <!-- The ph attribute takes precedence: this is spoken as "tomato", not "apple" -->
      <phoneme alphabet="x-cmu" ph="T AH0 M EY1 T OW0"> apple </phoneme>
      <break/>
    </prompt>
  </block>
</form>
</vxml>



<sentence> element


<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
  <block>
    <prompt>
      <sentence xml:lang="en-us">
        Brave Sir Robin ran away.
      </sentence>
      <sentence xml:lang="en-us">
        When danger reared its ugly head, he bravely turned his tail and fled.
      </sentence>
      <sentence xml:lang="en-us">
        Yes, brave Sir Robin turned about, and valiantly, he chickened out.
      </sentence>

    </prompt>
  </block>
</form>
</vxml>



<paragraph> element


<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">
<form id="F_1">
  <block>
    <prompt>
      <paragraph xml:lang="en-us">
        The 12-gauge double-barreled Remington. S-Mart's top of the line. You can find this in the sporting goods department. That's right, this sweet baby was made in Grand Rapids, Michigan.
      </paragraph>

    </prompt>
  </block>
</form>
</vxml>



Download the Code!

  Prophecy TTS code


  ANNOTATIONS: EXISTING POSTS
neilhollingum
12/22/2008 4:21 AM (EST)


--- RESOLVED ----



There seems to be a bug here in the currency example.

The code says:-

<!-- Euros -->
<var name="currency5" expr="'EUR52.20'"/>
<!-- Russian Rubles-->
<var name="currency5" expr="'RUB78.45'"/>
<form id="F1">

Notice that currency5 is defined twice but there is no definition for currency6 which is used later. The code should read:-

<!-- Euros -->
<var name="currency5" expr="'EUR52.20'"/>
<!-- Russian Rubles-->
<var name="currency6" expr="'RUB78.45'"/>
<form id="F1">


Regards

Neil
premisenational
2/14/2009 12:45 PM (EST)
Hello:

We are working on a project with a client whose server will return text responses to queries during the call; we need to play those responses to the caller as TTS (something like "your thing is approved, your code is XXX").

Are there any optional special symbols they should know about for creating pauses and the like, to improve the read-back?  We will be receiving just a block of text from them, and I want to keep it as simple as possible for them to structure their response.

For example, the <p> and <s> paragraph and sentence tags, or # or other special characters indicating emphasis, pauses, etc.?

Or do we just rely on commas, periods and ! and other markers normally within text?

THANKS
voxeojeremyr
2/14/2009 1:42 PM (EST)
Hello,

You could use the <prosody> element to slow speech down or the <break> tag to put in hard stopping points, but I am not sure whether that goes against the requirement of keeping the text you want played simple to write.

Normal punctuation marks are read and understood by the TTS engine.  So for a pause, a comma, semicolon, or ellipsis (...) would work as well.

Regards,
Jeremy Richmond
Voxeo Support
ssa_peregrin
11/13/2009 10:10 AM (EST)
Greetings,
I was curious as to whether or not there is a Prophecy equivalent to Rhetorical's <say-as interpret-as="vxml:address"> ?  Returning state abbreviations as full state names is easy enough, but I'm more concerned about things such as St, Rd, Cir, Bld, Blvd and other common address abbreviations.

Are there any other options to using a specific say-as for an address?

Thanks,
Russ
VoxeoDustin
11/13/2009 10:27 AM (EST)
Hello Russ,

The type address is a legacy SSML tag supported by old Rhetorical/Speechify voices.

Per the VXML spec, the supported SSML interpret-as types are:

vxml:phone
vxml:digits
vxml:boolean
vxml:time
vxml:date
vxml:number
vxml:currency

http://www.w3.org/TR/voicexml20/#dmlABuiltins

All of the above should work with Prophecy TTS without issue.

Let me know if we can be of further assistance.

Regards,
Dustin Hayre
Solutions Engineer
Voxeo Corporation
ssa_peregrin
11/13/2009 10:55 AM (EST)
Thanks Dustin,
Would my only option for addresses with Prophecy TTS be to parse each address for specific abbreviations and replace them as they're found?  Rd to Road, Blvd to Boulevard, etc.?
jdyer
11/13/2009 11:05 AM (EST)
Hello Russ,

  Yes, that would be the best option in our opinion. You could implement this in ECMAScript or handle it server-side, depending on your needs. If you have any other questions on this or anything else, please let us know; our team is standing by to help!

Regards,

John Dyer
Customer Engineer
Voxeo Support
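
For anyone taking the ECMAScript route suggested above, here is a minimal sketch of abbreviation expansion.  It is an illustration under our own assumptions rather than a Voxeo-supplied routine: the table ADDRESS_ABBREVIATIONS and the function expandAddress are hypothetical names, and the table only covers a few of the abbreviations mentioned in this thread.

```javascript
// Illustrative abbreviation table; extend it to match your own data.
var ADDRESS_ABBREVIATIONS = {
  'st': 'Street',
  'rd': 'Road',
  'cir': 'Circle',
  'blvd': 'Boulevard'
};

// Hypothetical helper: expand whole-word abbreviations, with or
// without a trailing period, and leave every other word unchanged.
function expandAddress(text) {
  return String(text).replace(/\b([A-Za-z]+)\b\.?/g, function (match, word) {
    var key = word.toLowerCase();
    if (ADDRESS_ABBREVIATIONS.hasOwnProperty(key)) {
      return ADDRESS_ABBREVIATIONS[key];
    }
    return match;
  });
}

var spoken = expandAddress('123 Main St. and 45 Oak Blvd');
```

A simple lookup like this cannot tell "St" for Street from "St" for Saint, so ambiguous abbreviations may need extra rules or server-side handling.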
dhubler
12/22/2009 2:31 PM (EST)
What about the W3C spec that states these types should *not* be prefixed with "vxml:"?

  http://www.w3.org/TR/ssml-sayas/

Excerpt:

  The following interpret-as values are defined to be legal values. 
  These values must not be prefixed.

  interpret-as value    Defined in section
  date                  3.1
  time                  3.2
  telephone             3.3
  characters            3.4
  cardinal              3.5
  ordinal               3.6

It almost seems like the W3C is contradicting itself, saying "date" must not be prefixed *and* saying "vxml:date" is a standard in http://www.w3.org/TR/voicexml20/#dmlABuiltins.

This may seem like a minor difference but in order to use multiple TTS vendors at the same time, which one is right?
VoxeoDante
12/22/2009 2:49 PM (EST)
Hello,

In my opinion, the vxml: portion is more of a namespace declaration than a prefix, but that portion of the spec is not clearly defined.  I do have to say that I have seen the W3C contradict itself on a number of occasions.

All that being said, a proper VXML implementation should support all of these options, both the vxml:-prefixed and the unprefixed versions.

If you find that they are not, please let us know and we will get it fixed.

Regards,
Dante Vitulano
Customer Support Engineer II
Voxeo Corporation
SSAUK_M-PLIFY
8/20/2010 12:57 PM (EDT)
Regarding the prosody element.

Is there something between the default and the 'slow' rate?
The default is slightly too fast, and 'slow' is slightly too slow...

Or is there any other way to configure the speed?

Regards

Izidor
voxeojeremyr
8/20/2010 10:01 PM (EDT)
Hello Izidor,

You can use the "duration" value if you are playing back a consistent value, such as a 10 digit phone number or such.

For a production system, we always recommend using pre-recorded audio when you can, as even the best-quality TTS is not as good as pre-recorded audio.

If you have any questions, please do not hesitate to let us know.

Regards,
Jeremy Richmond
Voxeo Support
vishaljha
3/7/2011 12:18 AM (EST)
How can I spell out text word by word or character by character?

For example, a field contains "Apple Banana".

By default, it is read out as Apple and then Banana.

Instead, I want it spelled out as A, P, P, L, E, B, A, N, A, N, A.
voxeoJeffK
3/7/2011 9:54 AM (EST)
Hello,

There isn't a <say-as> exactly for that, but some JavaScript can do the job. This puts a comma between each character in the contents of myVariable:

    <filled>
      <var name="letters"/>
      <assign name="letters" expr="myVariable.split('').join(',');"/>

      <prompt>Your input was
      <value expr="letters"/>.
      </prompt>
    </filled>

Regards,
Jeff Kustermann
Voxeo Support
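
One thing to watch with the one-liner above: for "Apple Banana" the space between the words becomes an item of its own.  A minimal sketch of a variation that spells each word separately is shown below.  The function name spellOut is hypothetical, and using a period between words to get a longer pause relies on the engine honoring normal punctuation, as described earlier in this thread.

```javascript
// Hypothetical helper: spell a phrase letter by letter, word by word.
function spellOut(text) {
  // Trim the input, then break it into words on runs of whitespace.
  var words = String(text).replace(/^\s+|\s+$/g, '').split(/\s+/);
  var spelled = [];
  for (var i = 0; i < words.length; i++) {
    if (words[i] !== '') {
      // "Apple" becomes "A, P, P, L, E"
      spelled.push(words[i].toUpperCase().split('').join(', '));
    }
  }
  // Separate the spelled words with a period and a space.
  return spelled.join('. ');
}

var letters = spellOut('Apple Banana');
```

Defined in a <script> element, it could replace the expression in the <assign> above, for example expr="spellOut(myVariable)".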
SSAUK_M-PLIFY
5/24/2011 11:13 AM (EDT)
The PHONEME element (http://www.w3.org/TR/speech-synthesis/#S3.1.9) specifies the following possibilities for the phonemic/phonetic alphabet to use according to the value of the (optional) "alphabet" attribute:

1) "ipa", the default

Legal ph values are strings composed of the Unicode representations of the phonetic characters developed by the International Phonetic Association (for those characters for which such a mapping exists).

See:

http://web.uvic.ca/ling/resources/ipa/charts/unicode_ipa-chart.htm
http://www.phon.ucl.ac.uk/home/wells/ipa-unicode.htm

2) vendor-defined strings of the form "x-organization" or "x-organization-alphabet"

Prophecy seems to support the "Carnegie Mellon University pronunciation notation", as indicated by "x-cmu".

See:

http://www.speech.cs.cmu.edu/cgi-bin/cmudict



Question:

- Is IPA indeed supported?
- What is recommended for use with Prophecy, IPA or CMU?


VoxeoDustin
5/24/2011 6:43 PM (EDT)
Hello,

P8 and above should support CMU and SAMPA lexicons:

<block>
    <prompt>
      <phoneme alphabet="x-CMU" ph="HH Y UW1 JH"> hello </phoneme>
    </prompt>
</block>

Playing back phoneme example.
<break/>
<phoneme alphabet="x-sampa" ph="t@'meit@U"> whatever </phoneme>
<break/>

Note that support and pronunciation are entirely dependent on the TTS engine being used, so you may notice different behavior if you switch to Nuance or Loquendo, for instance.

Regards,
Dustin Hayre
Solutions Engineer
Voxeo Corporation

Download Prophecy 10: http://voxeo.com/prophecy | Prophecy 10 Docs: http://docs.voxeo.com/prophecy/10.0/home.htm


© 2013 Voxeo Corporation  |  Voxeo IVR  |  VoiceXML & CCXML IVR Developer Site