VoiceXML 2.1 Development Guide


Tutorial: VoiceXML hello world with voice recognition

This tutorial builds on what you accomplished in Tutorial 1. If you have not completed that tutorial, go through it first.

Step 1: creating our initial VoiceXML structure


From our previous tutorial, we now recognize the following structure as a normal starting point:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">
</vxml>


Step 2: our first look at making a voice recognition menu


As you might expect, voice recognition is not a trivial process; nevertheless, setting up a basic set of responses to a prompt is fairly painless. VoiceXML requires a "grammar", which we will get to in a moment, but first we need an actual menu/form so that the caller knows what to say.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >

  <form id="MainMenu">
    <field name="SouthParkCharacter">
      Please say your favorite South Park character's name.
    </field>
  </form>
</vxml>


Now we wax dramatic and add the grammar. The caller has been prompted, and we need a mechanism to capture their input, so we still need to give our "menu" a grammar to match against. Let's look at a common inline GSL grammar structure:

<grammar type="text/gsl">
  [kenny cartman stan kyle [terrance phillip] canadians chef (mister hat) (big gay al) wendy 
  timmy  hanky garrison (cartmans mom) pip ike (mister mackey)  mephisto jimbo marvin] 
</grammar>


Our grammar can be much shorter or much longer than this, but the names above are more than enough to illustrate several important aspects of VoiceXML grammars. First, and possibly most important, we declare the grammar's 'type' attribute. The Motorola browser requires this attribute in order to use the grammar within your VoiceXML application; omitting it results in an error. It is therefore a good habit to always declare the type attribute on <grammar> in your VoiceXML documents.

Next, notice that the names inside the brackets are what VoiceXML attempts to match against what the caller says over the phone. They are all lowercase, and this is no accident. They must be lowercase; otherwise VoiceXML interprets them as references to other grammar blocks (or, more precisely, grammar rule names). What does that mean? It means that "[kenny]" attempts to match speech to the word "kenny", while "[KENNY]" looks for another grammar structure entirely that is called "KENNY".

Our next point involves the entries that contain multiple words (such as "[terrance phillip]" and "(mister hat)"). Notice that some use parentheses, while others use only brackets. There is a subtle but important distinction between the two notations: with brackets, any one of the words will match; with parentheses, all of the words must be heard, in the order shown, or there is no valid match. Thus, "[terrance phillip]" matches if VoiceXML hears "terrance" or "phillip"; however, if the grammar entry is "(mister hat)", then it has to hear "mister hat". Neither "mister" nor "hat" is enough to match by itself.
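The two notations can be put side by side in one small grammar. This is only a sketch built from the names used above. The "(terrance phillip)" entry is an addition for illustration and is not part of the tutorial's grammar.

```xml
<!-- Sketch only: brackets mean "any one of these",
     parentheses mean "all of these words, in this order". -->
<grammar type="text/gsl">
  [
    [terrance phillip]
    (terrance phillip)
    (mister hat)
  ]
</grammar>
```

Following the rules described above, this grammar should accept "terrance", "phillip", "terrance phillip", and "mister hat", but not "mister" or "hat" spoken alone.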

Whew, that was a lot of description. But you can begin to see how powerful careful matching can be. Moreover, the syntax is extremely straightforward.


Step 4: incorporating grammar values into our application

Now that we have a menu and a grammar that the menu uses, what do we do next? Since we went to all the trouble of writing a grammar that returns values, why don't we do something with those values:

    <filled namelist="SouthParkCharacter">
      <if cond="SouthParkCharacter == 'kenny'">
      <elseif cond="SouthParkCharacter == 'cartman'"/>
      <elseif cond="SouthParkCharacter == 'stan'"/>
      <elseif cond="SouthParkCharacter == 'kyle'"/>
      <elseif cond="SouthParkCharacter == 'canadians'"/>
      <elseif cond="SouthParkCharacter == 'chef'"/>
      <elseif cond="SouthParkCharacter == 'mister hat'"/>
      <elseif cond="SouthParkCharacter == 'big gay al'"/>
      <elseif cond="SouthParkCharacter == 'wendy'"/>
      <elseif cond="SouthParkCharacter == 'timmy'"/>
      <elseif cond="SouthParkCharacter == 'hanky'"/>
      <elseif cond="SouthParkCharacter == 'garrison'"/>
      <elseif cond="SouthParkCharacter == 'cartmans mom'"/>
      <elseif cond="SouthParkCharacter == 'pip'"/>
      <elseif cond="SouthParkCharacter == 'ike'"/>
      <elseif cond="SouthParkCharacter == 'mister mackey'"/>
      <elseif cond="SouthParkCharacter == 'mephisto'"/>
      <elseif cond="SouthParkCharacter == 'jimbo'"/>
      <elseif cond="SouthParkCharacter == 'marvin'"/>
      <else/>
      </if>
    </filled>


For anyone who has coded before, this section should look familiar enough. The <filled> tag means that our <field> has been filled with a recognized value (returned by our grammar). Then we have a long series of <if> and <elseif> statements to determine which value actually came back. Notice that we use a double equal sign (as in PHP or Perl) to test whether our <field> variable (called "SouthParkCharacter") is equal to a specific value; the cond attribute is evaluated as an ECMAScript expression. The value being compared must be written in single quotes (i.e., ' '), since the attribute itself is delimited by double quotes. That is all there is to it. Of course, we still are not responding to the caller, but that is simple enough to rectify. Let's associate some text-to-speech with each of our possible conditions/matches:


    <filled namelist="SouthParkCharacter">
      <if cond="SouthParkCharacter == 'kenny'">
        <prompt>Kenny has more lives than a cat.</prompt>
      <elseif cond="SouthParkCharacter == 'cartman'"/>
        <prompt>Cartman is not fat.  He is big boned.</prompt>
      <elseif cond="SouthParkCharacter == 'stan'"/>
        <prompt>Stan likes Wendy.</prompt>
      <elseif cond="SouthParkCharacter == 'kyle'"/>
        <prompt>Kyle has a gay dog.</prompt>
      <elseif cond="SouthParkCharacter == 'canadians'"/>
        <prompt>Canada.  What is that aboot?</prompt>
      <elseif cond="SouthParkCharacter == 'chef'"/>
        <prompt>Chef is the coolest man in South Park.</prompt>
      <elseif cond="SouthParkCharacter == 'mister hat'"/>
        <prompt>Mister Hat is a puppet.</prompt>
      <elseif cond="SouthParkCharacter == 'big gay al'"/>
        <prompt>Big Gay Al is gay.</prompt>
      <elseif cond="SouthParkCharacter == 'wendy'"/>
        <prompt>Wendy likes Stan.</prompt>
      <elseif cond="SouthParkCharacter == 'timmy'"/>
        <prompt>Timmmy!  Timmmy tim maugh!</prompt>
      <elseif cond="SouthParkCharacter == 'hanky'"/>
        <prompt>Mister Hanky, the Christmas poo.</prompt>
      <elseif cond="SouthParkCharacter == 'garrison'"/>
        <prompt>Mister Garrison is gay.</prompt>
      <elseif cond="SouthParkCharacter == 'cartmans mom'"/>
        <prompt>Cartman's mom loves the Denver Broncos.</prompt>
      <elseif cond="SouthParkCharacter == 'pip'"/>
        <prompt>Pip is British.</prompt>
      <elseif cond="SouthParkCharacter == 'ike'"/>
        <prompt>Ike is also Canadian.</prompt>
      <elseif cond="SouthParkCharacter == 'mister mackey'"/>
        <prompt>Mister Mackey.  Mmmmmmkay.</prompt>
      <elseif cond="SouthParkCharacter == 'mephisto'"/>
        <prompt>Mephisto enjoys experimenting on animals.</prompt>
      <elseif cond="SouthParkCharacter == 'jimbo'"/>
        <prompt>Jimbo is a redneck.</prompt>
      <elseif cond="SouthParkCharacter == 'marvin'"/>
        <prompt>Marvin is really hungry.</prompt>
      <else/>
      <prompt>
        A match has occurred, but no specific if statement
        was written for it.  Probably just a minor character
        like Tweak or Jimbo's gun-toting friend.
      </prompt>
      </if>
    </filled>


It looks like a lot of text, but really we just added one line of text-to-speech code for each match. Again, text does not strictly need a special tag to be rendered as TTS, although here each response is wrapped in <prompt>.
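When a branch does not fire as expected, it can help to hear the exact string that the grammar returned before comparing it against anything. The fragment below is a minimal sketch of such a debugging aid and is not part of the tutorial's code. It assumes the same field name and uses the standard <value> element to speak the variable's contents.

```xml
<!-- Debug sketch (an assumption, not the tutorial's own code):
     speak back whatever string filled the field. -->
<filled namelist="SouthParkCharacter">
  <prompt>
    I heard <value expr="SouthParkCharacter"/>.
  </prompt>
</filled>
```

Swapping this in temporarily for the long <if> block makes it easier to see whether a multi-word entry comes back with or without its spaces, which determines what each cond must compare against.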

Step 5: putting it all together

Just a couple more lines of code, we swear.

Since voice recognition is not always 100% accurate for matching, we want to add several handlers to our <field> tag just in case:

      <noinput>
        <prompt>I did not hear anything.  Please try again.</prompt>
        <reprompt/>
      </noinput>
 
      <nomatch>
        <prompt>I did not recognize that character.  Please try again.</prompt>
        <reprompt/>
      </nomatch>


The <noinput> tag is fairly self-explanatory: it is triggered if the caller does not say anything. Thus, we now have a nice little extra prompt that asks them to say something. <reprompt> simply tells the interpreter to play the <field> prompt again. Similarly, <nomatch> is triggered when VoiceXML hears something but cannot match it to a value in the grammar.
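These handlers can also vary their behavior on repeated failures by using the count attribute, which VoiceXML defines for event handlers. The sketch below is an optional refinement, not part of the tutorial's file. The prompt wording and the choice to give up on the third silence are illustrative assumptions.

```xml
<!-- Sketch: place inside the <field>. The handler with the highest
     count that does not exceed the number of occurrences is used,
     so count="1" covers the first and second silence. -->
<noinput count="1">
  <prompt>I did not hear anything.  Please try again.</prompt>
  <reprompt/>
</noinput>

<noinput count="3">
  <prompt>I still did not hear anything.  Goodbye.</prompt>
  <exit/>
</noinput>
```

The same pattern applies to <nomatch>, which keeps a caller from being stuck in an endless retry loop.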

Our script is now complete, and the entire file should look like this:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">

  <form id="MainMenu">
    <field name="SouthParkCharacter">
      Please say your favorite South Park character's name.
      <grammar type="text/gsl">
        [kenny cartman stan kyle [terrance phillip] canadians chef
          (mister hat) (big gay al) wendy  timmy
          hanky garrison (cartmans mom) pip ike (mister mackey)
          mephisto jimbo marvin] 
      </grammar>
 
      <noinput>
        <prompt>I did not hear anything.  Please try again.</prompt>
        <reprompt/>
      </noinput>
 
      <nomatch>
        <prompt>I did not recognize that character.  Please try again.</prompt>
        <reprompt/>
      </nomatch>
    </field>

    <filled namelist="SouthParkCharacter">
      <if cond="SouthParkCharacter == 'kenny'">
        <prompt>Kenny has more lives than a cat.</prompt>
      <elseif cond="SouthParkCharacter == 'cartman'"/>
        <prompt>Cartman is not fat.  He is big boned.</prompt>
      <elseif cond="SouthParkCharacter == 'stan'"/>
        <prompt>Stan likes Wendy.</prompt>
      <elseif cond="SouthParkCharacter == 'kyle'"/>
        <prompt>Kyle has a gay dog.</prompt>
      <elseif cond="SouthParkCharacter == 'canadians'"/>
        <prompt>Canada.  What is that aboot?</prompt>
      <elseif cond="SouthParkCharacter == 'chef'"/>
        <prompt>Chef is the coolest man in South Park.</prompt>
      <elseif cond="SouthParkCharacter == 'mister hat'"/>
        <prompt>Mister Hat is a puppet.</prompt>
      <elseif cond="SouthParkCharacter == 'big gay al'"/>
        <prompt>Big Gay Al is gay.</prompt>
      <elseif cond="SouthParkCharacter == 'wendy'"/>
        <prompt>Wendy likes Stan.</prompt>
      <elseif cond="SouthParkCharacter == 'timmy'"/>
        <prompt>Timmmy!  Timmmy tim maugh!</prompt>
      <elseif cond="SouthParkCharacter == 'hanky'"/>
        <prompt>Mister Hanky, the Christmas poo.</prompt>
      <elseif cond="SouthParkCharacter == 'garrison'"/>
        <prompt>Mister Garrison is gay.</prompt>
      <elseif cond="SouthParkCharacter == 'cartmans mom'"/>
        <prompt>Cartman's mom loves the Denver Broncos.</prompt>
      <elseif cond="SouthParkCharacter == 'pip'"/>
        <prompt>Pip is British.</prompt>
      <elseif cond="SouthParkCharacter == 'ike'"/>
        <prompt>Ike is also Canadian.</prompt>
      <elseif cond="SouthParkCharacter == 'mister mackey'"/>
        <prompt>Mister Mackey.  Mmmmmmkay.</prompt>
      <elseif cond="SouthParkCharacter == 'mephisto'"/>
        <prompt>Mephisto enjoys experimenting on animals.</prompt>
      <elseif cond="SouthParkCharacter == 'jimbo'"/>
        <prompt>Jimbo is a redneck.</prompt>
      <elseif cond="SouthParkCharacter == 'marvin'"/>
        <prompt>Marvin is really hungry.</prompt>
      <else/>
      <prompt>
        A match has occurred, but no specific if statement
        was written for it.  Probably just a minor character
        like Tweak or Jimbo's gun-toting friend.
      </prompt>
      </if>
    </filled>
  </form>
</vxml>



Step 6: upload, and try it out

All that remains now is to upload your new hello world VoiceXML application. In keeping with our naming scheme, we might save this file as http://www.myserver.com/helloworld/helloworld2.xml

Now you can use the Voxeo Account Manager to provision a number to your simple voice recognition application and call the associated number to hear (and speak!) the results.

Download the Code!

  Motorola source code






  ANNOTATIONS: EXISTING POSTS
ericj
7/27/2004 1:34 AM (EDT)
The copy of the source code contained in "Download the Code" is different from what is shown in the tutorial.  The "Download the Code" code is missing the <prompt> tag in front of each prompt listed in the if construct.

- ej
ericj
7/27/2004 2:24 AM (EDT)
The menu choices in the sample code are properly formatted in lower case (kenny cartman hanky et al).  However, the matching items in the If construct are incorrectly shown in initial caps.

<elseif cond="SouthParkCharacter == 'Hanky'"/>
        <prompt>Mister Hanky, the Christmas poo.</prompt>

should be

<elseif cond="SouthParkCharacter == 'hanky'"/>
        <prompt>Mister Hanky, the Christmas poo.</prompt>

in order to properly match the selection to the prompt.

steve.sax
7/27/2004 1:10 PM (EDT)
Eric,

Thanks for pointing out this inconsistency; I have corrected this in our internal docs, and you can expect this to be reflected in our live docs within the next day or two.

Warm Regards,

Steve Sax
adrianysk
2/3/2005 12:55 AM (EST)
I have tried it, but I get an error: the prompt does not finish reading "Please say your favorite South Park character's name." Instead it stops and starts capturing voice input from the user. It says something like "Please say ........" and then the recognition starts. This leaves the user uncertain what the question is about. Why might this happen?
MattHenry
2/3/2005 1:51 AM (EST)
Hola adrian,

There's any number of reasons this could happen, but without seeing debug logs containing the call info, I can only guess. A bad connection with static, or a noisy environment, would be two likely candidates for this behavior. If you'd like to capture the debug logs and create an account ticket with them attached, I can certainly take a look and offer my input.

~Matt
adrianysk
2/3/2005 2:07 PM (EST)
#5036

I am not sure how to capture the debug logs, as I am still new to this area and to the forum as well. However, the above is the ticket that I have created. Does that show anything?

Thanks Matt...
adrianysk
2/6/2005 5:37 AM (EST)
Greetings Matt,

I have already solved my problem. I added the following line to the <prompt> tag ...

    bargein="false"

<prompt bargein="false">


Then the prompt will be read completely before the user can continue with the recognition.

Thanks.
rrobertc
3/25/2005 9:43 AM (EST)
Why do you have [terrance phillip] inside []? The whole thing is already inside the OR []. Also, I don't see terrance or phillip in the if statements.
MattHenry
3/25/2005 11:39 AM (EST)
Hiya Robert,


Thanks for catching those typos. I'll see about correcting that just as soon as time permits.

~Matt
jlam
4/27/2006 7:45 PM (EDT)
Can someone explain to me why some tags close with the / at the front and others with it at the end of the < >?

i.e. <reprompt/> and </noinput>
MattHenry
4/27/2006 8:54 PM (EDT)


Hiya Eric,

Some XML elements don't hold any content between an opening and a closing tag; therefore, they are self-closing, written as a single tag that ends in "/>" (they can still carry attributes, as <elseif cond="..."/> does). Other elements do wrap user-defined values or other elements to execute, so they need a separate closing tag that begins with "</". At its most basic:

<element attributename="att_value">
  element_value
</element>

Versus:

<element/>

The W3C XML and even HTML specifications both have oodles of information that really dive into this topic, if you are inclined to learn more.


Cheers,

~Matt
Khamyl
5/7/2006 4:08 PM (EDT)
Hallo!

What will be the value of the "SouthParkCharacter" variable if I say one of [terrence fillip]? Can I assign one specific value to this element, e.g. "OneOfTerrenceAndFillip"?

Thanx
Michael.Book
5/7/2006 7:44 PM (EDT)
Hello Khamyl,

In this example, the value of 'SouthParkCharacter' would simply be the name of the character you say.  If you say "Terrance," the value will be 'terrance'.  If you say "Phillip," the value will be 'phillip'.

If you want to assign a specific value to the field name, simply use a slot value for your grammar entry.  For example, your grammar might look like this:

-------------------
<grammar scope="document" type="text/gsl">
  <![CDATA[ 
    .MYRULE
      [
        terrance {<mySlot "Terrance of the Terrance and Phillip show">}
        phillip {<mySlot "Phillip of the Terrance and Phillip show">}
      ]
  ]]>
</grammar>
-------------------

The link below is to a forum posting that has a good explanation of what values are assigned to a recognition field name, and even gives an example of how you could assign a specific slot value to a given grammar entry.  I would recommend taking a quick peek...

http://community.voxeo.com/bizblog/viewer?&bb-name=masterforum&bb-q=steelers&&bb-cid=10&bb-tid=80638#bb

I hope this helps...


Have Fun,

~ Michael
mmdoufu
8/18/2006 10:39 AM (EDT)
hey, howdy!

I really enjoyed playing with VXML. Keep up with your great work!

I just copied and pasted the South Park character example and tested it, and found that saying 'big gay al' always returns

"  A match has occurred, but no specific if statement
    was written for it.  Probably just a minor character
    like Tweak or Jimbo's gun-toting friend."

I turned on the debug then realised that the grammar returns 'big gay al', not 'biggayal'. Therefore, the elseif statement should be written like

    <elseif cond="SouthParkCharacter == 'big gay al'"/>
    <prompt>Big Gay Al is gay.</prompt>

      <elseif cond="SouthParkCharacter == 'mister mackey'"/>
      <prompt>Mister Mackey.  Mmmmmmkay.</prompt>

Cheers,
ay
erik707kire
9/7/2006 7:25 AM (EDT)
I do not expect a free call to my application from Bulgaria, but what number should I use to call into my app. from there? The country code, plus the same free-phone number but without the 800???
Erik
MattHenry
9/7/2006 10:22 AM (EDT)


Hello Erik,

Might I suggest a cheaper alternative? You can always connect to your Voxeo applications by using a VoIP service such as Free World Dialup or Skype. Alternatively, you can download a 2-port trial of our IVR platform software, and run it on your local machine:

http://www.voxeo.com/prophecy

Cheers,

~Matthew Henry
KMadhuka
11/22/2006 9:40 AM (EST)
In the above example there is no <prompt> tag before the sentence
"Please say your favorite South Park character's name"
How will this be interpreted as a <prompt> to voice the sentence?

Thanks
Madhukar
VoxeoJoe
11/22/2006 11:47 AM (EST)
Hello Madhukar,

If your text is within a set of "prompt", "field", or "block" tags, it will get rendered as TTS.


Sincerely,

Joe Gallina
Voxeo Corporation
artybala
7/21/2007 1:06 AM (EDT)
I am trying to learn VoiceXML, and when using the following code, the system says it has an internal error and could not locate the URL. This happens after it reads out the prompt. Can someone help me with what I am missing? Thanks.

<?xml version="1.0" encoding="UTF-8"?>

<vxml version="2.0">

<form id ="firstform">

<field name = "CallerKey">

  How are you today. Please give me your name

</field>

</form>

</vxml>
voxeojeremy
7/21/2007 8:40 AM (EDT)
artybala,


You should have a grammar and filled element with this type of script.  To view an example of this, please visit our XML Grammars tutorial, located at http://docs.voxeo.com/voicexml/2.0/t_21.htm .  Please let us know if you require additional assistance.


Thank you very much!

Jeremy McCall
Voxeo Extreme Support
artybala
7/22/2007 4:43 PM (EDT)
Thanks a lot Jeremy. I added the grammar and it works fine now. What should I do if my grammar is in a database table? If I am storing employee names in a database table, how do I use that as a grammar without having to create an XML or GSL one?
Thanks for your help.
- artybala
voxeojeff
7/23/2007 11:37 AM (EDT)
Hello Artybala,

Unfortunately, you will need to create a new XML or GSL grammar file in order to accomplish this task.  For more information on grammars, you may visit the following links:

http://docs.voxeo.com/voicexml/2.0/t_21.htm
http://docs.voxeo.com/voicexml/2.0/mot_appendixi.htm
http://docs.voxeo.com/voicexml/2.0/mot_appendixj.htm

Best regards,

Jeff Menkel
Voxeo Corporation
hotk
3/10/2008 9:44 PM (EDT)
hi there.

When I call a number given to me, I can hear "~~~ gate way software."

What is the problem?
VoxeoDustin
3/10/2008 10:11 PM (EDT)
Hey HotK,

It looks like your application is using vxml version="1.0" and a Nuance-specific DTD. You'll want to change the version to 2.0 or 2.1, which is what our platform currently uses, and remove the reference to the Nuance DTD.

Also, in the future, you can open account tickets if you have questions regarding our platform. When you login to Evolution, click Support Tickets and then Open New Account Ticket.

Thanks,
Dustin
anishjhaveri
6/24/2008 8:17 AM (EDT)
Hi,

Excellent article! Thanks for the same.

I would like to know one thing.

Can I have a simple user's voice input which can be converted to text? I mean speech-to-text.

Let's say I would like to make an application that collects a user's view on some question, given as voice input. Can I convert that voice input to text?

Looking forward to your reply.

Thanks,
Anish

VoxeoDustin
6/24/2008 11:10 AM (EDT)
Hey Anish,

Automated speech-to-text is a very difficult task to accomplish. There are some companies and software out there that may suit your needs, and a quick Google search should get you a bevy of information regarding it.

You may want to look into manual transcription services, which may be a cheaper and easier route than integrating a speech-to-text API.

Let us know if we can be of further assistance.

Cheers,
Dustin


© 2003-2008 Voxeo Corporation  |  Voxeo IVR  |  VoiceXML & CCXML IVR Developer Site