VoiceXML 2.1 Development Guide


Tutorial: Using Audio Files


This lesson builds on what you accomplished in tutorials 1, 2, 3, and 4. If you have not completed those tutorials, go through them first.

Step 1: record some audio files

Pre-recorded audio files take more time to set up than text-to-speech, but they invariably sound more professional and polished than even the best TTS engine. Recording your own audio files is often preferred for commercially deployed applications.

So, in the interests of quality audio output, use your favorite sound recording utility to record two files that say:

You can use the Windows Sound Recorder to do this, as illustrated below. After you have recorded the files, save them in u-law format as "helloworld.wav" and "menu.wav". If you're using the Windows Sound Recorder, select "File" > "Save As...", press the "Change..." button, and set the format to "CCITT u-Law" and the attributes to "8,000 Hz, 8-Bit, Mono", as shown below:






Step 2: creating our initial VoiceXML structure

From our previous tutorials, we now recognize the following structure as a normal starting point:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">


</vxml>
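Documents written against the W3C VoiceXML 2.1 specification normally also declare the VoiceXML namespace on the root element. A minimal sketch of the same skeleton with that declaration added (the platform used in these tutorials appears to accept documents without it, so treat this as an optional, more portable variant rather than a requirement):

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Same empty starting point, with the standard VoiceXML namespace declared -->
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">

</vxml>
```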



Step 3: create our grammar files


We already know how to make grammars, so this should be a piece of cake:

<![CDATA[
[
  [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  } 
  [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  } 
]]]>


  <link next="#MainMenu">
    <grammar type="text/gsl">
      [(main ?menu)]
    </grammar>
  </link>


Wait a minute, did we mean to put that question mark ("?") in the grammar? In fact, we did. In GSL, a question mark marks the word that follows it as optional when determining matches. Remember, the parentheses tie multiple words together into a sequence, so the caller can say either "main" or "main menu" and the utterance will be recognized as valid.
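The two operators can be combined. As a sketch (the extra phrase "go to" is our own illustration, not part of this tutorial), the following link grammar would accept "main", "main menu", "go to main", and "go to main menu". The question mark applies to the parenthesized phrase as a whole, the same way it applies to a single word:

```xml
<!-- Document-level link: returns the caller to the MainMenu form -->
<link next="#MainMenu">
  <!-- ?(go to) makes the two-word phrase "go to" optional as a unit;
       ?menu makes the single word "menu" optional -->
  <grammar type="text/gsl">
    [(?(go to) main ?menu)]
  </grammar>
</link>
```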


Step 4: creating our menu

Next we need to make our main menu. We are essentially replacing the text-to-speech portions with references to the audio files we recorded back in step 1. We will now insert our inline grammars and add in our newly recorded audio files:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" >

  <link next="#MainMenu">
    <grammar type="text/gsl">
            [(main ?menu)]         
      </grammar>
  </link>

  <form id="MainMenu">
    <block>
      <audio src="helloworld.wav"/>
    </block>
   
    <field name="CatOrDog">
      <audio src="menu.wav"/>
      <grammar type="text/gsl">
        <![CDATA[[
            [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  } 
            [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  } 
        ]]]>
      </grammar>
    </field>
  </form>
</vxml>


Look how simple that is: one little <audio> element with a "src" attribute pointing to the audio file, and your recording is played over the telephone instead of TTS. Notice that the 'Main Menu' grammar is scoped to the document level by virtue of where its <link> sits (as a direct child of <vxml>), so a caller can say "main" at any point in the document to return to the main menu. The grammar for the 'CatOrDog' field, on the other hand, is a field grammar, so it is active only while the interpreter is collecting input for that particular field. Keep in mind that explicitly scoping grammars contained in a <link> or <field> element is not permitted.
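Because a link's grammar takes the scope of the element that contains the <link>, moving the link changes where "main" is heard. As a sketch, if you wanted "main" to work only from the Cat submenu of the finished application, you could declare the link inside that form instead of at the document level (this variation is illustrative; the tutorial itself keeps the link document-wide):

```xml
<form id="Cat">
  <!-- Declared inside the form, so this grammar is active only while
       the interpreter is executing the Cat form -->
  <link next="#MainMenu">
    <grammar type="text/gsl">
      [(main ?menu)]
    </grammar>
  </link>

  <!-- No grammar of its own: the link above supplies the "main" command -->
  <field name="BackToMain">
    Cats rule.  They are the superior lifeform on earth.
    If you wish to try again, please say "Main".
  </field>
</form>
```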

Now let's fill out the rest of our menu by adding some event handlers and conditional logic to transition the caller to the appropriate destination based on whichever pet preference they specify:


<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">

<property name="universals" value="help"/>

  <link next="#MainMenu">
    <grammar type="text/gsl">
            [(main ?menu)]         
      </grammar>
  </link>

  <form id="MainMenu">
    <block>
      <audio src="helloworld.wav"/>
    </block>
 
    <field name="CatOrDog">
      <audio src="menu.wav"/>

      <grammar type="text/gsl">
      <![CDATA[[
            [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  } 
            [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  } 
        ]]]>
      </grammar>
      <noinput>
      <prompt>
        I did not hear anything.  Please try again.
      </prompt>
        <reprompt/>
      </noinput>
 
      <nomatch>
      <prompt>
        I did not recognize that pet choice.  Please try again.
      </prompt>
      <reprompt/>
      </nomatch>
 
      <help>
      <prompt>
        Just say "Cat" or "Dog".
      </prompt>
        <reprompt/>
      </help>


    </field>
    <filled namelist="CatOrDog">
      <if cond="CatOrDog == 'Cat'">
        <goto next="#Cat"/>
      <elseif cond="CatOrDog == 'Dog'"/>
        <goto next="#Dog"/>
      </if>
    </filled>
  </form>
</vxml>


Whoa there. What is this <help> tag all about? "Help" is one of VoiceXML's universal commands: when the universals property enables it, a caller who says "help" causes the interpreter to throw a help event, and the <help> element is where you catch that event and decide what happens next. Like <nomatch> and <noinput>, <help> is an element that belongs in most well-coded voice recognition menus. When the caller says "help", the VoiceXML interpreter executes what is inside the <help> element. In this case, we tell them to say "cat" or "dog" and then play the menu prompt again (via the <reprompt> element). That is also why the document includes the <property name="universals" value="help"/> setting: universal commands are not enabled by default in VoiceXML 2.1 applications, so the 'help' option must be enabled explicitly.
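The <help> element is shorthand for <catch event="help">, and like other catch elements it accepts a count attribute, so the response can escalate if the caller keeps asking. A sketch for the CatOrDog field, assuming the universals property still enables help as in the listing above; the second prompt's wording is illustrative. The interpreter counts help events for the field and picks the handler with the highest count that does not exceed the current total:

```xml
<field name="CatOrDog">
  <audio src="menu.wav"/>
  <grammar type="text/gsl">
    <![CDATA[[
        [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  }
        [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  }
    ]]]>
  </grammar>

  <!-- Same as <help>: handles the first "help" event for this field -->
  <catch event="help" count="1">
    <prompt>
      Just say "Cat" or "Dog".
    </prompt>
    <reprompt/>
  </catch>

  <!-- Chosen instead of the handler above from the second "help" onward -->
  <help count="2">
    <prompt>
      Tell me which pet you prefer.  Say "Cat" or "Dog",
      or say "Main" to start over.
    </prompt>
    <reprompt/>
  </help>
</field>
```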

Now we can finish the rest of our application by inserting the submenus:


<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1">

<property name="universals" value="help"/>

  <link next="#MainMenu">
    <grammar type="text/gsl">
            [(main ?menu)]         
      </grammar>
  </link>

  <form id="MainMenu">
    <block>
      <audio src="helloworld.wav"/>
    </block>

    <field name="CatOrDog">
      <audio src="menu.wav"/>
      <grammar type="text/gsl">
        <![CDATA[[
            [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  } 
            [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  } 
        ]]]>
      </grammar>

      <noinput>
      <prompt>
        I did not hear anything.  Please try again.
      </prompt>
        <reprompt/>
      </noinput>
 
      <nomatch>
      <prompt>
        I did not recognize that pet choice.  Please try again.
      </prompt>
      <reprompt/>
      </nomatch>
 
      <help>
      <prompt>
        Just say "Cat" or "Dog".
      </prompt>
    <reprompt/>
      </help>
    </field>
    <filled namelist="CatOrDog">
      <if cond="CatOrDog == 'Cat'">
        <goto next="#Cat"/>
      <elseif cond="CatOrDog == 'Dog'"/>
        <goto next="#Dog"/>
      </if>
    </filled>
  </form>
 
  <form id="Cat">
    <field name="BackToMain">
      Cats rule.  They are the superior lifeform on earth. 
      If you wish to try again, please say "Main".
    </field>
    <filled namelist="BackToMain">
    </filled>
  </form>

  <form id="Dog">
    <field name="BackToMain">
      Dogs.  One wonders how they became so popular...
      If you wish to try again, please say "Main".
    </field>
    <filled namelist="BackToMain">
    </filled>
  </form>
</vxml>



Step 5: upload, and try it out

All that remains now is to upload our new hello world VoiceXML application. In keeping with our naming scheme, we might save this file as http://www.myserver.com/helloworld/helloworld5.xml. Because the <audio> elements use relative URLs, upload helloworld.wav and menu.wav to the same directory as the document.

Now you can provision a number to your simple menuing application with built-in help commands and call the associated number to hear the results. This time, you will hear your pre-recorded audio files instead of text-to-speech.
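The src values in this tutorial are relative URLs, so the interpreter resolves them against the URL of the VoiceXML document itself. A sketch of three ways to reference menu.wav, which all point at the same file only if the document lives at the example address above; the form id AudioDemo and the variable audioBase are hypothetical names used for illustration:

```xml
<!-- Hypothetical document-level variable holding the audio base URL -->
<var name="audioBase" expr="'http://www.myserver.com/helloworld/'"/>

<form id="AudioDemo">
  <block>
    <!-- Relative src: resolved against the document's own URL -->
    <audio src="menu.wav"/>

    <!-- Absolute src: fetched from exactly this location -->
    <audio src="http://www.myserver.com/helloworld/menu.wav"/>

    <!-- expr: the URL is computed from an ECMAScript expression at runtime -->
    <audio expr="audioBase + 'menu.wav'"/>
  </block>
</form>
```

Keeping the recordings next to the document and using relative URLs means the whole directory can be moved to another server without editing the markup.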



What we covered:




  ANNOTATIONS: EXISTING POSTS
dgeiregat
10/13/2004 6:19 AM (EDT)
Hello,

I have 2 remarks for the coding example.

1) a property element needs to be added in order for the 'help' utterance to be recognized: <property name="universals" value="all"/>. I added it to the field CatOrDog, not at the global level. And it works!

2) Same remark as my 2nd remark on the CallFlow Tutorial: remove the condition on each of the forms <form id="Cat"/"Dog">. They are not needed.

Regards,

Dirk
Michael.Book
10/13/2004 11:10 AM (EDT)
Howdy Dirk,

Nice catch!  Thank you for your valuable feedback...

I have corrected the tutorial code.  Please allow a couple of days for the changes to be pushed out to the live doc-set...


Thanks Again,

~ Michael
neelima.ch
1/31/2006 2:16 AM (EST)
Hello,
My question is about the following code:


<grammar type="text/gsl">
      <![CDATA[[
            [cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  } 
            [dog pooch puppy doggie (dog person)]    { <CatOrDog "Dog">  } 
        ]]]>
      </grammar>

What I want to know is: when we give the voice input, how is that input stored? Which variable holds the value of the voice input here: is it CatOrDog, or something else? Also, do "Cat" and "Dog" refer to the respective forms that hold the response for the given voice input?

Regards,
Neelima.
raja_emmadi
1/31/2006 9:36 AM (EST)
Hello Neelima,

According to my understanding

[cat kitty kitten meow (cat person)]      { <CatOrDog "Cat">  }

the voice input can be : "cat", "kitty", "kitten", "meow", or "cat person"

If the caller says any of these, the value 'Cat' is assigned to the field "CatOrDog".

But I am not sure.

-- Rajesh
rajesh_thota_kumar@yahoo.com
Michael.Book
2/1/2006 2:06 PM (EST)
Howdy All,

Regarding the value assigned to a field name from a given recognition, there are some shadow variables that prove very useful when trying to visualize what's going on "under the covers" and when troubleshooting recognition related issues - 'confidence', 'inputmode', 'interpretation', and 'utterance'.  These are further explained at 'http://docs.voxeo.com/voicexml/2.0/mot_sessionvars.htm#start', and I strongly recommend including the following log lines in all <filled> blocks during development.  These will give instant insight as to what exactly was recognized and what value will be assigned to the field name.
__________________________

<filled>
  <log expr="'*** [field name] VALUE = ' + [field name] + ' ***'"/>
  <log expr="'*** INPUTMODE = ' + [field name]$.inputmode + ' ***'"/>
  <log expr="'*** CONFIDENCE = ' + [field name]$.confidence + ' ***'"/>
  <log expr="'*** UTTERANCE = ' + [field name]$.utterance + ' ***'"/>
  <log expr="'*** INTERPRETATION = ' + [field name]$.interpretation.[slot name] + ' ***'"/>
</filled>
__________________________
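Applied to this tutorial's CatOrDog field, the template might look like the following sketch. It assumes the slot name is CatOrDog, as in the GSL grammar from the tutorial, and keeps the tutorial's routing logic after the log lines:

```xml
<filled namelist="CatOrDog">
  <!-- Development-only logging of what was recognized -->
  <log expr="'*** CatOrDog VALUE = ' + CatOrDog + ' ***'"/>
  <log expr="'*** INPUTMODE = ' + CatOrDog$.inputmode + ' ***'"/>
  <log expr="'*** CONFIDENCE = ' + CatOrDog$.confidence + ' ***'"/>
  <log expr="'*** UTTERANCE = ' + CatOrDog$.utterance + ' ***'"/>
  <!-- Assumes the interpretation object carries a CatOrDog slot -->
  <log expr="'*** INTERPRETATION = ' + CatOrDog$.interpretation.CatOrDog + ' ***'"/>

  <!-- Routing logic from the tutorial -->
  <if cond="CatOrDog == 'Cat'">
    <goto next="#Cat"/>
  <elseif cond="CatOrDog == 'Dog'"/>
    <goto next="#Dog"/>
  </if>
</filled>
```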

Now, the shadow variables most relevant to this specific thread are 'utterance' and 'interpretation'.  The utterance is the word or phrase, listed in an active grammar, that the end-user actually said (that was "matched").  The interpretation is the grammar slot value for said utterance (if there is one).  For instance:
__________________________

<grammar scope="document" type="text/gsl">
  <![CDATA[ 
    .MYRULE
      [
        yes {<mySlot "affirmative">}
      ]
  ]]>
</grammar>
__________________________
 
Given the above grammar, "yes" would be the utterance, and "affirmative" would be the interpretation of the slot named 'mySlot'.  So which gets assigned to the field name?

- If there is not a slot/interpretation value available, the utterance value will be assigned to the field name.  Example:
__________________________

<field name="champs">
  <grammar scope="document" type="text/gsl">
    [ (?seattle ?(sea hawks)) (?pittsburgh ?steelers) ]
  </grammar>

  <prompt>
    Who will win super bowl forty.
  </prompt>

  <filled>
    <log expr="'*** champs VALUE = ' + champs + ' ***'"/>
    <log expr="'*** UTTERANCE = ' + champs$.utterance + ' ***'"/>
    <log expr="'*** INTERPRETATION = ' + champs$.interpretation + ' ***'"/>
  </filled>
</field>
__________________________

- If a slot/interpretation value is available, but we do not explicitly specify the available slot's name in our <field> tag, the interpreter will assign the utterance value to the field name, *unless* a slot is present with the same name as the field itself.  Example:
__________________________

<field name="champs">
  <grammar scope="document" type="text/gsl">
    <![CDATA[ 
      .MYRULE
        [
          seattle {<champs "seahawks">}
          pittsburgh {<champs "steelers">}
        ]
    ]]>
  </grammar>

  <prompt>
    Who will win super bowl forty.
    Seattle or Pittsburgh.
  </prompt>

  <filled>
    <log expr="'*** champs VALUE = ' + champs + ' ***'"/>
    <log expr="'*** UTTERANCE = ' + champs$.utterance + ' ***'"/>
    <log expr="'*** INTERPRETATION = ' + champs$.interpretation.champs + ' ***'"/>
  </filled>
</field>
__________________________

- If a slot/interpretation value is available, and we have indeed explicitly specified that specific slot name in our <field> tag, that slot value will be assigned to the field name.  Example:
__________________________

<field name="champs" slot="mySlot">
  <grammar scope="document" type="text/gsl">
    <![CDATA[ 
      .MYRULE
        [
          seattle {<mySlot "seahawks">}
          pittsburgh {<mySlot "steelers">}
        ]
    ]]>
  </grammar>

  <prompt>
    Who will win super bowl forty.
    Seattle or Pittsburgh.
  </prompt>

  <filled>
    <log expr="'*** champs VALUE = ' + champs + ' ***'"/>
    <log expr="'*** UTTERANCE = ' + champs$.utterance + ' ***'"/>
    <log expr="'*** INTERPRETATION = ' + champs$.interpretation.mySlot + ' ***'"/>
  </filled>
</field>
__________________________


I hope these examples help to illustrate how field values are "filled."  Play around with them a bit; you'll see what I am talking about...


Have Fun,

~ Michael
pacific_is_me
2/6/2006 2:53 AM (EST)
How can I make VoiceXML play my .wav file so that the TTS engine doesn't read my text? Example: I have the text "Hello, World" and a "Hello World" wave file. If I make a control panel site and upload my wave file, I want the VoiceXML file to play this wave file instead of reading the text. Thank you, and sorry if my English is not good.
MattHenry
2/6/2006 11:35 AM (EST)

Hi there,

If you simply want to play an audio file and keep TTS as a backup in case the wav file cannot be fetched (for example, it returns a '404' error), use the following syntax:

<form>
  <block>
    <prompt>
      <audio src="helloworld.wav">
        Hello world
      </audio>
    </prompt>
  </block>
</form>

You may also find it useful to review the following links for additional clarification:

http://docs.voxeo.com/voicexml/2.0/frame.jsp?page=audioformats.htm
http://docs.voxeo.com/voicexml/2.0/audio.htm
http://docs.voxeo.com/voicexml/2.0/prompt.htm

~Matt
movomobile_prod
3/8/2006 10:54 AM (EST)
If I wanted to use an audio file (nomatch.wav) instead of TTS for:

<nomatch>
  <prompt>
    I did not recognize that pet choice.  Please try again.
  </prompt>
  <reprompt/>
</nomatch>

How could I go about doing this if the <audio> tag is not allowed inside a field element? I thought about redirecting to another form, but I didn't know if that was the correct method.
mikethompson
3/8/2006 1:11 PM (EST)
Hello there,

The audio tag actually *is* allowed with <field> as the direct parent element.  If you check out our documentation for the <audio> element here:

http://docs.voxeo.com/voicexml/2.0/frame.jsp?page=audio.htm

You will notice that there is a list of the specific child and parent elements for <audio>.  Notice <field> and <prompt> are both legitimate parents of <audio>.  In short, you could have your code snippet look as follows:

<nomatch>
  <prompt>
  <audio src="nomatch.wav">
    I did not recognize that pet choice.  Please try again.
  </audio>
  </prompt>
  <reprompt/>
</nomatch>

Hope this helps,
Mike Thompson
Voxeo Extreme Support
yousafriaz
4/24/2007 11:34 PM (EDT)
If I use ColdFusion embedded with VoiceXML and save the file at, say, http://someip/application.cfm, how can I use audio files? The audio files are just for greetings/announcements, not for holding values or anything.
jbassett
4/25/2007 4:29 AM (EDT)
Hello,

There would be no difference in the way you call the audio file. As long as you have valid XML code embedded in your document, you would call an audio file the same way you would in an .XML file.

Let me know if I did not understand you correctly.

Jesse Bassett
Voxeo Support
yousafriaz
4/25/2007 7:47 PM (EDT)
Jesse, thanks for the reply.

I am testing an application that is hosted on a domain other than the Voxeo domain, and I entered its address in ACCOUNT > APPLICATION > SITE URL: http://mydomain.com/voice/test.cfm

Here is the code for the test.cfm (ColdFusion) file:

<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">

<var name="CallerID" expr="session.callerid"/>

<form id="form_Main">
<field name="digit1" type="digits">

  <block>
      <prompt>
  <audio src="GTracking.wav"/>
      </prompt>

</block>

<filled>
  <log expr="'*** FILLED ***'"/>
  <log expr="'*** digit1 =' + digit1 + '***'"/>
  <submit next="AddDigits.cfm" method="get" namelist="digit1 CallerID"/>
</filled>
</field>
</form>
</vxml>

Now I am confused about how to reference this .wav file. I have uploaded it to my Voxeo login space and also to the directory where my .cfm file is located, but when I run the application it still cannot find it.

VoxeoTony
4/26/2007 12:01 AM (EDT)
Hello,

In looking at your question, we have to ask if you are looking for help with the ColdFusion portions of the code, or with locating your Wav file.  If you are asking how to get the audio file saved to your server, then you use the submit tag to send namelist values to your ColdFusion page and then use cfset to assign the values to a CF variable.  You may also consider FTPing the file to your server as long as you have permissions to do so.

If you would like more help finding your wav files, we suggest opening an account ticket, since we would need logs to assist you and we prefer not to post that information on the public pages.

Tony~
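One more thing worth checking in the test.cfm listing above: it places a <block> inside the <field>, but <block> is a form item and is not a valid child of <field>, so the interpreter may reject the document before any audio is fetched. A sketch of the same document with the prompt moved directly into the field; all names come from the original post, and session.callerid is kept as written on the assumption that it is a platform-specific session variable:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.1">

<var name="CallerID" expr="session.callerid"/>

<form id="form_Main">
  <field name="digit1" type="digits">
    <!-- Prompts go directly inside the field; <block> is not allowed here.
         The relative src resolves against the document URL, e.g.
         http://mydomain.com/voice/GTracking.wav -->
    <prompt>
      <audio src="GTracking.wav"/>
    </prompt>

    <filled>
      <log expr="'*** FILLED ***'"/>
      <log expr="'*** digit1 =' + digit1 + '***'"/>
      <submit next="AddDigits.cfm" method="get" namelist="digit1 CallerID"/>
    </filled>
  </field>
</form>
</vxml>
```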
Denell
12/6/2007 5:33 PM (EST)
In this code, is the ** namelist="CatOrDog" ** needed? Can't the if statement be executed without that part?



<filled namelist="CatOrDog">
      <if cond="CatOrDog == 'Cat'">
        <goto next="#Cat"/>
      <elseif cond="CatOrDog == 'Dog'"/>
        <goto next="#Dog"/>
      </if>
</filled>
voxeojeremy
12/6/2007 7:56 PM (EST)
Hey there,


Yes, it could be executed without the namelist attribute, because there is only one field.  It is definitely a best practice to use namelist, though.  Please let us know if you have any other questions.  Happy testing!


Regards,

Jeremy McCall
Voxeo Extreme Support
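To see where namelist starts to matter, consider a hypothetical form with two fields (Pet and HearMore are illustrative names, not part of the tutorial). For a form-level <filled>, an omitted namelist defaults to every input item in the form, and the default mode="all" waits until all listed items are filled (with at least one filled by the latest input), so the two handlers below fire at different times:

```xml
<form id="PetInfo">
  <field name="Pet">
    <prompt>Do you prefer cats or dogs?</prompt>
    <grammar type="text/gsl">
      <![CDATA[[
        [cat kitty kitten]   { <Pet "Cat"> }
        [dog pooch puppy]    { <Pet "Dog"> }
      ]]]>
    </grammar>
  </field>

  <!-- namelist="Pet": runs as soon as Pet is filled,
       before HearMore is collected -->
  <filled namelist="Pet">
    <log expr="'*** Pet = ' + Pet + ' ***'"/>
  </filled>

  <field name="HearMore" type="boolean">
    <prompt>Would you like to hear more about that pet?</prompt>
  </field>

  <!-- No namelist: defaults to Pet and HearMore, so with mode="all"
       this runs only after both fields are filled -->
  <filled>
    <log expr="'*** Pet = ' + Pet + ', HearMore = ' + HearMore + ' ***'"/>
  </filled>
</form>
```

With a single field, as in the tutorial, both forms of <filled> behave the same, which matches Jeremy's answer; listing the field explicitly just keeps the intent clear if more fields are added later.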



© 2003-2008 Voxeo Corporation  |  Voxeo IVR  |  VoiceXML & CCXML IVR Developer Site