VoiceXML 2.1 Development Guide


Tutorial: Mixed Initiative Dialogs

This tutorial builds on what you accomplished in the previous lessons. If you have not completed those tutorials, you'll need to go through them first, or else the support folks will likely make fun of you when your code doesn't work.

We will be covering mixed-initiative dialogs in this tutorial, which is a slick way of saying 'let the caller's freeform utterances dictate the application flow'.  Okay, okay... this is a big fat lie. Bear in mind that VoiceXML offers no way to recognize truly free-form utterances; however, with careful prompting and cleverly designed grammar files, we can fool the caller into thinking that it does. To demystify this: all a mixed-initiative dialog does is allow a single caller utterance to fill multiple voice recognition fields.

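At this point, we all know how to construct a simple form/field/filled voice reco dialog, so we won't cover the mechanics of this in great detail. However, since we are creating a mixed-initiative dialog, which is much more complex, there are a few key differences that we need to take into account, each of which comes up in the steps that follow: a grammar scoped to the form rather than to a single field, an <initial> element that collects the caller's opening utterance, and a <filled> handler scoped to the form rather than to each field.
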
Step 1: creating our initial VoiceXML structure

Been there, done that. As mentioned, we have all done the simple voice reco field applications, so let's start from there. We also know that we will need a form-scoped grammar, as well, so our framework will look like this:


<?xml version="1.0" encoding="UTF-8" ?>

<vxml version="2.1">
  <meta name="maintainer" content="youremail@here.com"/>

<form id="DeadCelebrities">
  <grammar src="Celeb.grammar#CELEBRITY" type="text/gsl"/>


  <initial name="myInit">
    <prompt>
      Welcome to the dead celebrity hotline.
      Please tell us which dead celebrity you saw, and where we can find him.
    </prompt>
  </initial>


  <field name="celebrity">
    <prompt>
      Which celebrity did you see?
    </prompt>
  </field>


  <field name="location">
    <prompt>
      And where was he sighted at? 
    </prompt>
  </field>


  <filled mode="all">
  </filled>

</form>
</vxml>



Much like Frank Sinatra's later performances, nothing really sophisticated here.  Our starting point has several noteworthy items: (1) we have the aforementioned form-level grammar, and (2) we have deliberately left out any <filled> element at the <field> level. Remember, because our form-scoped grammar is really doing all the work, it is appropriate that we scope the <filled> element to the form level as well.  Also, we specified the <meta> tag in our code at the very top, which all celebrity developers do, be they dead or alive. As you all know, specifying this element in your XML document will allow a debug email to be sent to you in the event of an application error, which will inform you of what went wrong, and how to fix it.  This automated process even works from beyond the grave.

You will also note we have specified the mode of "all" -- this ensures our handler is only kicked off when *both* fields have been satisfied... The importance of this will become readily apparent as we progress through this tutorial.
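
To see why the mode matters, here is a minimal sketch contrasting the two settings. It assumes the same two field names used above; the prompts inside the handlers are placeholders and are not part of the hotline application. When namelist is left off a form-level <filled>, it defaults to all of the form's fields, so spelling it out here is only for clarity.

```xml
<!-- Sketch only: these are alternatives, so a real form would use one or the other -->

<!-- mode="all": runs once celebrity AND location have both been filled -->
<filled mode="all" namelist="celebrity location">
  <prompt>Got both the celebrity and the location.</prompt>
</filled>

<!-- mode="any": runs as soon as EITHER field is filled by the caller's utterance -->
<filled mode="any" namelist="celebrity location">
  <prompt>Got at least one piece of information.</prompt>
</filled>
```

With mode="any", the handler would fire after a caller says only "I saw Elvis", which is too early for a hotline that needs to know where to send the helicopter.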


Step 2: authoring our subgrammar

Before we go too much further, it would probably be a good idea to take a look at the grammar file we will be using. As we have two fields (and really two "separate" utterances we are expecting from the caller), we will need a subgrammar with two separate rules. Also, we need the ability to return either one rule at a time, or both rules at once, so this means we will need a special subgrammar rule setup as well. It is also likely we will need to add a rule for ambiguous "filler" speech we can recognize, and then discount entirely (as opposed to us NOT having a 'filler' grammar defined, and thereby increasing our chances for a 'nomatch' condition), so this means yet another rule will need to be defined. So many rules!  With the complexities of human speech in mind, our grammar file will look like this:


CELEBRITY
[
(?FILLER CELEB:a ?FILLER LOCATION:b)
    {<celebrity $a.celebrity>  <location $b.location>}

(?FILLER CELEB:a)
    {<celebrity $a.celebrity>}

(?FILLER LOCATION:b)
    {<location $b.location>}
]


CELEB
[
  [(elvis ?the ?pelvis ?presley) (the king)]         
                                                            {<celebrity "Elvis Presley">}
  (buddy holly)                                       
                                                            {<celebrity "Buddy Holly">}
  [(john travolta) (vinnie barbarino) (vinnie vega)]
                                                            {<celebrity "John Travolta">}
  [(frank sinatra) (old blue eyes)]
                                                              {<celebrity "Frank Sinatra">}
  [(marty feldman) (old cross eyed)]
                                                            {<celebrity "Marty Feldman">}
  [(jim morrison) (?the lizard king)]
                                                            {<celebrity "Jim Morrison">}
]

LOCATION
[
  (flea market )
                                                            {<location "Flea Market">}
  (wrestling match )
                                                            {<location "Rasslin Match">}
  (las vegas )
                                                            {<location "Vegas">}
  (my bathroom )
                                                            {<location "My Bathroom">}
  (grace land )
                                                            {<location "Graceland">}
  (dairy queen )
                                                            {<location "Dairy Queen">}
  (france)
                                                            {<location "France">}
]


FILLER
[
; filler grammar to prefix 'celeb' utterance
(i saw)
(could swear it was)
(it was)
(holy moley)

; filler grammar to prefix the 'location' utterance
(in ?the)
(ditty bopping along ?near ?at)
(big as life ?near ?at)
( looking ?shorter ?uglier ?insane in person ?at)
(hanging out ?at ?near)
(eating like a pig ?at ?in ?near ?(behind back))
]


Unless you skipped the subgrammar tutorial, this sort of stuff should look pretty familiar. If you did skip the tutorial on subgrammars, then you are probably as lost as Elvis Presley on a "heavy medication" day, so we suggest you go back and complete the earlier tutorial before going any further. For the rest of you, the key points in this subgrammar structure should be immediately recognizable...

You'll note that within the "CELEB" and "LOCATION" rulesets, we have a different slot defined for each, and these slots line up with the individual field names in our VXML code. This is no accident. You'll also note our "FILLER" subgrammar does NOT have any kind of return values specified. Who needs 'em, anyhow? All we are concerned with is the celebrity name, and where a quick-witted citizen spotted the celebrity in question (for recapture purposes), so the "FILLER" stuff can cheerfully be tossed out the window when we return the grammar values.
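
The link between a grammar slot and a field can also be spelled out. The sketch below assumes the optional slot attribute on <field>, which defaults to the field's own name when it is omitted. The field name "where" is hypothetical and appears only to show a field whose name differs from its slot.

```xml
<!-- Implicit mapping: the "celebrity" slot fills the field named "celebrity" -->
<field name="celebrity">
  <prompt>Which celebrity did you see?</prompt>
</field>

<!-- Explicit mapping (hypothetical field name "where"):
     the grammar's "location" slot fills this field -->
<field name="where" slot="location">
  <prompt>And where was he sighted?</prompt>
</field>
```

The tutorial's code sticks with the implicit form, which is why the slot names in the grammar file and the field names in the VXML must match exactly.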

Lastly, you will notice the top-level rule determines where the filler utterances can occur in relation to our expected utterance string from the caller. Those with keen eyes will also realize the top-level rule defines no values of its own: it simply hands back the slot values already specified in the "CELEB" and "LOCATION" subgrammars themselves. In short, we now have a pretty well thought out subgrammar for our dead celebrity hotline. As the Chairman Of The Board himself would say, "Ring-A-Ding, baby!"



Step 3: the tricky stuff

Now that we have our grammar file completed and out of the way, we can return our focus to the XML file itself. We started off pretty well, but now we need to add a few bells and whistles to make our mixed-initiative dialog run how we want it to. Remember, we want to allow a caller to give both a celebrity name and a location all at once, or separately. Since this is the case, we need to account for what happens when the <initial> prompt (pun unintended) gets only a partial match (one field is filled), or a complete nomatch. For the nomatch case, we will assign "true" to the <initial> element's own form item variable (named "myInit", after its name attribute), forcing the Form Interpretation Algorithm to skip this form item and head straight for the individual fields. As we do not want the caller to be stuck in limbo should such an event occur, it's probably a good idea to add a <reprompt> element into the mix, as well. Let us take a peek at this, along with a few other curve balls that have been added into our existing code:


<?xml version="1.0" encoding="UTF-8" ?>

<vxml version="2.1">
<meta name="maintainer" content="youremail@here.com"/>

<form id="DeadCelebrities">
  <grammar src="Celeb.grammar#CELEBRITY" type="text/gsl"/>


  <initial name="myInit">
    <prompt>
      Welcome to the dead celebrity hotline.
      Please tell us which dead celebrity you saw, and where we can find him.
    </prompt>

    <nomatch count="1">
      <prompt>
        Okay, I'll ask you for information one piece at a time.
      </prompt>
      <assign name="myInit" expr="true"/>
      <reprompt/>
    </nomatch>
  </initial>

  <field name="celebrity">
    <prompt>Which celebrity did you see?</prompt>
    <grammar src="Celeb.grammar#CELEB" type="text/gsl"/>
  </field>

  <field name="location">
    <prompt>And where was he sighted at?</prompt>
    <grammar src="Celeb.grammar#LOCATION" type="text/gsl"/>
  </field>


  <filled mode="all">
  </filled>

</form>
</vxml>


The crucial additions to our code are the <nomatch> handler inside the <initial> element and the <grammar> element inside each <field>. If you pay special attention to the grammar references we have within the fields, you will see we have NOT linked to the top-level rule. Quoi? Remember, the top-level rule was designed to allow the caller to complete both utterances at once, and if a caller visits individual voice reco fields, then we must logically allow for but one utterance at a time.
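
If jumping to the one-at-a-time questions after a single miss feels abrupt, one possible variation is to give the caller a second try first. The sketch below assumes the same "myInit" name; the wording of the hint prompt is made up for illustration. The count attribute selects which handler runs on the first and on the second nomatch.

```xml
<initial name="myInit">
  <prompt>
    Welcome to the dead celebrity hotline.
    Please tell us which dead celebrity you saw, and where we can find him.
  </prompt>

  <!-- First miss: offer a hint and stay in the initial item -->
  <nomatch count="1">
    <prompt>
      Please say something like, I saw Elvis Presley in Las Vegas.
    </prompt>
  </nomatch>

  <!-- Second miss: mark the initial item as done so the fields take over -->
  <nomatch count="2">
    <prompt>
      Okay, I'll ask you for information one piece at a time.
    </prompt>
    <assign name="myInit" expr="true"/>
    <reprompt/>
  </nomatch>
</initial>
```

The hint phrase was chosen so that it matches the grammar from step 2: "i saw" and "in" are covered by the FILLER rule, while "elvis presley" and "las vegas" match CELEB and LOCATION.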



Step 4: Add Some Yuks

Not so fast there, buddy. The code we laid out above in step 3 is almost done, and for the most part workable. But it lacks pizzazz...  and it is not especially funny. Therefore, we enlist the comedy stylings of Carrot Top to... oh... we wanted it funny. There will be no Carrot Top, sorry for the confusion. Instead, we continue with the Voxeo tradition of mocking celebrities who really have it coming, and we insert some barbs that might make our friends crack up (but might also anger several of Sinatra's ex-wives. Or the Mafia):


<?xml version="1.0" encoding="UTF-8" ?>

<vxml version="2.1">
  <meta name="maintainer" content="youremail@here.com"/>

<form id="DeadCelebrities">

  <grammar src="Celeb.grammar#CELEBRITY" type="text/gsl"/>

  <initial name="myInit">
    <prompt>
      Welcome to the dead celebrity hotline.
      Please tell us which dead celebrity you saw, and where we can find him.
    </prompt>

    <nomatch count="1">
      <prompt>
        Okay, I'll ask you for information one piece at a time.
      </prompt>
      <assign name="myInit" expr="true"/>
      <reprompt/>
    </nomatch>
  </initial>

  <field name="celebrity">
    <prompt>Which celebrity did you see?</prompt>
    <grammar src="Celeb.grammar#CELEB" type="text/gsl"/>
  </field>

  <field name="location">
    <prompt>And where was he sighted at?</prompt>
    <grammar src="Celeb.grammar#LOCATION" type="text/gsl"/>
  </field>


  <filled mode="all">

    <if cond="celebrity == 'John Travolta'">
      <prompt>
        True, John Travolta is still alive, but his career sure isn't.
      </prompt>
    </if>

    <prompt>
      Thank you for your report.
      A team armed with butterfly nets and
    </prompt>

    <if cond="celebrity == 'Elvis Presley'">
      <prompt>
        a peanut butter and banana sandwich
      </prompt>

    <elseif cond="celebrity == 'Buddy Holly'"/>
      <prompt>
        a pair of contact lenses
      </prompt>

    <elseif cond="celebrity == 'John Travolta'"/>
      <prompt>
        a script for battlefield earth part two
      </prompt>

    <elseif cond="celebrity == 'Frank Sinatra'"/>
      <prompt>
        a double martini, and two teenage girls
      </prompt>

    <elseif cond="celebrity == 'Marty Feldman'"/>
      <prompt>
        an all you can eat shrimp platter
      </prompt>

    <elseif cond="celebrity == 'Jim Morrison'"/>
      <prompt>
        L S D
      </prompt>

    </if>

    <prompt>
      for bait will be deployed by helicopter to
      <value expr="location"/>
      in an attempt to recapture
      <value expr="celebrity"/>
    </prompt>

  </filled>


</form>
</vxml>
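
While testing, it can help to confirm what the recognizer actually put into each field before the jokes start flying. The following is a minimal sketch, assuming the standard <log> element is placed as the first item inside the form-level <filled>; where the message ends up depends on your platform's logging setup.

```xml
<filled mode="all">
  <!-- Debug aid: record both field values once the form is complete -->
  <log>
    celebrity = <value expr="celebrity"/>,
    location = <value expr="location"/>
  </log>

  <!-- ...the prompts from the step 4 listing go here... -->
</filled>
```

Because the values logged are the slot strings from the grammar file, a caller who says "the king" should show up as "Elvis Presley", not as the words actually spoken.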




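What we covered

In this tutorial we built a mixed-initiative dialog from four pieces: a form-scoped grammar that lets a single utterance fill more than one field, an <initial> element with a <nomatch> handler that falls back to asking one question at a time, field-level grammars for that fallback, and a form-level <filled mode="all"> handler that runs only once both fields have values.
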
  ANNOTATIONS: EXISTING POSTS
zweit
1/5/2006 12:24 PM (EST)
Hi,
After I changed the value of "count" to "2" in the <nomatch> tag and saw the effect, I understood what role "count" plays. Or did you mention it in the earlier lessons?

Regards
-------
Secondt
Michael.Book
1/5/2006 6:50 PM (EST)
Howdy Secondt,

The tutorials, in the interest of brevity, simply do not detail every element and attribute used in the sample code.  However, this VoiceXML documentation set does include handy breakdowns of each VoiceXML element and its respective attributes.  Just find the element that has sparked your curiosity under the "ELEMENTS" menu at the bottom of the frame on the left, and give it a click.  Detailed goodness awaits...

I hope this helps...


Have Fun,

~ Michael



© 2003-2008 Voxeo Corporation