VoiceXML 2.1 Development Guide Home  |  Frameset Home

  tutorial Outbound VoiceXML Applications via HTTP  |  TOC  |  tutorial XML Grammars  

Tutorial: Mixed Initiative Dialogs

This tutorial builds on what you accomplished in the previous lessons.  If you have not yet completed those tutorials, you will most likely need to go through them first.

It is now time to cover mixed-initiative dialogs, which is a slick way of saying "let the caller's freeform utterances dictate the application flow".  Well, not exactly...  VoiceXML has no way to accept truly free-form utterances; however, with cleverly designed grammars, we can give callers the impression that they can say whatever they want and have the application respond accordingly.  What a mixed-initiative grammar really does is allow the caller to fill in multiple voice recognition fields with a single utterance.

Typically, an application that needs multiple pieces of information will prompt the caller many times, once for each field.  For large applications that require a lot of input, this can be cumbersome.  Mixed-initiative dialogs allow the user to speak to the application as if they were speaking to a real person.  This creates a friendlier environment that the caller will most likely appreciate, since they can get done what they need to do in less time.

At this point, creating a simple form/field/filled voice recognition dialog should be second nature, so we won't cover the mechanics of those elements in any detail.  However, since mixed-initiative dialogs use these same elements in a more complex way, the steps below illustrate a few key differences.



Step 1: Creating the initial VoiceXML structure


As mentioned, we all know how to create a basic voice recognition dialog, so let's start from there.  We also know that we will need a form-scoped grammar as well, so we will add one of those at this point too.  Our initial framework will contain two forms.  The first will simply play a welcome message then move execution to the next form.  This will prevent the user from hearing the same welcome message a few times in a row if for some reason we don't match any utterances on the first try.  The second form will do all of the heavy lifting.  Here we will have a few fields to fill in each part of the data we need.  The main addition to this type of dialog is the <initial> element.


<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1"
      xmlns="http://www.w3.org/2001/vxml"
      xmlns:voxeo="http://community.voxeo.com/xmlns/vxml">

  <!-- Welcome Form -->
  <!-- Plays welcome message -->
  <form id="frm_welcome">
    <block>
      <prompt>
        Welcome to the dead celebrity hotline
      </prompt>
      <!-- Move to the main form -->
      <goto next="#frm_get_celeb" />
    </block>
  </form>

  <!-- Main Form -->
  <form id="frm_get_celeb">
    <!-- Load the grammar file -->
    <grammar src="mixed_init.srgs.xml" type="application/grammar-xml" root="CELEBRITY" />

    <!-- Prompt the user for info -->
    <initial name="init_get_celeb">
      <prompt>
        Please tell us which dead celebrity you saw, and where we can find him.
      </prompt>
    </initial>

    <!-- Ask the user for the celebrity's name -->
    <field name="celebrity">
      <prompt>Which celebrity did you see?</prompt>
    </field>

    <!-- ask the user where the celeb was -->
    <field name="location">
      <prompt>And where was he sighted at?</prompt>
    </field>

    <!-- All of the fields have been matched -->
    <filled mode="all">
      <!-- Goodbye -->
      <exit />
    </filled>
  </form>
</vxml>


Much like Frank Sinatra's later performances, nothing really sophisticated here.  Our starting point has several noteworthy items:


Because our form-scoped grammar is doing all of the work, we remove the <filled> elements from each field and place a <filled mode="all"> within our main form.  This lets us wait until *both* fields are filled before moving to the filled section.  The <initial> element is really the key ingredient here; it is what makes mixed-initiative dialogs possible.  Execution will visit the initial section first and attempt to match both of our utterances.
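For contrast, a directed version of the same form might look like the sketch below.  It is not part of the tutorial's code: the form id "frm_directed" is made up, and it assumes that the CELEB and LOCATION rules from the grammar file in Step 2 can be addressed individually with a fragment identifier.  There is no <initial>, each field listens only for its own rule, and each field carries its own <filled>.

```xml
<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1" xmlns="http://www.w3.org/2001/vxml">

  <!-- Hypothetical directed form: one prompt and one grammar per field -->
  <form id="frm_directed">

    <field name="celebrity">
      <!-- Field-scoped grammar: only the celebrity rule is active here -->
      <grammar src="mixed_init.srgs.xml#CELEB" type="application/grammar-xml" />
      <prompt>Which celebrity did you see?</prompt>
      <!-- Runs as soon as this one field is filled -->
      <filled>
        <prompt>Got it, <value expr="celebrity" />.</prompt>
      </filled>
    </field>

    <field name="location">
      <!-- Field-scoped grammar: only the location rule is active here -->
      <grammar src="mixed_init.srgs.xml#LOCATION" type="application/grammar-xml" />
      <prompt>And where was he sighted?</prompt>
      <filled>
        <prompt>Thanks, <value expr="location" />.</prompt>
      </filled>
    </field>

    <!-- Reached only after both fields have been filled, one at a time -->
    <block>
      <exit />
    </block>
  </form>
</vxml>
```

With this layout the caller can never answer both questions in one breath, which is exactly the limitation the form-level grammar and the <initial> element remove.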


Step 2: Authoring the subgrammars


Before we move forward, we should probably stop and take a look at the grammar we will be using.  Since we have two fields, each filled by its own part of the utterance, we will need a subgrammar for each one.  What makes this a little special is that we need to match both parts together or each part separately, and we will most likely want some filler rules that wrap them.  This is because human languages are very complex and our callers could use different words to say the same thing.  The filler rules are there as just that, filler: they let the extra words match, but those words set no values, so nothing from them reaches the <filled> section.  Having the filler rules will significantly reduce the chance of hitting a <nomatch> condition.


<?xml version= "1.0" encoding="UTF-8" ?>

<grammar xmlns="http://www.w3.org/2001/06/grammar"
        version="1.0"
        xml:lang="en-US">

  <!-- create the top level rule -->
  <!-- this grammar will fill both subgrammars and put each result into a fancy variable -->
  <!-- be sure to declare this as public so we can read it in the vxml script -->
  <rule id="CELEBRITY" scope="public">
    <!-- match one of the <item>'s between the <one-of> tags -->
    <one-of>
      <item>
        <!-- both grammars were matched mixed-init style -->
        <item><ruleref uri="#CELEB"/><tag>out.celebrity=rules.CELEB.celebrity;</tag></item>
        <item><ruleref uri="#LOCATION"/><tag>out.location=rules.LOCATION.location;</tag></item>
      </item>
      <item>
        <!-- matched celeb only -->
        <ruleref uri="#CELEB"/>
      </item>
      <item>
        <!-- matched location only -->
        <ruleref uri="#LOCATION"/>
      </item>
    </one-of>
  </rule>

  <!-- Celebrity grammar -->
  <!-- we can use this by itself or mixed-init style -->
  <rule id="CELEB" scope="public">
    <one-of>
      <item>
        <!-- optional utterance prefixes go here -->
        <!-- Load the celebrity filler subgrammar -->
        <item repeat="0-1"> <!-- the filler is optional: it may appear zero or one time -->
          <ruleref uri="#FILL_CELEB"/>
        </item>

        <!-- choose one of these celebrities -->
        <!-- return the info we need in out.celebrity -->
        <one-of>
          <item> elvis presley  <tag> out.celebrity = "Elvis Presley"; </tag></item>
          <item> buddy holly    <tag> out.celebrity = "Buddy Holly";  </tag></item>
          <item> john travolta  <tag> out.celebrity = "John Travolta"; </tag></item>
          <item> frank sinatra  <tag> out.celebrity = "Frank Sinatra"; </tag></item>
          <item> marty feldman  <tag> out.celebrity = "Marty Feldman"; </tag></item>
          <item> jim morrison    <tag> out.celebrity = "Jim Morrison";  </tag></item>
        </one-of>
      </item>
    </one-of>
  </rule>

  <!-- the location grammar -->
  <!-- same as celebrity grammar, just different info to match -->
  <rule id="LOCATION" scope="public">
    <one-of>
      <item>
        <!-- optional utterance prefixes go here -->
        <item repeat="0-1">
          <ruleref uri="#FILL_LOC"/>
        </item>
        <one-of>
          <item> flea market      <tag> out.location = "Flea Market";    </tag></item>
          <item> wrestling match  <tag> out.location = "Wrestling Match"; </tag></item>
          <item> las vegas        <tag> out.location = "Las Vegas";      </tag></item>
          <item> my bathroom      <tag> out.location = "My Bathroom";    </tag></item>
          <item> grace land        <tag> out.location = "Graceland";      </tag></item>
          <item> dairy queen      <tag> out.location = "Dairy Queen";    </tag></item>
        </one-of>
      </item>
    </one-of>
  </rule>

  <!-- filler grammar to prefix the 'celeb' utterance -->
  <rule id="FILL_CELEB" scope="public">
    <one-of>
      <item> i saw </item>
      <item> could swear it was </item>
      <item> it was </item>
      <item> holy moley </item>
    </one-of>
  </rule>

  <!-- filler grammar to prefix the 'location' utterance -->
  <rule id="FILL_LOC" scope="public">
    <one-of>
      <item> in the </item>
      <item> at the </item>
      <item> was at the </item>
      <item> ditty bopping at </item>
      <item> big as life near the </item>
      <item> looking fatter in person at </item>
      <item> hanging out at </item>
      <item> eating like a pig at </item>
    </one-of>
  </rule>
</grammar>


The above subgrammars should look pretty familiar, and the key points of their structure should be immediately recognizable.  You'll note that the "CELEB" and "LOCATION" rules each define a different slot.  These slot names match the field names within the VoiceXML portion of our code.  You should also note that our "FILL_*" rules do not specify any return values.  This is because they are just that, filler.
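One detail worth a second look: the two single-slot alternatives in the top-level rule carry no <tag> of their own, so they rely on the recognizer handing the referenced rule's result through unchanged.  If a platform does not behave that way, one option is to copy the slot explicitly, mirroring the tags already used in the two-slot alternative.  The rule below is a sketch of that variation, not the tutorial's grammar; it would take the place of the CELEBRITY rule and still depends on the CELEB and LOCATION rules defined in the same file.

```xml
<!-- Sketch: top-level rule with an explicit tag on every alternative -->
<rule id="CELEBRITY" scope="public">
  <one-of>
    <item>
      <!-- both slots in one utterance -->
      <item><ruleref uri="#CELEB"/><tag>out.celebrity=rules.CELEB.celebrity;</tag></item>
      <item><ruleref uri="#LOCATION"/><tag>out.location=rules.LOCATION.location;</tag></item>
    </item>
    <item>
      <!-- celebrity only: copy the slot up explicitly -->
      <ruleref uri="#CELEB"/>
      <tag>out.celebrity=rules.CELEB.celebrity;</tag>
    </item>
    <item>
      <!-- location only: copy the slot up explicitly -->
      <ruleref uri="#LOCATION"/>
      <tag>out.location=rules.LOCATION.location;</tag>
    </item>
  </one-of>
</rule>
```

Either way, the result that reaches the form contains a "celebrity" property, a "location" property, or both, and those names are what tie the grammar to the fields.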

Lastly, you will notice that the top-level rule determines which combinations the caller can say: a celebrity followed by a location, or either one alone.  Where the filler can occur relative to the expected utterance is decided inside each subgrammar.  Also, the top-level rule does not create any return values of its own.  The values are all set within their respective subgrammars, and the top-level rule only passes them up so we can attach them to the individual fields.
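As written, the top-level rule accepts the celebrity first and the location second.  A caller who leads with the location ("at the dairy queen i saw elvis presley") would not match the two-slot alternative.  If that ordering matters for your callers, one possible extension is an extra alternative with the rule references swapped.  The fragment below is only a sketch of that idea and is not part of the tutorial's grammar; it would be added inside the <one-of> of the CELEBRITY rule.

```xml
<!-- Sketch: additional alternative for "location first, then celebrity" -->
<item>
  <item><ruleref uri="#LOCATION"/><tag>out.location=rules.LOCATION.location;</tag></item>
  <item><ruleref uri="#CELEB"/><tag>out.celebrity=rules.CELEB.celebrity;</tag></item>
</item>
```

The slot names stay the same, so nothing in the VoiceXML form has to change.  The trade-off is a larger grammar, and every extra alternative gives the recognizer one more path to weigh.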


Step 3: The tricky stuff


Now that our grammar is completed, we can shift back to the VoiceXML.  We have our basic framework all laid out, but it still needs a bunch of additional code to actually make it do something.  Remember, we want the caller to be able to enter data all at once or enter each piece separately.  Since this is the case, we first need to account for what happens when the initial utterance produces a full match, a partial match or a nomatch.  The <initial> element handles that first utterance from the user.  If it matches both fields, execution moves to the <filled> section.  If it matches only one, the form moves on to the field that is still empty.  If it does not match at all, execution moves to the <nomatch> section.

Since filled is somewhat self-explanatory, we will go over some special code for the nomatch section.  The initial element has an attribute called "name", which also creates a variable with the same name as the value of that attribute.  We can think of this variable as a switch.  When we set it to "true", the form will use the two individual fields instead of the initial section.  This allows us to ask the user for input in separate pieces if they can't match the mixed-initiative style grammar from the initial section.  For example:


<initial name="init_get_celeb">
  <prompt>
    Please tell us which dead celebrity you saw, and where we can find him.
  </prompt>

  <!-- User did not say what we needed -->
  <!-- after the first failure -->
  <nomatch count="1">
    <prompt>
      Okay, I'll ask you for information one piece at a time.
    </prompt>

    <!-- set the initial element's variable to true -->
    <!-- this will make the app prompt the user for each piece of information -->
    <assign name="init_get_celeb" expr="true" />

    <!-- play the prompt of whichever form item is visited next -->
    <reprompt/>
  </nomatch>
</initial>


You will no doubt see the crucial additions to our code.  We now have a fancy <nomatch> section that is triggered on the first failed match and disables our mixed-initiative input mode.  After this handler runs, the user will have to enter the information one piece at a time, because we set "init_get_celeb" (the initial section's name) to true.  It is also worth noting that we have not explicitly mapped any fields to the grammar slots.  This is because the slots returned by the subgrammars have the same names as our fields, and the grammar is declared at the form level, where it is active for every field.
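The handler above gives up on the combined prompt after a single failure.  A gentler variation gives the caller one coached retry before falling back.  This relies on the "count" attribute: the platform counts how many times an event has fired for the form item and picks the handler with the highest count that does not exceed that number.  The sketch below is a variation on the tutorial's <initial>, not a replacement the original author prescribed, and the wording of the coaching prompt is made up.

```xml
<initial name="init_get_celeb">
  <prompt>
    Please tell us which dead celebrity you saw, and where we can find him.
  </prompt>

  <!-- First failure: give an example and stay in mixed-initiative mode -->
  <nomatch count="1">
    <prompt>
      Sorry, I didn't catch that.
      You can say something like, I saw Elvis Presley at the Dairy Queen.
    </prompt>
    <!-- play the initial prompt again on the next visit -->
    <reprompt/>
  </nomatch>

  <!-- Second failure: switch to one question per field -->
  <nomatch count="2">
    <prompt>
      Okay, I'll ask you for information one piece at a time.
    </prompt>
    <!-- marking the initial item as done sends the form on to the fields -->
    <assign name="init_get_celeb" expr="true" />
    <reprompt/>
  </nomatch>
</initial>
```

Changing the second handler to count="3" would allow two retries instead of one.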


Step 4: Finishing it up


Only a few more things to do, YAY!  Now that we have handled the user utterances, we should probably do something with all of that fancy data.  Remember the <filled mode="all">?  Well, execution will now move to the filled section and since "all" was specified, we have two variables available: "celebrity" and "location" filled with their respective data returned from our grammar.  What we are going to do with the data is provide some custom messages that will be played to the user based on their input.


<!-- Ask the user for the celebrity's name -->
<field name="celebrity">
  <prompt>Which celebrity did you see?</prompt>
</field>

<!-- ask the user where the celeb was -->
<field name="location">
  <prompt>And where was he sighted at?</prompt>
</field>

<!-- All of the fields have been matched -->
<filled mode="all">
  <!-- John Travolta is the best actor ever -->
  <if cond="celebrity == 'John Travolta'">
    <prompt>
      True, John Travolta is still alive, but his career sure isn't.
    </prompt>
  </if>

  <!-- play the results to the user -->
  <prompt>
    Thank you for your report.
    A team armed with butterfly nets and
  </prompt>

  <!-- Customize the response depending on which celeb was matched -->
  <if cond="celebrity == 'Elvis Presley'">
    <prompt>
      a peanut butter and banana sandwich
    </prompt>
  <elseif cond="celebrity == 'Buddy Holly'" />
    <prompt>
      a pair of contact lenses
    </prompt>
  <elseif cond="celebrity == 'John Travolta'" />
    <prompt>
      a script for battlefield earth, part two
    </prompt>
  <elseif cond="celebrity == 'Frank Sinatra'" />
    <prompt>
      a double martini, and two teenage girls
    </prompt>
  <elseif cond="celebrity == 'Marty Feldman'" />
    <prompt>
      an all you can eat shrimp platter
    </prompt>
  <elseif cond="celebrity == 'Jim Morrison'" />
    <prompt>
      L S D
    </prompt>
  </if>

  <!-- put the location into the mix -->
  <prompt>
    for bait will be deployed by helicopter to
    <value expr="location" />
    in an attempt to recapture
    <value expr="celebrity" />
    <break time="1000ms" />
    Goodbye
  </prompt>

  <!-- Goodbye -->
  <exit />
</filled>
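In a real application you would probably want to hand the collected values to a server instead of only reading them back.  The sketch below shows one way to do that with <submit>.  It is an illustration only: the URL is a placeholder, and it assumes a server-side script at that address that accepts the two values and returns the next VoiceXML document.

```xml
<!-- Sketch: alternative filled section that posts the results to a server -->
<filled mode="all">
  <prompt>
    Thank you for your report.
  </prompt>

  <!-- http://example.com/sightings is a hypothetical endpoint -->
  <!-- namelist names the form variables that are sent along -->
  <submit next="http://example.com/sightings"
          namelist="celebrity location"
          method="post" />
</filled>
```

Because <submit> transfers control to the document the server returns, the <exit /> from the tutorial's version is not needed here.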



I hope that you have enjoyed the mixed initiative tutorial!  By this time, you should be very comfortable with mixed initiative concepts, use of the <initial> element and how to implement form level grammars.  If you want to try this sample code out for yourself, feel free to download the code below.

Download the Code!


  Source Code


  ANNOTATIONS: EXISTING POSTS
zweit
1/5/2006 12:24 PM (EST)
Hi,
After I have changed the value of "count" to "2" in <nomatch> tag and seen the effect, I understood what role "count" played, or have you mentioned it in the former lessons?

regard
-------
Secondt
Michael.Book
1/5/2006 6:50 PM (EST)
Howdy Secondt,

The tutorials, in the interest of brevity, simply do not detail every element and attribute used in the sample code.  However, this VoiceXML documentation set does include handy breakdowns of each VoiceXML element and its respective attributes.  Just find the element that has sparked your curiosity under the "ELEMENTS" menu at the bottom of the frame on the left, and give it a click.  Detailed goodness awaits...

I hope this helps...


Have Fun,

~ Michael
sambhav
2/10/2010 12:28 PM (EST)
Hello

I am developing an application where number of fields is more than two in a form. say four.

Can anybody suggest how I can achieve mixed initiative approach in that case? I want to ensure that user can fill one,two,three or all four attribute at a time.

Thanks a lot :-)
voxeoJohn
2/10/2010 12:43 PM (EST)
Hello,

  We would be glad to help in this regard, have you had a chance to give the above tutorial a try? Are you having an issue getting this to work? If you are we ask that you please submit debugger output (via our support ticketing system) so we can offer our assistance getting your application to work.  We'll be standing by for your update at this time.

Regards,

John Dyer
Customer Engineer
Voxeo Support
sambhav
2/11/2010 5:16 AM (EST)
Hello

I worked the above example with four fields.
Here is the code.
***Grammar File***
<?xml version= "1.0" encoding="UTF-8" ?>

<!--
    Whole grammar in a readable form
    1. ROOT -> COLOR FRUIT FLOWER CUISINE   
              |COLOR_FRUIT
              |FRUIT_FLOWER
              |FLOWER_CUISINE
              |COLOR
              |FRUIT
              |FLOWER
              |CUISINE
           
    2. COLOR_FRUIT -> COLOR FRUIT
    3. FRUIT_FLOWER -> FRUIT FLOWER
    4. FLOWER_CUISINE -> FLOWER CUISINE
   
    5. COLOR -> color
    6. FRUIT -> fruit
    7. FLOWER -> flower
    8. CUISINE -> cuisine
   
    /*
        This grammar suggests that caller can provide all four field in one go or just two of them, or one at a time
        Number of combinations of providing fields can be higher, but here we have simplified our need
    */
-->

<grammar xmlns="http://www.w3.org/2001/06/grammar" xml:lang="en-US">
    <rule id="ROOT" scope="public">
        <one-of>

                <!-- matched color followed by fruit flower and cuisine -->
            <item>
                <item>
                    <ruleref uri="#COLOR"/>
                    <tag>out.color=rules.COLOR.color;</tag>
                </item>
                <item>
                    <ruleref uri="#FRUIT"/>
                    <tag>out.fruit=rules.FRUIT.fruit</tag>
                </item>
                <item>
                    <ruleref uri="#FLOWER"/>
                    <tag>out.flower=rules.FLOWER.flower</tag>
                </item>
                <item>
                    <ruleref uri="#CUISINE"/>
                    <tag>out.cuisine=rules.CUISINE.cuisine</tag>
                </item>
            </item>

            <!-- matched color followed by fruit -->
            <item>
                <ruleref uri="#COLOR"/>
                <tag>out.color=rules.COLOR.color;</tag>
                <ruleref uri="#FRUIT"/>
                <tag>out.fruit=rules.FRUIT.fruit;</tag>
            </item>

            <!-- matched fruit followed by flower -->
            <item>
                <ruleref uri="#FRUIT"/>
                <tag>out.fruit=rules.FRUIT.fruit;</tag>
                <ruleref uri="#FLOWER"/>
                <tag>out.flower=rules.FLOWER.flower;</tag>
            </item>

            <!-- matched flower followed by cuisine -->
            <item>
                <ruleref uri="#FLOWER"/>
                <tag>out.flower=rules.FLOWER.flower;</tag>
                <ruleref uri="#CUISINE"/>
                <tag>out.cuisine=rules.CUISINE.cuisine</tag>
            </item>
           
            <item>
        <!-- matched color only -->
                <ruleref uri="#COLOR"/>
            </item>
            <item>
        <!-- matched fruit only -->
                <ruleref uri="#FRUIT"/>
            </item>
            <item>
        <!-- matched flower only -->
                <ruleref uri="#FLOWER"/>
            </item>
            <item>
        <!-- matched cuisine only -->
                <ruleref uri="#CUISINE"/>
            </item>
        </one-of>
    </rule>

  <!-- Color grammar -->
  <!-- we can use this by itself or mixed-init style -->
    <rule id="COLOR" scope="public">
        <one-of>
            <item>
        <!-- optional utterance prefixes go here -->
        <!-- Load the color filler subgrammar -->
                <item repeat="0-1"> <!-- a filler match can be repeated once if desired -->
                    <ruleref uri="#FILL_COLOR"/>
                </item>

        <!-- choose one of these colors -->
        <!-- return the info we need in out.color -->
                <one-of>
                    <item> red
                        <tag> out.color = "red"; </tag>
                    </item>
                    <item> blue
                        <tag> out.color = "blue";  </tag>
                    </item>
                    <item> green
                        <tag> out.color = "green"; </tag>
                    </item>
                    <item> yellow
                        <tag> out.color = "yellow"; </tag>
                    </item>
                    <item> violet
                        <tag> out.color = "violet"; </tag>
                    </item>
                    <item> magenta
                        <tag> out.color = "magenta";  </tag>
                    </item>
                </one-of>
            </item>
        </one-of>
    </rule>

  <!-- the fruit grammar -->

    <rule id="FRUIT" scope="public">
        <one-of>
            <item>
        <!-- optional utterance prefixes go here -->
                <item repeat="0-1">
                    <ruleref uri="#FILL_FRUIT"/>
                </item>
                <one-of>
                    <item> mango
                        <tag> out.fruit = "mango";    </tag>
                    </item>
                    <item> banana
                        <tag> out.fruit = "banana"; </tag>
                    </item>
                    <item> apple
                        <tag> out.fruit = "apple";      </tag>
                    </item>
                    <item> strawberry
                        <tag> out.fruit = "strawberry";    </tag>
                    </item>
                </one-of>
            </item>
        </one-of>
    </rule>

    <rule id="FLOWER" scope="public">
        <one-of>
            <item>
        <!-- optional utterance prefixes go here -->
                <item repeat="0-1">
                    <ruleref uri="#FILL_FLOWER"/>
                </item>
                <one-of>
                    <item> rose
                        <tag>out.flower = "rose";    </tag>
                    </item>
                    <item> lily
                        <tag>out.flower = "lily"; </tag>
                    </item>
                    <item> lotus
                        <tag>out.flower = "lotus";      </tag>
                    </item>
                </one-of>
            </item>
        </one-of>
    </rule>

    <rule id="CUISINE" scope="public">
        <one-of>
            <item>
        <!-- optional utterance prefixes go here -->
                <item repeat="0-1">
                    <ruleref uri="#FILL_CUISINE"/>
                </item>
                <one-of>
                    <item> indian
                        <tag>out.cuisine = "indian";    </tag>
                    </item>
                    <item> chinese
                        <tag>out.cuisine = "chinese"; </tag>
                    </item>
                    <item> italian
                        <tag>out.cuisine = "italian";      </tag>
                    </item>
                    <item> mexican
                        <tag>out.cuisine = "mexican";    </tag>
                    </item>
                </one-of>
            </item>
        </one-of>
    </rule>

  <!-- filler grammar to prefix 'color' utterance' -->
    <rule id="FILL_COLOR" scope="public">
        <one-of>
            <item> i like </item>
            <item> i love </item>
            <item> my favourite color is </item>
            <item> my color is </item>
        </one-of>
    </rule>
    <!-- filler grammar to prefix 'fruit' utterance' -->
    <rule id="FILL_FRUIT" scope="public">
        <one-of>
            <item repeat="0-1"> and </item>
            <item> i like </item>
            <item> i love </item>
            <item> my favourite fruit is </item>
        </one-of>
    </rule>
    <!-- filler grammar to prefix 'flower' utterance' -->
    <rule id="FILL_FLOWER" scope="public">
        <one-of>
            <item repeat="0-1"> and </item>
            <item> i like </item>
            <item> i love </item>
            <item> my favourite flower is </item>
        </one-of>
    </rule>

  <!-- filler grammar to prefix 'cuisine' utterance' -->
    <rule id="FILL_CUISINE" scope="public">
        <one-of>
            <item repeat="0-1"> and </item>
            <item> i like </item>
            <item> i love </item>
            <item> my favourite cuisine is </item>
        </one-of>
    </rule>
</grammar>




***Voicepage***

<%@ page session="false" %>
<?xml version="1.0" encoding="UTF-8" ?>
<vxml version="2.1">   
    <form id="experiment">
        <grammar src="../grammars/four_variables.xml#ROOT" type="application/grammar-xml"/>
        <initial name="initial_prompt">
            <prompt bargein="false">
                What is your favourite color, fruit, flower and cuisine?
            </prompt>
            <nomatch count="1">
                Okay, I'll ask you for information one piece at a time.
                <assign name="initial_prompt" expr="true"/>
                <reprompt/>
            </nomatch>

            <noinput>
                <prompt bargein="false">
                    I could not hear anything.Lets do it again.
                </prompt>
                <reprompt/>
            </noinput>
        </initial>

        <field name="color">
            <prompt bargein="false">What is your favourite color?</prompt>
            <grammar src="../grammars/four_variables.xml#COLOR" type="application/grammar-xml"/>
        </field>


        <field name="fruit">
            <prompt bargein="false">And what is your favourite fruit?</prompt>
            <grammar src="../grammars/four_variables.xml#FRUIT" type="application/grammar-xml"/>
        </field>

        <field name="flower">
            <prompt bargein="false">And flower?</prompt>
            <grammar src="../grammars/four_variables.xml#FLOWER" type="application/grammar-xml"/>
        </field>

        <field name="cuisine">
            <prompt bargein="false">finally, your cuisine will be?</prompt>
            <grammar src="../grammars/four_variables.xml#CUISINE" type="application/grammar-xml"/>
        </field>

        <filled mode="any">
            <prompt bargein="false">
                your favourite color is <value expr="color"/>, fruit is <value expr="fruit"/>, flower is <value expr="flower"/> and cuisine is <value expr="cuisine"/>
            </prompt>
        </filled>
    </form>
</vxml>

*** code ends here ***

The above code is working fine. I can say all four fields in one go or two at a time as suggested by the grammar. If I provide two fields, I will be asked for the remaining two fields in two different prompts.

Problem: If I provide two attributes, I should be able to provide the remaining two attributes in one go.

How can I achieve that?

Thanks :-)
voxeoJeffK
2/11/2010 6:20 AM (EST)
Hello,

This would be the normal, expected flow. The user's inputs are taken, and then the FIA chooses the empty fields to visit so that the correct prompting can drive the remaining inputs.

It may be possible through some complex use of multiple <initial> blocks by selectively clearing their form item variable, cond or expr. The difficulty lies in that you have to account for each of the field input items. Which ones were filled, and which were not? If you go back to a single <initial> then you will get prompted for all of them! So for four choices you would need 24 <initial> blocks.

Is it that you are required to capture all the remaining input items, or was there a misunderstanding concerning the behavior of <initial> in mixed dialogs?

Regards,
Jeff Kustermann
Voxeo Support
sambhav
2/18/2010 4:25 AM (EST)
Hello

I need to capture the remaining fields. The number of remaining fields at any time in a conversation will be decided by the caller's earlier responses.
I am clear on the <initial> tag's behavior.

Thanks :-)


© 2010 Voxeo Corporation  |  Voxeo IVR  |  VoiceXML & CCXML IVR Developer Site