VoiceXML 2.1 Development Guide Home  |  Frameset Home


Answering Machine Detection

Voxeo has recently added support for advanced answering machine detection, known as the Call Progress Analyzer (or ‘CPA’, for those into brevity), which is enabled on the VoiceXML Staging and Production Platforms. The CPA uses Digital Signal Processing (DSP) and voice activity detection to analyze the audio signal after a call is connected, making it possible to determine programmatically whether the answering party is a human speaker, an answering machine, or even a fax machine. Note that the same dialing code rules apply for this feature: Staging Center applications will require the dial prefixes, while Production Center applications will not.


Call Progress Analyzer Variables

cpa.version
This should currently always be set to 2.0 (cpa.version=2.0).

voxeo-cpa-maxsilence
The voxeo-cpa-maxsilence variable is a user-defined value, in milliseconds, which designates the amount of time that the CPA should wait before it stops the human vs. machine analysis and returns the voxeo-cpa-result. This variable is best set between 800 and 1200ms.

voxeo-cpa-maxtime
The user-defined voxeo-cpa-maxtime variable specifies the threshold that the CPA compares against the duration of the called party's uninterrupted speech.  If the duration of the callee's speech is below the threshold, a result of ‘human’ is returned.  If the duration exceeds the threshold, a result of ‘machine’ is returned to the application.  The ideal setting for this variable is in the range of 4000 to 7000ms.  Take care when assigning this value: if a human callee's initial utterance is longer than the maxtime value, the result will still come back as ‘machine’.

voxeo-cpa-runtime
The user-defined voxeo-cpa-runtime variable sets the length of time for which the analyzer should work before returning the final result. Make certain that this value is longer than the voxeo-cpa-maxtime and voxeo-cpa-maxsilence values added together, and also longer than the ‘average’ length of an answering machine message. Most answering machine messages are no longer than 15000-20000ms, which makes that a good range of values for this variable.
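The timing guidance above can be turned into a quick sanity check before a call is placed. The following is a minimal sketch in Python; the `check_cpa_settings` helper and the `RECOMMENDED_MS` table are hypothetical names for illustration only, not part of any Voxeo API, and the ranges are simply the ones suggested in this guide.

```python
# Hypothetical helper: illustrates the guide's suggested ranges, not a Voxeo API.
RECOMMENDED_MS = {
    "voxeo-cpa-maxsilence": (800, 1200),
    "voxeo-cpa-maxtime": (4000, 7000),
    "voxeo-cpa-runtime": (15000, 20000),
}


def check_cpa_settings(maxsilence, maxtime, runtime):
    """Return a list of warnings for CPA timing values given in milliseconds."""
    values = {
        "voxeo-cpa-maxsilence": maxsilence,
        "voxeo-cpa-maxtime": maxtime,
        "voxeo-cpa-runtime": runtime,
    }
    warnings = []
    for name, value in values.items():
        low, high = RECOMMENDED_MS[name]
        if not low <= value <= high:
            warnings.append(
                f"{name}={value} is outside the suggested {low}-{high} ms range"
            )
    # The guide asks for runtime to exceed maxtime and maxsilence added together.
    if runtime <= maxtime + maxsilence:
        warnings.append(
            "voxeo-cpa-runtime should be longer than maxtime + maxsilence"
        )
    return warnings


print(check_cpa_settings(maxsilence=1000, maxtime=4000, runtime=20000))
print(check_cpa_settings(maxsilence=1000, maxtime=7000, runtime=6000))
```

The first call uses the values from the examples in this guide and should produce no warnings; the second deliberately sets a runtime that is too short, so both the range check and the sum check fire.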

voxeo-cpa-maskstop/maskevent
This variable lets the developer specify which results can be recognized and received. Should you want all possible types available, you may define the value as '*'. Otherwise, you may specify a comma-separated list of events to mask, and which ones should stop CPA processing upon receipt. The possible values for this parameter are:

voxeo-cpa-result
This variable holds the result of the human vs. machine analysis, which we will want to pull from the querystring using a server-side language. Note that this is a result parameter, and should not be set in the token initiation string. The possible results are:

wavurl
This WAV file will be played as soon as CPA is done.  wavurl should point to a small 'hello' type file, to alert the callee that something is happening.  NOTE:  The entire audio file will play before transitioning to your document.



CPA POST Example

First off, let's take a look at what our HTTP token-initiated querystring will look like with our nifty CPA parameters added in. As you know, we need to POST to api.voxeo.net to kick off the call as usual, but this time we also include the aforementioned CPA variables. So, our request is going to look something like this:


http://api.voxeo.net/SessionControl/VoiceXML.start
      ?numbertodial=8001112222
      &tokenid=abc123...
      &voxeo-cpa-maxtime=4000
      &voxeo-cpa-maxsilence=1000
      &voxeo-cpa-runtime=20000


NOTE:  The 'voxeo-cpa-maskevent' and 'voxeo-cpa-maskstop' are filled by default with "human, machine, faxtone, sit, beep".  These values cannot be changed when using CPA on VoiceXML token-initiated applications.
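For those who would rather build the request string in code than by hand, here is a minimal sketch in Python using only the standard library. The `build_start_url` function and its default values are hypothetical and only mirror the example request above; the sketch assembles the URL and does not actually place a call.

```python
from urllib.parse import urlencode

# Endpoint taken from the example request in this guide.
START_URL = "http://api.voxeo.net/SessionControl/VoiceXML.start"


def build_start_url(token_id, number_to_dial,
                    maxtime=4000, maxsilence=1000, runtime=20000):
    """Assemble a token-initiation URL carrying the CPA timing parameters."""
    params = [
        ("numbertodial", number_to_dial),
        ("tokenid", token_id),
        ("voxeo-cpa-maxtime", maxtime),
        ("voxeo-cpa-maxsilence", maxsilence),
        ("voxeo-cpa-runtime", runtime),
    ]
    # urlencode escapes any characters that are not safe in a querystring.
    return START_URL + "?" + urlencode(params)


# 'YOUR-TOKEN-ID' is a placeholder; substitute your application's token.
print(build_start_url("YOUR-TOKEN-ID", "8001112222"))
```

Note that voxeo-cpa-result is deliberately absent: it is a result parameter that the platform fills in, so it should not be sent with the request.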


When we run a CPA-enabled application with the logger open, we will see this stuff pop up, but only if we are paying attention. (Those with ADD are excused):

http://MyServer.com/CPA_test.asp
                  ?voxeo-cpa-maxtime=4000
                  &voxeo-cpa-maxsilence=1000
                  &numbertodial=8001112222
                  &voxeo-cpa-runtime=20000
                  &calltimeout=120
                  &voxeo-cpa-result=human


We can grab this value from the querystring using whichever flavor of server-side language you prefer. What follows is an HTML form that triggers the call, then sample ASP and ColdFusion applications which read the CPA result back to the callee, be it human or machine.



CPA Trigger Form (HTML)


<html>
  <head>
    <title>CPA using CFM</title>
  </head>

  <body>
<br><br><br>
    <form action="http://api.voxeo.net/SessionControl/VoiceXML.start" method="get">
      <input type="hidden" name="tokenid" value="YOUR TOKENID HERE">

      <input type="hidden" name="voxeo-cpa-maxtime" value="4000">
      <input type="hidden" name="voxeo-cpa-maxsilence" value="1000">
      <input type="hidden" name="voxeo-cpa-runtime" value="20000">
      <input type="hidden" name="cpa.version" value="2.0">


    Phone Number: <input type="text" name="numbertodial"><br><br><br>
      <input type="submit" value="submit">
  </form>
  </body>
</html>



CPA Example Code (ASP)


<?xml version="1.0" encoding="UTF-8"?>

<vxml version = "2.0">
<meta name="maintainer" content="yourEmail@here.com"/>

  <var name="cpa_result" expr="'<% =Request.Querystring("voxeo-cpa-result") %>'"/>

  <form>

    <block>
      <prompt>
        <break strength="medium"/>
        Testing of the CPA has indicated that the receiving end is a <value expr="cpa_result"/>.
      </prompt>

        <log expr="'###############################'"/>
        <log expr="cpa_result"/>
        <log expr="'###############################'"/>

    </block>

  </form>
</vxml>


CPA Example Code (ColdFusion)


<?xml version="1.0"?>

<vxml version="2.0" >
<meta name="maintainer" content="yourEmail@here.com"/>

<cfheader name="Cache-Control" value= "no-cache">
<cfheader name="Expires" value="#Now()#">

<form id="F1">

<CFOUTPUT>
  <var name="CPA_result" expr="'#structfind(url,"voxeo-cpa-result")#'"/>
</CFOUTPUT>

  <block name="B1">
  <if cond="CPA_result == 'human'">
    <log expr="'*********** CPA_result = ' + CPA_result + '  ***************'"/>
    <prompt> You must be a hee yoo man</prompt>

  <else/>
    <log expr="'*********** NON-HUMAN ***************'"/>
    <prompt> You must be a mah sheen. What up widdat?</prompt>
    <exit/>
  </if>

</block>
</form>
</vxml> 
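The same idea carries over to other server-side languages. As a rough sketch, the Python below pulls voxeo-cpa-result out of a querystring and drops it into a small VoiceXML page. The `render_vxml` function, the template, and the 'unknown' fallback are all assumptions made for illustration; how the querystring reaches your code depends on your web framework.

```python
from urllib.parse import parse_qs

# Minimal page modeled on the ASP example; {result} is filled in per request.
VXML_TEMPLATE = """<?xml version="1.0" encoding="UTF-8"?>
<vxml version="2.0">
  <var name="cpa_result" expr="'{result}'"/>
  <form>
    <block>
      <prompt>
        The receiving end appears to be a <value expr="cpa_result"/>.
      </prompt>
      <log expr="cpa_result"/>
    </block>
  </form>
</vxml>
"""


def render_vxml(query_string):
    """Build a VoiceXML page that reads the CPA result back to the callee."""
    params = parse_qs(query_string)
    # Fall back to 'unknown' when the parameter is missing (our assumption,
    # not documented platform behavior).
    result = params.get("voxeo-cpa-result", ["unknown"])[0]
    # Keep letters only so the value is safe inside the quoted expression.
    safe = "".join(ch for ch in result if ch.isalpha()) or "unknown"
    return VXML_TEMPLATE.format(result=safe)


print(render_vxml("voxeo-cpa-maxtime=4000&numbertodial=8001112222"
                  "&voxeo-cpa-result=human"))
```

Stripping everything but letters is a deliberately blunt way to keep a stray quote in the querystring from breaking the expr attribute; a real application may prefer to check the value against the results it expects.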




  ANNOTATIONS: EXISTING POSTS
Michael.Book
5/27/2004 4:29 PM (EDT)
Hello All,

Many users new to CPA often experience problems with long delays or silence when starting their initial VoiceXML dialog.  For all of you out there that are experiencing such delays/silence, here is a handy explanation/solution for this problem:

First off, this problem most likely has to do with long fetch times on your initial VoiceXML document itself due to DB hits and/or back-end logic.  This, of course, will in turn cause undesirable periods of silence for your end user(s).  Let me try to explain the execution order of this type of an application.  It may help to visualize what is happening here:

-- Behind the scenes CCXML front end makes the outbound call
-- Outbound call answers
-- CPA listens for audio
-- Upon CPA result begin the actual VoiceXML dialog
-- VoiceXML browser fetches initial document
-- VoiceXML FIA begins processing application and continues to do so until it reaches a valid reco field, at which point it will stop, empty its prompt queue, and wait for a filled/noinput/nomatch event

NOTE:  This means that if your application has 7 individual documents - none of which have valid reco <field>s - and each one hits a DB, causing each page to take '2s' to load and parse, your first <audio> file/prompt would not be played for '14s'!  This is true even if that first <audio> was located in the very first document.  This behavior is defined by the Form Interpretation Algorithm (FIA) as described by the W3C in the VoiceXML 2.0 specification.
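The arithmetic in the note above can be sketched in a few lines of Python. This is a simplified model, assuming that queued prompts only start playing once the FIA reaches a document with a valid reco field (or runs out of documents); the `time_to_first_prompt` function and the 0.2 second stub fetch time are hypothetical.

```python
def time_to_first_prompt(documents):
    """Estimate seconds of silence before the first prompt is heard.

    documents: (fetch_seconds, has_reco_field) pairs in execution order.
    """
    elapsed = 0.0
    for fetch_seconds, has_reco_field in documents:
        elapsed += fetch_seconds
        if has_reco_field:
            # The FIA stops here and plays whatever prompts are queued.
            return elapsed
    # No field anywhere: prompts only play after the last document loads.
    return elapsed


# Seven documents, each taking 2s to load, none with a reco field.
slow_app = [(2, False)] * 7
print(time_to_first_prompt(slow_app))  # 14.0

# The same app behind a quick page that does contain a reco field.
with_stub = [(0.2, True)] + slow_app
print(time_to_first_prompt(with_stub))  # 0.2
```

The model ignores the CPA wait itself and any parse time, so treat the numbers as a way to compare layouts rather than as predictions.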

Now, we can plainly see how an application with this type of setup could cause massive delays and ugly periods of silence for an end user.  So, how do we fix it?

We must first understand that if we intend to use CPA, the initial delay encountered when waiting for the 'cpa-maxsilence' to expire is unavoidable any way we choose to go about this. (I would humbly recommend that your 'cpa-maxsilence' be set to 800ms - 1200ms.)  But this alone should not be a huge issue for an end user.  The real trick here will be fooling the VoiceXML FIA into stopping execution in order to immediately play an opening audio file, and then loading the "real" application document(s) - containing the DB hits and back-end logic - without causing the end user to sit in uncomfortable silence.  To accomplish this, I would recommend that you configure your application as such:

-- Your initial VoiceXML file should be a small "stub" file.  This will ensure that the initial fetch time is as quick as possible.
-- Configure your stub file to be as simple as possible with a "dummy" field.  A "dummy" reco field will cause the FIA to stop, empty its prompt queue and wait for input.
-- Place your opening audio file as the "dummy" field's audio prompt.  This could be a simple "Welcome to my company's cool app."
-- Then use a <submit> to the "real application start document" as a condition of your <noinput> and <nomatch> handlers.  (Be sure to set your timeout property to an extremely low value.  This will cause the <noinput> and <nomatch> events to fire almost instantaneously.)
-- Use the 'fetchaudio' attribute of the <submit> tag to fill the painfully long fetch time to your "real application start document" with pretty, pretty music.

And voila...  No more nasty delay/silence!


"Stub File" Example:
_________________________

      "stub.asp"
-----------------------

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.0">

<!-- ECMAScript variable names cannot contain hyphens, so the result is stored as cpa_result -->
<var name="cpa_result" expr="'<% =Request.Querystring("voxeo-cpa-result") %>'"/>

<form>
  <block>
    <log expr="'##########################################'"/>
    <log expr="'### CPA Says: ' + cpa_result + ' ###'"/>
    <log expr="'##########################################'"/>
  </block>

  <field name="dummy">
    <property name="timeout" value="100ms"/>
<!-- set the 'noinput' timeout to 0.1 seconds -->

    <audio src="http://myserver.com/helloThere.wav">
      Hello there.  This is a call from Big Bird.
    </audio>

<!-- create a 'garbage' grammar that will NEVER get a match -->
    <grammar> [poppaoomowmow] </grammar>

    <filled>
      <prompt>
        no way this will ever happen.
      </prompt>
    </filled>

<!-- the next document receives the result as the 'cpa_result' parameter -->
    <noinput>
      <submit next="http://myserver.com/myCoolAppWithNaughtyDBHits.asp"
              fetchaudio="myCoolWaitMusic.wav"
              method="post"
              namelist="cpa_result" />
    </noinput>
    <nomatch>
      <submit next="http://myserver.com/myCoolAppWithNaughtyDBHits.asp"
              fetchaudio="myCoolWaitMusic.wav"
              method="post"
              namelist="cpa_result" />
    </nomatch>

  </field>
</form>
</vxml>
_________________________


I hope this helps...


Have Fun,

~ Michael
culetti
10/1/2007 8:27 AM (EDT)
Hello, I tried your good way to decrease initial delay and I made this VXML, BUT:
if the called person speaks during the prompted text, the control jumps to the next document.
I would like the whole prompt to be spoken until the end.

<?xml version="1.0" encoding="UTF-8"?>
<vxml version = "2.1" xml:lang="it-IT">
  <var name="dbName" expr="'tt_demo'"/>
  <var name="refno" expr="''"/>
  <var name="messageType" expr="'e_demo'"/>
  <var name="message" expr="''"/>
  <var name="lang" expr="'it'"/>
<form>
    <prompt> Hi. This is a reminder message from Tuotempo. Please listen carefully to the following instructions.</prompt>

<field name="dummy">
    <property name="timeout" value="1ms"/>

<!-- create a 'garbage' grammar that will NEVER get a match -->
    <grammar> [poppaoomowmow] </grammar>

  <filled>
      <prompt>
        no way this will ever happen.
      </prompt>
    </filled>

    <nomatch>
      <submit next="voice_index.php"
method="post"
namelist="dbName lang refno messageType message"
              />
    </nomatch>
    <noinput>
      <submit next="voice_index.php"
method="post"
namelist="dbName lang refno messageType message"
              />
    </noinput>

</field>
 
</form>
</vxml>


Thank you
voxeojeff
10/1/2007 9:06 AM (EDT)
Hi culetti,

It sounds to me like all you need to do is disable bargein on your prompt.  To do so, simply formulate your <prompt> tag to look like this:

<prompt bargein="false"> Text goes here </prompt>

http://docs.voxeo.com/voicexml/2.0/prompt.htm

Hope this helps!

Best,
Jeff Menkel
Voxeo Corporation
culetti
6/15/2008 1:35 PM (EDT)
Hello,
answer machine recognition causes a high delay until the TTS speaks the first word (even when a human answers).
Even by playing with the latency parameters I didn't find an acceptable solution.

Any suggestions?

Emanuele Luchetti
VoxeoDustin
6/15/2008 11:33 PM (EDT)
Hey Emanuele,

Unfortunately, there will always be a delay before the CPA result as we must wait for the expiration of maxsilence before the event will be thrown. However, instead of waiting for a CPA result, you can assume the callee is human and begin playing the dialog immediately. Simply remove the 'human' option from 'maskevent' and 'maskstop' and change your application to play the dialog as soon as the call is answered. Leave the rest of the logic intact so that when CPA determines that it is a machine instead of a human, it will restart the message accordingly. This should eliminate the delay human callees see, but still send the message properly when a machine is on the far end.

This, however, will require the use of CCXML rather than VoiceXML, as VXML will always wait for the expiration of maxsilence.

http://docs.voxeo.com/ccxml/1.0-final/ansdetection_ccxml10.htm

Let me know if you have any further questions.

Cheers,
Dustin


© 2003-2008 Voxeo Corporation  |  Voxeo IVR  |  VoiceXML & CCXML IVR Developer Site