Part of: Art On Call

 


ArtOnCall.TextToSpeechCaching History


April 26, 2005, at 04:56 PM by 209.32.200.12 -
Changed line 9 from:

Since 1 wasn't guaranteed and we didn't want to go back to AT&T NV if we could help it, I immediately began work on a caching system for the app. The basic approach takes advantage of the fact the IVR handles broken urls gracefully - if a requested wav file doesn't exist, it just passes the text to the TTS engine and uses that. So the approach in the xml looks like this:

to:

Since 1 wasn't guaranteed and we didn't want to go back to AT&T NV if we could help it, I immediately began work on a caching system for the app. The basic approach takes advantage of the fact that the IVR handles broken URLs gracefully - if a requested wav file doesn't exist, it just passes the text to the TTS engine and uses that. So the XML returned to the IVR looks like this:

Changed line 18 from:

The logic in wac_tts.php tries to be clever about caching - if we have a version of the TTS, we simply pipe the file out. If not, we return 404 not found but also spin off a request to cache the new text - so in the case where we've never seen the text before, the IVR immediately does its thing and reads the text but we also go cache the WAV file for next time (using the "what" and "id" to get the correct text, we pass this to the IVR which returns a wav file for us to cache).

to:

The logic in wac_tts.php tries to be clever about caching - if we have a version of the TTS, we simply claim to be a wav file and pipe it out. If not, we return "404 not found" but also spin off a request to cache the new text - so if we've never seen the text before, the IVR immediately does its thing and reads the text "live" but we also go cache it for next time (using the "what" and "id" to get the correct text from the webservice, we pass this to the IVR which returns a wav file for us to cache).

Changed line 20 from:

By including the MD5 sum and content id in the url, we get one more level of speed - if the MD5 sum doesn't match (the text is different) but we have some version of the TTS for that content, we go ahead and return that version and ask for a new one to cache. Whatever version we had is good enough to play once more, and the next request will get the right one (the script also deletes the old version after the new one is written out)

to:

By including the MD5 sum in the URL, we get one more level of speed - if the MD5 sum doesn't match (the text is different) but we have some (older) version of the TTS for the requested content, we go ahead and return what we have and ask for the new one to cache for next time. (The script also deletes the old version after the new one is written out.)

April 26, 2005, at 04:35 PM by 209.32.200.12 -
Changed line 18 from:

The logic in wac_tts.php tries to be clever about caching - if we have a version of the TTS, we simply pipe the file out. If not, we return 404 not found but also spin off a request to cache the new text - so in the case where we've never seen the text before, the IVR immediately does its thing and reads the text but we also go cache the WAV file for next time.

to:

The logic in wac_tts.php tries to be clever about caching - if we have a version of the TTS, we simply pipe the file out. If not, we return 404 not found but also spin off a request to cache the new text - so in the case where we've never seen the text before, the IVR immediately does its thing and reads the text but we also go cache the WAV file for next time (using the "what" and "id" to get the correct text, we pass this to the IVR which returns a wav file for us to cache).

April 26, 2005, at 04:34 PM by 209.32.200.12 -
Added lines 1-4:

Here's a summary of the TTS lag problems and the caching solution we eventually implemented:

Just before launching Art On Call we switched from the decent-quality AT&T Natural Voices TTS engine to the very high-quality Speechify by ScanSoft/Speechworks. The only trouble was the TTS for event descriptions was essentially timing out - up to a minute or more before the user would hear any text read back.

Changed line 6 from:

1. Figure out why Speechify was running so slow - the Plum developer noticed it were throttling the CPU usage and hoped he could get an un-throttled version from ScanSoft.

to:

1. Figure out why Speechify was running so slowly - the Plum developer noticed it was throttling the CPU usage and hoped he could get an un-throttled version from ScanSoft.

Changed lines 9-24 from:

Since 1 wasn't guaranteed and we didn't want to go back to AT&T NV if we could help it, I immediately began work on a caching system for the app. (details here - it's actually pretty interesting (I think)) As it turns out, Andy (from Plum) was able to get a much faster version of Speechify installed, and the live TTS became quite speedy again. Combined with my now-working caching solution, almost all the audio is played back with little to no delay... In retrospect I should have pursued this course immediately - with the volume of text to be read, it makes much more sense to pre-cache it.

to:

Since 1 wasn't guaranteed and we didn't want to go back to AT&T NV if we could help it, I immediately began work on a caching system for the app. The basic approach takes advantage of the fact the IVR handles broken urls gracefully - if a requested wav file doesn't exist, it just passes the text to the TTS engine and uses that. So the approach in the xml looks like this:

  <audio src="http://mediaserver/wac_tts.php?what=exhibit_info&id=1532&md5=95c5447ca437beb75ce6d97f647ba9a1">
    Exhibition.
    Kiki Smith.
    February 26 through May 7th, 2006.
    At the Walker Art Center.
  </audio>
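
As a rough illustration of how a URL like the one above could be assembled, here is a small Python sketch. It is not the original code: the function name build_audio_url is made up, and hashing the UTF-8 bytes of the fallback text is an assumption about how the md5 parameter is derived.

```python
import hashlib
from urllib.parse import urlencode


def build_audio_url(base_url, what, content_id, text):
    """Build a wac_tts.php-style URL for a piece of text (hypothetical helper).

    Assumption: the md5 parameter is the MD5 hex digest of the text's
    UTF-8 bytes, so any change to the text changes the URL.
    """
    md5 = hashlib.md5(text.encode("utf-8")).hexdigest()
    query = urlencode({"what": what, "id": content_id, "md5": md5})
    return f"{base_url}?{query}"


if __name__ == "__main__":
    print(build_audio_url(
        "http://mediaserver/wac_tts.php",
        "exhibit_info",
        1532,
        "Exhibition. Kiki Smith.",
    ))
```

The same text is then placed inside the audio element as the fallback the IVR reads when the URL returns nothing playable.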

The logic in wac_tts.php tries to be clever about caching - if we have a version of the TTS, we simply pipe the file out. If not, we return 404 not found but also spin off a request to cache the new text - so in the case where we've never seen the text before, the IVR immediately does its thing and reads the text but we also go cache the WAV file for next time.
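
The hit-or-miss decision described above might look something like the following Python sketch. The real script is PHP and isn't shown here, so everything below is assumed for illustration: the file naming scheme (what_id_md5.wav), the function names, and the use of a plain list as the "go cache this later" queue.

```python
import os


def cache_path(cache_dir, what, content_id, md5):
    # Assumed naming scheme: one WAV per (what, id, md5) combination.
    return os.path.join(cache_dir, f"{what}_{content_id}_{md5}.wav")


def handle_request(cache_dir, what, content_id, md5, pending):
    """Return (status, headers, body) for one audio request.

    Hit:  serve the cached WAV bytes.
    Miss: answer 404 so the IVR falls back to live TTS, and record the
          request in `pending` so the WAV can be generated for next time.
    """
    path = cache_path(cache_dir, what, content_id, md5)
    if os.path.isfile(path):
        with open(path, "rb") as f:
            return 200, {"Content-Type": "audio/x-wav"}, f.read()
    pending.append((what, content_id, md5))
    return 404, {}, b""
```

The point of returning 404 right away is that the caller never waits on the cache: the worst case is one live TTS read, and the queued request fills the cache in the background.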

By including the MD5 sum and content id in the url, we get one more level of speed - if the MD5 sum doesn't match (the text is different) but we have some version of the TTS for that content, we go ahead and return that version and ask for a new one to cache. Whatever version we had is good enough to play once more, and the next request will get the right one (the script also deletes the old version after the new one is written out)
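
Here is one way the "serve the stale version, then swap it out" idea could be sketched in Python. Again this is only an illustration under assumptions: the what_id_md5.wav naming scheme and both function names are invented, not taken from wac_tts.php.

```python
import os


def _prefix(what, content_id):
    # Assumed naming scheme: <what>_<id>_<md5>.wav
    return f"{what}_{content_id}_"


def find_cached(cache_dir, what, content_id, md5):
    """Return (path, is_stale); path is None if nothing is cached at all."""
    prefix = _prefix(what, content_id)
    names = sorted(
        n for n in os.listdir(cache_dir)
        if n.startswith(prefix) and n.endswith(".wav")
    )
    exact = f"{prefix}{md5}.wav"
    if exact in names:
        return os.path.join(cache_dir, exact), False
    if names:
        # MD5 mismatch: the text changed, but an older rendering exists.
        return os.path.join(cache_dir, names[0]), True
    return None, False


def store_new(cache_dir, what, content_id, md5, wav_bytes):
    """Write the new WAV first, then delete older versions for this content."""
    prefix = _prefix(what, content_id)
    new_name = f"{prefix}{md5}.wav"
    new_path = os.path.join(cache_dir, new_name)
    tmp_path = new_path + ".tmp"
    with open(tmp_path, "wb") as f:
        f.write(wav_bytes)
    os.replace(tmp_path, new_path)  # readers never see a half-written file
    for n in os.listdir(cache_dir):
        if n.startswith(prefix) and n.endswith(".wav") and n != new_name:
            os.remove(os.path.join(cache_dir, n))
    return new_path
```

When find_cached reports a stale hit, the caller would serve that file and also queue a fresh rendering; store_new then replaces it once the new WAV arrives.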

By running a nightly script to pre-cache all the events / jobs / etc that might be played that day, the actual number of "live" TTS calls is almost zero -- in theory only text that changes somehow during the day.
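
A nightly pre-cache pass along these lines could be sketched as below (Python, for illustration only). The item list, the file naming scheme, and the synthesize callable are all assumptions; synthesize stands in for whatever call asks the IVR to render text to a WAV.

```python
import hashlib
import os


def precache(items, cache_dir, synthesize):
    """Render and store a WAV for every item that isn't cached yet.

    items:      iterable of (what, content_id, text) tuples
    synthesize: hypothetical callable taking text and returning WAV bytes
    Returns the number of items that had to be rendered.
    """
    rendered = 0
    for what, content_id, text in items:
        md5 = hashlib.md5(text.encode("utf-8")).hexdigest()
        path = os.path.join(cache_dir, f"{what}_{content_id}_{md5}.wav")
        if os.path.exists(path):
            continue  # unchanged text from a previous run, nothing to do
        with open(path, "wb") as f:
            f.write(synthesize(text))
        rendered += 1
    return rendered
```

Because the MD5 of the text is part of the file name, re-running the pass is cheap: only new or edited text triggers a TTS call.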

As it turns out, Andy (from Plum) was able to get a much faster version of Speechify installed, and the live TTS became quite speedy again. Combined with the now-working caching solution, almost all the audio is played back with little to no delay... In retrospect I should have pursued this course immediately - with the volume of text to be read, it makes much more sense to pre-cache it.

April 22, 2005, at 04:35 PM by 209.32.200.12 -
Changed lines 1-5 from:

Describe Text To Speech Caching here.

to:

Two solutions became apparent immediately: 1. Figure out why Speechify was running so slow - the Plum developer noticed it were throttling the CPU usage and hoped he could get an un-throttled version from ScanSoft. 2. Pre-cache all the TTS so we'd have the .wav file standing by to play.

Since 1 wasn't guaranteed and we didn't want to go back to AT&T NV if we could help it, I immediately began work on a caching system for the app. (details here - it's actually pretty interesting (I think)) As it turns out, Andy (from Plum) was able to get a much faster version of Speechify installed, and the live TTS became quite speedy again. Combined with my now-working caching solution, almost all the audio is played back with little to no delay... In retrospect I should have pursued this course immediately - with the volume of text to be read, it makes much more sense to pre-cache it.

Page last modified on November 16, 2005, at 03:59 PM