
 


Here's a summary of the TTS lag problems and the caching solution we eventually implemented:

Just before launching Art On Call we switched from the decent-quality AT&T Natural Voices TTS engine to the very high-quality Speechify engine from ScanSoft/Speechworks. The only trouble was that the TTS for event descriptions was essentially timing out: it could take a minute or more before the caller heard any text read back.

Two solutions became apparent immediately:

1. Figure out why Speechify was running so slowly. The Plum developer noticed it was throttling its CPU usage and hoped he could get an un-throttled version from ScanSoft.
2. Pre-cache all the TTS so we'd have the .wav file standing by to play.

Since option 1 wasn't guaranteed and we didn't want to go back to AT&T Natural Voices if we could help it, I immediately began work on a caching system for the app. The basic approach takes advantage of the fact that the IVR handles broken URLs gracefully: if a requested wav file doesn't exist, it just passes the text to the TTS engine and uses that. So the XML returned to the IVR points each prompt at a wav URL served by wac_tts.php and also carries the text itself as the fallback.
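The XML the app actually returned is not reproduced on this page, so here is only a rough sketch of the idea, assuming a standard VoiceXML `<audio>` element whose body is the fallback text. The function name `audio_prompt`, the `md5` parameter name and the example host are made up for illustration; "what" and "id" are the parameters described below.

```python
import hashlib
from urllib.parse import urlencode
from xml.sax.saxutils import escape, quoteattr


def audio_prompt(base_url, what, item_id, text):
    """Build one VoiceXML <audio> prompt with a TTS fallback.

    Hypothetical helper: the real markup used by the app is not shown
    in the original page.
    """
    # The MD5 sum of the text goes in the URL so the server can tell
    # whether its cached wav still matches the current text.
    digest = hashlib.md5(text.encode("utf-8")).hexdigest()
    query = urlencode({"what": what, "id": item_id, "md5": digest})
    src = f"{base_url}?{query}"
    # If the wav URL fails (e.g. a 404), the IVR reads the element body
    # through the TTS engine instead.
    return f"<audio src={quoteattr(src)}>{escape(text)}</audio>"


if __name__ == "__main__":
    print(audio_prompt("http://example.invalid/wac_tts.php",
                       "event", 42, "Gallery talk at 7 & 9 pm"))
```

Putting the text inside the element is what makes the broken-URL trick work: the caller never hears silence, only a slower live read.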

The logic in wac_tts.php tries to be clever about caching. If we have a cached version of the TTS, we simply claim to be a wav file and pipe it out. If not, we return "404 Not Found" but also spin off a request to cache the new text. So if we've never seen the text before, the IVR immediately does its thing and reads the text "live", and we also go cache it for next time: using the "what" and "id" parameters we get the correct text from the webservice and pass it to the IVR, which returns a wav file for us to cache.

Including the MD5 sum of the text in the URL gives us one more level of speed. If the MD5 sum doesn't match (the text has changed) but we have some older version of the TTS for the requested content, we go ahead and return what we have and request the new one to cache for next time. The script also deletes the old version once the new one has been written out.
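The three cases described above (exact hit, stale hit, miss) can be sketched roughly as follows. This is Python rather than the PHP of wac_tts.php, and the file naming scheme `what_id_md5.wav` and the function names `lookup` and `store` are assumptions for illustration, not the script's actual layout.

```python
from pathlib import Path


def cache_name(what, item_id, md5):
    # Assumed naming scheme: one wav per (what, id, md5 of the text).
    return f"{what}_{item_id}_{md5}.wav"


def lookup(cache_dir, what, item_id, md5):
    """Return (http_status, wav_path_or_None, needs_refresh)."""
    cache_dir = Path(cache_dir)
    exact = cache_dir / cache_name(what, item_id, md5)
    if exact.exists():
        # Cached wav matches the current text: serve it as-is.
        return 200, exact, False
    older = sorted(cache_dir.glob(f"{what}_{item_id}_*.wav"))
    if older:
        # Text changed: serve the older wav now, re-cache for next time.
        return 200, older[-1], True
    # Never seen: answer 404 so the IVR reads the text live, and cache it.
    return 404, None, True


def store(cache_dir, what, item_id, md5, wav_bytes):
    """Write the new wav first, then delete older versions of the item."""
    cache_dir = Path(cache_dir)
    new_path = cache_dir / cache_name(what, item_id, md5)
    new_path.write_bytes(wav_bytes)
    for old in cache_dir.glob(f"{what}_{item_id}_*.wav"):
        if old != new_path:
            old.unlink()
    return new_path
```

Whenever `needs_refresh` is true, the caller would spin off the background request that fetches the text, has it rendered to wav, and hands the bytes to `store`. Deleting old versions only after the new file is written means there is always something to serve.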

Because a nightly script pre-caches all the events, jobs, etc. that might be played that day, the actual number of "live" TTS calls is almost zero: in theory, only text that changes during the day.
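A nightly warm-up pass along these lines might look like the sketch below. The real script is not shown here, so `precache` and the two callables passed into it are hypothetical stand-ins for "is this text already cached?" and "render this text to wav and store it"; skipping items that are already cached is also an assumption.

```python
import hashlib


def precache(items, is_cached, render_and_store):
    """Warm the cache for every (what, item_id, text) in items.

    Returns how many items actually had to be rendered.
    """
    rendered = 0
    for what, item_id, text in items:
        md5 = hashlib.md5(text.encode("utf-8")).hexdigest()
        if is_cached(what, item_id, md5):
            # Text unchanged since the last run: nothing to do.
            continue
        render_and_store(what, item_id, md5, text)
        rendered += 1
    return rendered
```

With a pass like this, only text that changes between runs costs a TTS render, which matches the near-zero count of live calls described above.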

As it turns out, Andy (from Plum) was able to get a much faster version of Speechify installed, and the live TTS became quite speedy again. Combined with the now-working caching solution, almost all the audio is played back with little to no delay. In retrospect I should have pursued caching from the start: with the volume of text to be read, it makes much more sense to pre-cache it.

Page last modified on November 16, 2005, at 03:59 PM