From what I can see, Walker site search has had a painful and only partly successful history. The initial implementation was in ColdFusion on the original NT server; it was always slow and would eventually time out or crash before returning results.
Our current searching is all handled on a per-neighborhood basis (a neighborhood being a "sub site"), e.g. the calendar site or the press site. We're using swish-e for this: it's free, fast, and has decent Perl hooks.
The only problem is that our initial search modules were all written as scripts that pull content from the database and feed it to the swish-e indexer with an ID as the "title". This gave us all the control in the world: we could manipulate the results, pull up media for them as we displayed them, etc.
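To make the "ID as the title" trick concrete, here's a rough sketch of what one of those feeder scripts boils down to. It's in Python rather than the Perl we actually use, and the names (`row_to_document`, `prog_record`, the `event/42` path) are made up for illustration. It assumes swish-e is reading from an external program (its `-S prog` mode), where, as I understand the format, each document is a small header block with `Path-Name` and `Content-Length`, a blank line, then the content.

```python
import html


def row_to_document(row_id, text):
    """Wrap one database row as a tiny HTML page whose <title> is the row ID.

    The ID comes back with each search hit, so the results page can look
    the row up again and decorate it (media, dates, etc.).
    """
    return (
        "<html><head><title>{0}</title></head>"
        "<body>{1}</body></html>".format(row_id, html.escape(text))
    )


def prog_record(path, document):
    """Format one document for an indexer that reads records from a program.

    Assumption: header lines, a blank line, then exactly Content-Length
    bytes of content.
    """
    body = document.encode("utf-8")
    head = "Path-Name: {0}\nContent-Length: {1}\n\n".format(path, len(body))
    return head.encode("utf-8") + body


if __name__ == "__main__":
    import sys

    # Hypothetical rows; a real script would pull these from the database.
    rows = [(42, "Opening night: film & video"), (43, "Gallery talk")]
    for row_id, text in rows:
        record = prog_record("event/{0}".format(row_id), row_to_document(row_id, text))
        sys.stdout.buffer.write(record)
```

The length is computed on the encoded bytes, not the string, which matters as soon as the content has any non-ASCII characters in it.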
That approach isn't very extensible, though. My new tack is to build a smarter script that indexes our Apache mod_proxy cache folders, extracting the original URL from each cached file and deciding whether or not to index it. We then mess with the file a bit to remove the common nav HTML, and index it in swish-e normally.
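Roughly, the per-file logic looks like the sketch below. This is Python for illustration, not the actual Perl script, and it leans on a few assumptions that are mine: that each cache file records the original URL on an `X-URL:` line ahead of the stored response headers, that headers and body are separated by the first blank line, and that the nav is wrapped in `<!-- nav-start -->` / `<!-- nav-end -->` comments (hypothetical markers; ours differ per site). The skip list is also just an example.

```python
import re

# Assumption: the cache file stores the original URL on a line like this.
URL_LINE = re.compile(r"^X-URL:\s*(\S+)", re.MULTILINE)

# Hypothetical markers around the shared navigation HTML.
NAV_BLOCK = re.compile(
    r"<!--\s*nav-start\s*-->.*?<!--\s*nav-end\s*-->",
    re.DOTALL | re.IGNORECASE,
)

# Example list of things not worth indexing.
SKIP_SUFFIXES = (".gif", ".jpg", ".png", ".css", ".js")


def extract_url(raw):
    """Return the original URL recorded in a cache file, or None."""
    match = URL_LINE.search(raw)
    return match.group(1) if match else None


def should_index(url):
    """Decide whether a cached URL is worth indexing."""
    if url is None:
        return False
    path = url.split("?", 1)[0].lower()
    return not path.endswith(SKIP_SUFFIXES)


def split_body(raw):
    """Drop the cache/HTTP headers: keep what follows the first blank line."""
    parts = re.split(r"\r?\n\r?\n", raw, maxsplit=1)
    return parts[1] if len(parts) == 2 else ""


def strip_nav(page):
    """Remove the common navigation block so it doesn't pollute every hit."""
    return NAV_BLOCK.sub("", page)


def prepare(raw):
    """Return (url, cleaned_html) for an indexable cache file, else None."""
    url = extract_url(raw)
    if not should_index(url):
        return None
    return url, strip_nav(split_body(raw))
```

Whatever `prepare` returns can then be written out as a flat file and indexed the usual way, keeping the URL around so the results link back to the live page.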
Initial tests seem positive - the indexing of the flat files is pretty quick, and the results are accurate. My hope is we'll be able to build a generic search results page that will "just work"(tm) as I continue to add new indexes to the repository.
Code sample: http://newmedia.walkerart.org/example_files/cache_index.pl.txt
The only trouble will be spidering the site: we just can't handle a fast-running spider. (For that matter, can we handle deep-googling...?) So I'll have to adapt the included swish-e spider and turn it loose on our more "static" sites, and hopefully build a permanent index of them.
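The "don't hammer the server" part is really just a minimum delay between requests. The bundled spider may well have its own setting for this; the sketch below (Python, with a made-up `Throttle` class) only illustrates the idea. The clock and sleep functions are passed in so it can be exercised without actually waiting.

```python
import time


class Throttle:
    """Enforce a minimum number of seconds between successive requests."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A spider loop would call `throttle.wait()` right before each fetch, so a slow page naturally counts toward the delay and only the fast ones get held back.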
Eventually we'll have a set of index files that (hopefully) comprise the whole Walker universe and can be searched using a single interface. Nightly cron jobs will keep the dynamic portions up to date, ideally using the flat cached files as a base.
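For the single interface, the simplest thing I can picture is handing swish-e several index files at once. A minimal sketch in Python, assuming the `swish-e` binary is on the path, that `-w` takes the query and `-f` accepts more than one index file, and with made-up index names:

```python
import subprocess


def build_search_command(query, index_files, binary="swish-e"):
    """Build the argument list for one search across several index files."""
    if not index_files:
        raise ValueError("at least one index file is required")
    return [binary, "-w", query, "-f"] + list(index_files)


def search(query, index_files):
    """Run the search and return swish-e's raw output as text.

    Passing a list (no shell) means the query needs no quoting or escaping.
    """
    command = build_search_command(query, index_files)
    result = subprocess.run(command, capture_output=True, text=True, check=True)
    return result.stdout


# Hypothetical index names, one per neighborhood.
INDEXES = ["calendar.index", "press.index"]
```

The generic results page would then only need to parse one output stream, whichever neighborhood a hit came from.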