HTML form for searching a public web site with Google BlogSearch

Here is a simple version of the form code you need to put a search box on any public website that uses the Google BlogSearch (beta) engine.

    <form method="get" action="">
        <!-- action is blank here; point it at the BlogSearch results URL -->
        <input type="text" name="as_q" size="12" maxlength="125" id="q" /><br />
        <input type="submit" value="search" />
        <input type="hidden" name="btnG" value="Search+Blogs" id="btnG2" />
        <input type="hidden" name="hl" value="en" id="hl" />
        <!-- top-level URL of the site whose feeds you want to search -->
        <input type="hidden" name="bl_url" value="" id="bl_url" />
    </form>

Replace the "bl_url" value with the top-level URL of the site you want to search, which must already be described in published RSS feeds. It doesn't have to be your site: this form could sit on a site at the Illinois Dept. of Revenue but be used to search the IRS site (if the IRS used RSS feeds).
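
To get a feel for what the form actually sends, here is a minimal Python sketch of the query string a GET form like this one produces. The field names (as_q, btnG, hl, bl_url) come from the form above; SEARCH_ENDPOINT and the example site URL are made-up placeholders, since the form's action is left blank in this post.

```python
from urllib.parse import urlencode

# Placeholder only: the form's action is blank in this post, so substitute
# the real BlogSearch results URL here.
SEARCH_ENDPOINT = "https://blogsearch.example.invalid/search"


def build_search_url(query, site_url, endpoint=SEARCH_ENDPOINT, language="en"):
    """Return the URL a browser would request when the form is submitted."""
    fields = {
        "as_q": query,           # text typed into the search box
        "btnG": "Search Blogs",  # hidden submit-button field
        "hl": language,          # interface language
        "bl_url": site_url,      # top-level URL of the site to search
    }
    # urlencode escapes each value and joins the pairs with "&"
    return endpoint + "?" + urlencode(fields)


if __name__ == "__main__":
    print(build_search_url("tax forms", "www.example.gov"))
```

Comparing the printed URL with what your browser shows after a real search is a quick way to check that the field names still match.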

If you have set things up right, this form gives you a search box on your site, and people who use it get complete and current search results for the target content.

"Set things up right" can be summarized as:

  • You need to make sure the content is described by RSS-style feeds.  Most blog engines do this for you, but you can do it even if you aren't using a blog.  An ASP page could pull content from a database, publish that content as page URLs, and also publish an RSS feed.  NPR syndicates lots of content using RSS but sadly the USDA does not.
  • You need to make sure that the updates have been conveyed to Google through one of the updating services (e.g. WebLogs).  Again, most blog engines or services do this for you, but it isn't exactly rocket science to do it on your own.  The WebLog API is initially intimidating until you realize they offer a form for submissions: you can either use the form manually every once in a while or write a little bit of ASP code (or JSP or PHP or...) in your site that posts to that form behind the scenes.  Also, this doesn't have to be your content; it just has to be the content you want to search.  So, while IBM publishes lots of content in RSS format, it doesn't appear they let BlogSearch know about it.  No problem! You can use a small POST to notify WebLog/Google for them and then use Google BlogSearch on their content.
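
The "small POST" in the second bullet might look roughly like the Python sketch below. It only builds the request and does not send it. PING_FORM_URL and the two field names are hypothetical placeholders, not the updating service's real form; check the service's submission form for the actual address and field names.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical placeholders: look up the real submission form's address and
# field names on the updating service before using this.
PING_FORM_URL = "https://ping.example.invalid/submit"
NAME_FIELD = "name"
URL_FIELD = "url"


def build_ping_request(site_name, site_url, form_url=PING_FORM_URL):
    """Build (but do not send) the POST that the submission form would make."""
    body = urlencode({NAME_FIELD: site_name, URL_FIELD: site_url})
    return Request(
        form_url,
        data=body.encode("ascii"),  # supplying data makes urllib use POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    request = build_ping_request("Example Site", "http://www.example.com/")
    print(request.get_method(), request.full_url)
    print(request.data.decode("ascii"))
    # To actually send it: urllib.request.urlopen(request)
```

Keeping the build step separate from the send step makes it easy to inspect exactly what would be posted before you notify anyone on another site's behalf.
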

This is the HTML way of doing things.  However, if you look at the results page for a search you will see that the same results can also be obtained in XML format.  That URL is dynamic: it isn't a snapshot of the results but returns the current results for that search whenever it is called.  This can be woven pretty easily into an ASP page (or JSP or PHP or...).
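
As a rough illustration of weaving XML results into a page, here is a minimal Python sketch that pulls titles and links out of a feed. It assumes the XML is shaped like an RSS 2.0 feed with item elements; SAMPLE_FEED is made-up data standing in for real results, and an Atom-style feed would need different element names and namespace handling.

```python
import xml.etree.ElementTree as ET

# Made-up sample standing in for a results feed; real results would be
# fetched from the XML link shown on the search results page.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Sample search results</title>
    <item>
      <title>First post</title>
      <link>http://www.example.com/first</link>
    </item>
    <item>
      <title>Second post</title>
      <link>http://www.example.com/second</link>
    </item>
  </channel>
</rss>
"""


def parse_results(feed_xml):
    """Return a list of (title, link) pairs, one per <item> in the feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):
        title = item.findtext("title", default="")
        link = item.findtext("link", default="")
        results.append((title.strip(), link.strip()))
    return results


if __name__ == "__main__":
    for title, link in parse_results(SAMPLE_FEED):
        print(title, "->", link)
```

Because the results URL is dynamic, running the parse on each page request (or on a short cache interval) keeps the listing current.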

You can get more ornate than this, but this will get you started.  Play around with a basic query and watch how changing the options changes the URL in your browser to get a feel for how to code the different options.

Happy coding.

- Brian
