Google search of Microsoft Office Documents

In my experience, Google's ability to index Microsoft Office documents depends on several factors.  This is just my experience - I've spent a lot of time experimenting to get my content visible to search, but I don't really know much about the inner workings of Google products.

Information in the document properties appears more visible than information buried deep in the documents.  I've noticed that property values turn up in Google search results more often than text from the body.  I'm not sure how this varies between the public Google search and the intranet Google appliances.  There are several articles on MSDN about how to write programs that read (or write) the document properties, so I am guessing that the Google search knows how to do this.  This is also probably why employers are always pushing people to fill in those File Properties.
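To give a feel for how little code it takes to read those properties, here is a minimal sketch.  It assumes the newer Office Open XML formats (.docx, .xlsx, .pptx), which are zip packages that keep the core properties in docProps/core.xml; the older binary formats store them differently and are not covered.  The names read_core_properties and make_sample_package are made up for illustration, and nothing here says this is how Google actually does it.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by docProps/core.xml in Office Open XML packages.
NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
}


def read_core_properties(source):
    """Return a few core properties from an OOXML file.

    `source` is a path or a binary file-like object.
    Properties that are not set come back as None.
    """
    with zipfile.ZipFile(source) as package:
        root = ET.fromstring(package.read("docProps/core.xml"))
    return {
        "title": root.findtext("dc:title", namespaces=NS),
        "subject": root.findtext("dc:subject", namespaces=NS),
        "author": root.findtext("dc:creator", namespaces=NS),
        "keywords": root.findtext("cp:keywords", namespaces=NS),
    }


def make_sample_package():
    """Hypothetical stand-in: an in-memory zip holding only core.xml."""
    core_xml = (
        '<cp:coreProperties xmlns:cp="%s" xmlns:dc="%s">'
        "<dc:title>Mortgage risk index</dc:title>"
        "<dc:creator>Brian</dc:creator>"
        "<cp:keywords>macros, secondary mortgage market</cp:keywords>"
        "</cp:coreProperties>" % (NS["cp"], NS["dc"])
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("docProps/core.xml", core_xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(read_core_properties(make_sample_package()))
```

For a real file you would pass its path, e.g. read_core_properties("report.docx"), where the file name is only an example.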

Indexing varies by document type and by the type of content within the document.  Word documents seem to be well indexed, PowerPoint only partly, Excel not so well, and Access not at all.  However, this could be because of my usage - e.g. I rarely use Excel for text information, but if one used it to store text in tables maybe that would work out better.  The information in PowerPoint that seems to make it into search results is only some of what a human would recognize as text in the slides - text inside graphics and images is obviously hidden from search, but so is any WordArt.  I've also noticed with PowerPoint that any text visible in the outline mode seems to become part of the search, but other text (free-standing text boxes) doesn't appear to - it may be something about the PowerPoint file format.  Material that is embedded in a document - an org chart in a slide, a spreadsheet embedded into a slide, etc. - doesn't seem to become visible in search results.
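The outline-versus-text-box difference does at least show up in the file format.  As a rough sketch, assuming the newer .pptx format (where each slide is an XML part such as ppt/slides/slide1.xml inside the zip package), placeholder shapes like titles and body text carry a p:ph element while free-standing text boxes do not.  Whether any search engine actually uses this distinction is a guess on my part; the function name split_slide_text and the sample slide below are invented for illustration.

```python
import xml.etree.ElementTree as ET

NS = {
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
    "p": "http://schemas.openxmlformats.org/presentationml/2006/main",
}


def split_slide_text(slide_xml):
    """Return (placeholder_text, free_text) lists for one slide's XML."""
    root = ET.fromstring(slide_xml)
    placeholder_text, free_text = [], []
    for shape in root.iter("{%s}sp" % NS["p"]):
        # Join the runs of each paragraph, then the paragraphs of the shape.
        paragraphs = [
            "".join(run.text or "" for run in para.iterfind(".//a:t", NS))
            for para in shape.iterfind("p:txBody/a:p", NS)
        ]
        text = "\n".join(p for p in paragraphs if p)
        if not text:
            continue
        # Placeholders (title, body, ...) are marked with a p:ph element.
        is_placeholder = shape.find("p:nvSpPr/p:nvPr/p:ph", NS) is not None
        (placeholder_text if is_placeholder else free_text).append(text)
    return placeholder_text, free_text


# Hypothetical, heavily trimmed slide: one title placeholder, one text box.
SAMPLE_SLIDE = """<p:sld xmlns:a="%(a)s" xmlns:p="%(p)s">
  <p:cSld><p:spTree>
    <p:sp>
      <p:nvSpPr><p:cNvPr id="2" name="Title 1"/><p:cNvSpPr/>
        <p:nvPr><p:ph type="title"/></p:nvPr></p:nvSpPr>
      <p:txBody><a:bodyPr/><a:p><a:r><a:t>Quarterly results</a:t></a:r></a:p></p:txBody>
    </p:sp>
    <p:sp>
      <p:nvSpPr><p:cNvPr id="3" name="TextBox 2"/><p:cNvSpPr txBox="1"/>
        <p:nvPr/></p:nvSpPr>
      <p:txBody><a:bodyPr/><a:p><a:r><a:t>Side note</a:t></a:r></a:p></p:txBody>
    </p:sp>
  </p:spTree></p:cSld>
</p:sld>""" % NS

if __name__ == "__main__":
    print(split_slide_text(SAMPLE_SLIDE))
```

An indexer that only walked the placeholder shapes would pick up the title here and miss the side note, which would match what I've seen, but that is speculation.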

Macros in spreadsheets don't appear to be indexed.  I used to do a lot of "development" in Excel because the clients I worked with lived in Excel and that is what "fit their operating model" well.  Anyway, lots of business logic got tied up in those macros (for better or worse), but even though my colleagues and I would say things like "Remember those macros we did for that lady that worked in the secondary mortgage market...", I have never had a Google search (public or intranet) be able to help me find those macros, no matter how well they were commented.
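One possible reason, at least for the newer macro-enabled formats such as .xlsm: the VBA project is not stored as readable text but as a single binary part, normally xl/vbaProject.bin, inside the zip package, so a plain text extractor would never see the comments.  The sketch below only checks whether such a part exists.  The helper names and the in-memory sample are made up for illustration, and the older binary .xls format is not covered.

```python
import io
import zipfile


def find_vba_parts(source):
    """List the parts of an OOXML package that hold a VBA project."""
    with zipfile.ZipFile(source) as package:
        return [
            name
            for name in package.namelist()
            if name.lower().endswith("vbaproject.bin")
        ]


def make_sample_workbook(with_macros):
    """Hypothetical stand-in package (not something Excel would open)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("xl/workbook.xml", "<workbook/>")
        if with_macros:
            # Real files hold a binary compound document here.
            package.writestr("xl/vbaProject.bin", b"\x00placeholder bytes")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(find_vba_parts(make_sample_workbook(with_macros=True)))
    print(find_vba_parts(make_sample_workbook(with_macros=False)))
```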

The Google appliance is a wild card on this.  I've never admin'd a Google appliance but have talked to plenty of admins for them.  It seems this thing has plenty of options that control how it searches the intranet content - beyond just when and where.  There also appear to be lots of options and upgrades for this thing as time goes by.  That makes me think, for intranet content at least, that just because it couldn't find my mortgage risk index macros 3 years ago at another company doesn't mean your version of the Google appliance can't do this, or couldn't be configured to.

- Brian
