Frequently Asked Questions
- What file types are returned in a Google search?
- What are the benefits to searching non-HTML file types?
- What are the most popular non-HTML format files on the Web?
- How do I access these non-HTML file types using Google?
- What if I don't have the application in which the file of a certain type was created?
- What about viruses spread within these types of files?
- Will Google be adding other file types in the future?
- How do I eliminate non-HTML files from my results?
- Why do some results come up as having an "Unrecognized" file type?
1. What file types are returned in a Google search?
In addition to standard web documents in HTML, Google searches 13 main file types. The most common are PDF, PostScript, and the Microsoft Office formats. Google also scours the Web for additional file types that are very rare; you may see them appear in your results from time to time.
The 13 main file types are:
- Adobe Portable Document Format (pdf)
- Adobe PostScript (ps)
- Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
- Lotus WordPro (lwp)
- MacWrite (mw)
- Microsoft Excel (xls)
- Microsoft PowerPoint (ppt)
- Microsoft Word (doc)
- Microsoft Works (wks, wps, wdb)
- Microsoft Write (wri)
- Rich Text Format (rtf)
- Shockwave Flash (swf)
- Text (ans, txt)
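For readers who handle these files in scripts, the list above can be written as a lookup table. The following is a minimal Python sketch, not anything provided by Google: the names FORMATS and formats_for are illustrative only. Because the wks extension is listed under both Lotus 1-2-3 and Microsoft Works, the lookup returns a list rather than a single format.

```python
# Formats and extensions copied from the list above.
# FORMATS and formats_for are hypothetical names used for illustration.
FORMATS = {
    "Adobe Portable Document Format": ["pdf"],
    "Adobe PostScript": ["ps"],
    "Lotus 1-2-3": ["wk1", "wk2", "wk3", "wk4", "wk5", "wki", "wks", "wku"],
    "Lotus WordPro": ["lwp"],
    "MacWrite": ["mw"],
    "Microsoft Excel": ["xls"],
    "Microsoft PowerPoint": ["ppt"],
    "Microsoft Word": ["doc"],
    "Microsoft Works": ["wks", "wps", "wdb"],
    "Microsoft Write": ["wri"],
    "Rich Text Format": ["rtf"],
    "Shockwave Flash": ["swf"],
    "Text": ["ans", "txt"],
}


def formats_for(extension):
    """Return every listed format that uses the given extension."""
    ext = extension.lower().lstrip(".")
    return [name for name, exts in FORMATS.items() if ext in exts]


print(formats_for("pdf"))   # ['Adobe Portable Document Format']
print(formats_for(".WKS"))  # ['Lotus 1-2-3', 'Microsoft Works']
```

An extension that is not in the list, such as html, yields an empty list.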
2. What are the benefits to searching non-HTML file types?
By adding these file formats to its search results, Google provides a wider view of the content available on the World Wide Web. In particular, there are many quality results available only in these formats, including Microsoft-published documents, investor presentations, and financial results.
3. What are the most popular non-HTML format files on the Web?
PDF formatted files are the most popular after HTML files. PostScript and Microsoft Word files are also fairly common. The other file types are relatively uncommon by comparison.
4. How do I access these non-HTML file types using Google?
Simply search for what you wish to find and these files will appear in the search results if they are relevant. The file format is usually indicated with blue text in brackets (e.g., [PDF]) in front of the page's title. The name of the file format will also appear just below the title (e.g., "PDF/Adobe Portable Document Format"). This information lets you know that a viewer for the program in which the file was created is needed to display the file.
If you want to search only for particular file types, you can use the Advanced Search page, which has a drop-down menu that lets you restrict your search to the most common file types. Alternatively, you can type "filetype:doc" as part of your search query. This restricts your results to files ending in ".doc" (or .xls, .ppt, etc.) and shows you only files created with the corresponding program.
5. What if I don't have the application in which the file of a certain type was created?
Google converts all file types it searches to either HTML or text. The search results include a link to either "View as HTML" or "View as Text". This gives you faster access to the file and removes the need for you to have the original application.
6. What about viruses spread within these types of files?
Google does not check for viruses when indexing information. You would be wise to check all such files with appropriate virus scanning software if you decide to open them in their original formats. You can avoid viruses, however, by simply using the "View as HTML" option.
7. Will Google be adding other file types in the future?
Google is always looking to expand the range of content it searches, so expect to see other file types added to the service over time as Google discovers them on the Web.
8. How do I eliminate non-HTML files from my results?
If you prefer to see a particular set of results without a particular file type included (for example, PDF), simply type -filetype:pdf within the search box along with your search term(s).
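If you compose queries in a script, the filetype: and -filetype: operators described in this FAQ can be appended to the search terms as plain text. The following is a minimal Python sketch under that assumption; build_query is a hypothetical helper, not a Google API, and the final URL-encoding step is only needed if the query is placed in a URL query string.

```python
from urllib.parse import quote_plus


def build_query(terms, only=None, exclude=()):
    """Compose a search query using the filetype: operator.

    terms   -- the ordinary search terms
    only    -- a single extension to restrict results to (e.g. "doc")
    exclude -- extensions to remove from results (e.g. ["pdf"])
    """
    parts = [terms]
    if only:
        parts.append(f"filetype:{only}")
    parts.extend(f"-filetype:{ext}" for ext in exclude)
    return " ".join(parts)


query = build_query("annual report", exclude=["pdf"])
print(query)              # annual report -filetype:pdf
print(quote_plus(query))  # annual+report+-filetype%3Apdf
```

Calling build_query("annual report", only="doc") produces "annual report filetype:doc", which restricts results to Microsoft Word files.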
9. Why do some results come up as having an "Unrecognized" file type?
In some cases, Google's crawler comes across files that are readable, but cannot be identified as a specific type. In these cases, they will be listed as "Unrecognized" file formats. You can still view them as HTML.