Technical - Do you keep your files under the Google file size limit?

Last updated by Seth Daily [SSW] 3 days ago.

A crawler may enforce a maximum file size, and any content beyond that limit may be ignored. Google currently enforces a size limit of 500 KiB (kibibytes).
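To catch oversized files before a crawler does, you can check their size locally. The sketch below is one way to do it. It assumes the limit is 500 KiB (500 × 1024 = 512,000 bytes), and the helper names `bytes_ignored` and `check_file` are hypothetical, not part of any Google tool.

```python
from pathlib import Path

# Assumption: the 500 limit is read as 500 KiB, i.e. 500 * 1024 bytes.
GOOGLE_SIZE_LIMIT_BYTES = 500 * 1024


def bytes_ignored(size_bytes: int, limit: int = GOOGLE_SIZE_LIMIT_BYTES) -> int:
    """Return how many trailing bytes fall past the limit (0 if the file fits)."""
    return max(0, size_bytes - limit)


def check_file(path: str, limit: int = GOOGLE_SIZE_LIMIT_BYTES) -> bool:
    """Print a size report for `path` and return False if it exceeds `limit`."""
    size = Path(path).stat().st_size
    overflow = bytes_ignored(size, limit)
    if overflow:
        print(f"{path}: {size} bytes, the last {overflow} bytes may be ignored")
        return False
    print(f"{path}: {size} bytes, within the {limit}-byte limit")
    return True
```

For example, `check_file("robots.txt")` run from your site root reports whether that file fits. Content past the limit is what a crawler may skip, so anything important should sit well inside it.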

For other files, the Google Search Appliance applies these limits:

  • Files larger than 30MB are ignored completely.
  • For HTML files, the search appliance indexes up to the first 2.5MB of the document, caches it, and discards the rest.
  • For non-HTML formats, the search appliance:
      1. Downloads the non-HTML file.
      2. Converts the non-HTML file to HTML.
      3. If the converted content is less than 4,000,000 bytes, indexes the first 2MB of the HTML file. (Note that 4MB = 4,194,304 bytes.) If the converted content exceeds 4,000,000 bytes, the document is not indexed, although the document and a link to it still appear in search results.
      4. Caches the first 2MB of the HTML file.
      5. Discards the rest of the HTML file and the non-HTML file.
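The search appliance rules above can be sketched as a small function that estimates what happens to a file. This is an illustrative model, not the appliance's code. Several assumptions are built in: "MB" means 1,048,576 bytes (matching the note that 4MB = 4,194,304 bytes); the HTML "caches it" step caches the same 2.5MB that is indexed; the non-HTML caching step applies whether or not the document is indexed; and converted content of exactly 4,000,000 bytes is treated as not indexed. The names `appliance_outcome` and `Outcome` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Assumption: MB is binary (1,048,576 bytes), consistent with 4MB = 4,194,304 bytes.
MB = 1024 * 1024
MAX_FILE_SIZE = 30 * MB             # larger files are ignored entirely
HTML_INDEX_LIMIT = int(2.5 * MB)    # HTML: first 2.5MB is indexed
CONVERTED_INDEX_CUTOFF = 4_000_000  # non-HTML: converted HTML must be below this to be indexed
NON_HTML_LIMIT = 2 * MB             # non-HTML: first 2MB of converted HTML is indexed/cached


@dataclass
class Outcome:
    ignored: bool       # True if the file is skipped completely
    indexed_bytes: int  # bytes of (converted) HTML that are indexed
    cached_bytes: int   # bytes of (converted) HTML that are cached


def appliance_outcome(file_size: int, is_html: bool,
                      converted_size: Optional[int] = None) -> Outcome:
    """Estimate indexing and caching for one file under the assumed rules."""
    if file_size > MAX_FILE_SIZE:
        return Outcome(ignored=True, indexed_bytes=0, cached_bytes=0)
    if is_html:
        kept = min(file_size, HTML_INDEX_LIMIT)
        # Assumption: the cached copy is the same portion that was indexed.
        return Outcome(ignored=False, indexed_bytes=kept, cached_bytes=kept)
    if converted_size is None:
        raise ValueError("non-HTML files need the size of their converted HTML")
    # Assumption: the first 2MB is cached even when the document is not indexed.
    cached = min(converted_size, NON_HTML_LIMIT)
    indexed = cached if converted_size < CONVERTED_INDEX_CUTOFF else 0
    return Outcome(ignored=False, indexed_bytes=indexed, cached_bytes=cached)
```

For example, under these assumptions a 10MB PDF that converts to 3MB of HTML has its first 2MB indexed and cached, while one that converts to 4.1 million bytes is cached but not indexed.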

Adam Cogan
Tiago Araujo
Camilla Rosa Silva