Optimizing for crawl

Search Engines (SEs) perform three basic tasks to serve appropriate web pages for user queries: they crawl (i.e. fetch and render) web pages, index (i.e. organize) the content found on those pages, and rank page content (using various algorithms) to match anticipated user queries.

Technical SEO starts with improving your website so that Search Engines (SEs) can crawl it efficiently. The easier your webmaster makes it for SEs to crawl your site, the higher the likelihood of your content being processed (i.e. indexed & ranked) for organic search.

Crawl Budget

New content – web pages, images, media files, etc. – is added to the public internet every second. Search Engines maintain (amongst the largest) infrastructure to crawl and process (index & rank) this data. Yet even they do not have infinite resources, so SEs must prioritize which links they crawl, and how frequently. The amount of resources an SE allocates to a website, based on the perceived importance of that website's content to the SE's user queries, is referred to as its 'crawl budget'.

Crawl Issues

Issues that prevent SEs from efficiently crawling a website include:

  • Sitemaps & Links are the primary ways SEs 'discover' links to crawl, for new as well as updated content. Hidden or broken links remain undiscovered and hence un-crawled.
  • robots.txt is the industry standard for telling SEs which parts of your site they may crawl. A misconfigured robots.txt could render swathes of your site uncrawlable.
  • Page Speed impacts the time it takes your web host to respond to page requests. Slow responses exhaust the time SEs allocate to your website.
  • Redirects such as 301s and 302s add an extra hop for SEs to follow before reaching the eventual page content, and chains of redirects compound the cost.
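
To illustrate how the first two points interact, here is a minimal robots.txt sketch (the site and paths are hypothetical) showing both a sensible directive and the kind of one-line misconfiguration that can block an entire site:

```text
# https://www.example.com/robots.txt (hypothetical site)
User-agent: *
# Intended: keep the shopping cart out of the crawl.
Disallow: /cart/

# Misconfigured: a single stray line like the following
# would make the whole site uncrawlable:
# Disallow: /

# Pointing SEs at the sitemap aids link discovery:
Sitemap: https://www.example.com/sitemap.xml
```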
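Where a redirect is unavoidable, a single server-side 301 hop is the cheapest form. A minimal nginx sketch (hypothetical paths, not a definitive configuration) might look like:

```nginx
# Map a retired URL straight to its replacement in one hop,
# rather than chaining /old-product -> /interim -> /new.
location = /old-product {
    return 301 /products/new-product;
}
```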

Implications for E-Commerce sites

E-Commerce sites tend to have product pages with quite similar, and often (relatively) thin, content. The same product page could also have multiple URLs (e.g. based on category, tag or attribute). These risk being seen as 'duplicate content' by SEs.
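
One common mitigation is to declare a single canonical URL on every variant, so SEs consolidate the duplicates. A sketch with hypothetical URLs:

```html
<!-- Placed in the <head> of every URL variant of the same product,
     e.g. /shoes/trail-runner and /sale/trail-runner (hypothetical
     paths), pointing SEs to the one preferred address: -->
<link rel="canonical" href="https://www.example.com/shoes/trail-runner" />
```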

Product Variations – where most of the product content remains the same and only an attribute (and its value) changes – are another potential risk. It is important that your e-commerce implementation partner is well versed in the SEO implications of different product taxonomy designs and offers the best advice with SEO in mind.

We can help

Crawl is the first step in the process; crawl issues lead to declining keyword rankings and dropping organic traffic. The primary source for analysing crawl issues is Google Search Console. Alternatively, there are free and paid tools offering SEO Log Analysis, which parse your web server logs to uncover crawl-related issues. We can help with the crawl errors reported in your Google Search Console, or help you make sense of an SEO Log Analysis.
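
At its core, SEO Log Analysis is just filtering your access log for search-engine requests and flagging error responses. A minimal sketch in Python, assuming the common Apache/nginx "combined" log format (the sample lines and paths are hypothetical):

```python
import re
from collections import Counter

# Matches the common "combined" access-log format; a real tool
# would handle more verbs and malformed lines.
LOG_LINE = re.compile(
    r'\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

def crawl_errors(log_lines):
    """Count non-2xx responses served to Googlebot, keyed by (status, path)."""
    errors = Counter()
    for line in log_lines:
        m = LOG_LINE.match(line)
        if not m:
            continue
        if "Googlebot" not in m.group("agent"):
            continue  # only interested in search-engine crawls here
        status = int(m.group("status"))
        if status >= 300:
            errors[(status, m.group("path"))] += 1
    return errors

# Hypothetical sample log lines for illustration:
sample = [
    '66.249.66.1 - - [10/May/2024:06:25:14 +0000] "GET /products/blue-shirt HTTP/1.1" 404 209 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '66.249.66.1 - - [10/May/2024:06:25:15 +0000] "GET /old-page HTTP/1.1" 301 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '203.0.113.7 - - [10/May/2024:06:25:16 +0000] "GET /products/blue-shirt HTTP/1.1" 404 209 "-" "Mozilla/5.0"',
]

print(crawl_errors(sample))
```

Note that the ordinary visitor's 404 (the last sample line) is excluded: the point of log analysis for SEO is specifically what the crawler sees.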

Finally, addressing crawl errors – identified and potential – is important; however, it is only one part of the overall SEO and Technical SEO activity. As much attention should be focussed on other aspects of SEO, including content optimization, keyword planning, and improving user experience.