Crawl budget explained — and how to optimize it

Crawl budget is one of the least understood — and most important — concepts in technical SEO. It's the cap on resources Googlebot will spend discovering and re-fetching your URLs. On a small site, it's invisible. On a large site, it's the difference between fresh content ranking next week and your money pages sitting in Discovered — currently not indexed for months.

Table of contents

  • What crawl budget actually is
  • How to measure it
  • The two levers: crawl rate limit and crawl demand
  • How to optimize it
  • Common traps
  • How crawl budget affects rankings

What crawl budget actually is

Googlebot has finite time and bandwidth per host. Per Google's docs, that envelope is shaped by two things: the crawl rate limit (how many parallel connections your server can handle without slowing down) and crawl demand (how badly Google wants your content right now). Together they form what the SEO industry calls crawl budget.

Whether you should care comes down to scale. If you have under ~10,000 URLs, host load is your only real concern. If you run an e-commerce catalog, news site, or large UGC platform, crawl budget is a daily problem — every wasted hit on a faceted URL is one less hit on a product page that converts.

How to measure it

Google won't hand you a number, but you can read the signals.

  • Google Search Console — Crawl stats report (Settings → Crawl stats). Shows total requests, average response time, and a breakdown by file type, response code, and Googlebot type. If average response time spikes, expect crawl rate to drop.
  • Server logs. The only ground truth. Filter for the Googlebot/2.1 user agent, group by URL, and you'll see exactly which sections Google ignores. Tools: GoAccess, Screaming Frog Log Analyzer, or just awk on the access log (a minimal awk sketch follows this list).
  • Screaming Frog SEO Spider. Crawls like a bot would. Surfaces 404s, redirect chains, duplicate titles, orphan pages — every fix here recovers budget.
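
If you just want a quick read on where Googlebot spends its time, a few lines of awk will do. A minimal sketch, assuming the standard combined log format and an access.log file name (both placeholders; adjust the quote-field split if your format differs):

    # Count Googlebot hits per URL, most-crawled first.
    # Matching the user-agent string alone is spoofable; verify suspicious
    # hits with a reverse DNS lookup on the client IP.
    awk -F'"' '/Googlebot/ {
        split($2, req, " ")   # $2 is the request line: "GET /path HTTP/1.1"
        hits[req[2]]++
    }
    END { for (url in hits) print hits[url], url }' access.log \
      | sort -rn | head -50

Compare the top of that list against the sections you actually want crawled; the gap between the two is your waste.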

The two levers: crawl rate limit and crawl demand

Crawl rate limit is set by host load. Slow server, 5xx errors, or Googlebot-induced latency → Google backs off. Fast TTFB, healthy 200s → Google can push harder.

Crawl demand is set by perceived value. Strong inbound links, frequent content updates, and clean signals (canonicals, hreflang, structured data) tell Google your URLs are worth re-fetching. Stale, thin, or duplicate content signals the opposite.

Other factors that compound:

  • Server errors. A wave of 5xx is the fastest way to crash your crawl budget. Googlebot interprets it as host overload and cuts back, sometimes for days.
  • Internal linking depth. Pages buried 6+ clicks from the homepage rarely get re-crawled. Flatten the architecture for what matters.
  • Duplicate content. The single biggest source of waste. Session IDs in URLs, http vs https, www vs non-www, trailing slash inconsistencies — each version is a separate crawl. (A one-hop redirect sketch follows this list.)
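
To make that last point concrete, here's a minimal one-hop redirect sketch, assuming an nginx front end (example.com stands in for your canonical host; certificates are omitted). The same rule applies to any proxy or CDN configuration: every variant should resolve in a single 301.

    # http (any host) -> canonical https host, one hop
    server {
        listen 80;
        server_name example.com www.example.com;
        return 301 https://example.com$request_uri;
    }

    # https://www -> canonical https host, one hop
    server {
        listen 443 ssl;
        server_name www.example.com;
        # ssl_certificate / ssl_certificate_key omitted in this sketch
        return 301 https://example.com$request_uri;
    }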

How to optimize it

The job is simple: make important URLs trivial to find, make junk URLs impossible to reach.

  • Kill duplicates with rel="canonical" and 301s. Pick one URL per resource, redirect or canonicalize the rest. Don't rely on Google to figure it out — explicit signals win.
  • Use robots.txt to block infinite spaces. Faceted navigation, internal search results, sort/filter parameters, calendar pagination, login URLs — none of these belong in the crawl. Block at the directory or parameter level (see the robots.txt sketch after this list).
  • Use X-Robots-Tag: noindex, follow for pages you want crawled but not indexed (e.g. paginated archives where you still want internal links followed).
  • Fix 404s and redirect chains. Every 301 → 301 → 200 is wasted budget. Audit with Screaming Frog and rewrite chains to direct hops.
  • Keep sitemap.xml clean. Include only canonical 200-status URLs you actually want indexed. Strip noindex pages, redirected URLs, and 404s. Submit in GSC and update on every content change.
  • Improve TTFB. Faster responses = higher crawl rate ceiling. Cache aggressively at the edge (Cloudflare, Fastly), tune database queries, ship a real CDN.
  • Strengthen internal linking to your priority pages. Orphan pages — URLs with zero internal links — are nearly invisible to Googlebot. A homepage or category-page link is worth more than ten random sidebar mentions.
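
For the robots.txt point above, a minimal sketch of what blocking infinite spaces can look like. The paths and parameter names (/search/, sort=, filter=, sessionid=) are placeholders for whatever your platform actually generates:

    User-agent: *
    # Internal search results
    Disallow: /search/
    # Faceted/sort/filter parameter variants
    Disallow: /*?*sort=
    Disallow: /*?*filter=
    Disallow: /*?*sessionid=
    # Login and account URLs
    Disallow: /login

    Sitemap: https://example.com/sitemap.xml

One caveat: robots.txt stops crawling, not indexing. A blocked URL with inbound links can still end up indexed, so use robots.txt for crawl waste and noindex for indexation control.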

Common traps

  • Blocking CSS or JS in robots.txt. Google needs to render the page. Blocking critical resources hides your content and tanks rankings.
  • Long redirect chains. Each hop costs a crawl. Migrate everything to a single 301.
  • Faceted navigation creating infinite URLs. ?color=red&size=m&sort=price × N attributes = combinatorial explosion. Either canonicalize to the base category, noindex via meta robots, or block via robots.txt (on-page examples follow this list).
  • Calendar pagination loops. Event sites that let bots crawl years into the future generate millions of empty pages. Cap pagination, noindex empty months.
  • Soft 404s. Pages returning 200 with empty or error content. Google flags them in GSC under Indexing → Pages. Either fix the content or return a real 404.
  • Ignoring thin content. Hundreds of near-empty tag archives or auto-generated pages drain demand. Consolidate, redirect, or noindex.
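
For the faceted navigation trap, the two on-page options look like this. A sketch for a hypothetical filtered URL such as /shoes?color=red&sort=price (URL and host are placeholders):

    <!-- Option 1: canonicalize the filtered variant to the base category -->
    <link rel="canonical" href="https://example.com/shoes/">

    <!-- Option 2: keep the page crawlable but out of the index -->
    <meta name="robots" content="noindex, follow">

Pick one per URL pattern; combining a cross-page canonical with noindex sends Google mixed signals about which directive to honor.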

How crawl budget affects rankings

Crawl budget isn't a direct ranking factor, but its second-order effects are huge:

  • New and updated pages get indexed faster — the lag between publish and SERP shrinks from weeks to hours.
  • Googlebot spends its budget on pages that actually drive traffic, raising the perceived quality of the whole site.
  • Critical product, money, or pillar pages stop getting silently skipped. That alone can shift organic revenue meaningfully.

Technical SEO without crawl budget hygiene is content strategy on hard mode. Fix the plumbing, and your content earns its rankings instead of getting lost in the noise.


Got a large site with indexation issues? Get my free SEO audit — I'll show you exactly where the crawl budget leaks.
