Robots.txt

Figure: The decision flow of a bot visiting your site. A well-behaved bot checks robots.txt before crawling anything else.

What is Robots.txt?

Located at yourdomain.com/robots.txt, this simple text file is the first thing a bot looks for when visiting your site. It uses the Robots Exclusion Protocol to manage crawl traffic. It is not a security mechanism; it is a request for behavior.

Why it Matters for SEO

1. Optimizing Crawl Budget

If you have a 10,000-page site but only 500 pages are valuable, you don't want Google wasting resources crawling your internal search results, admin logins, or temporary files. Robots.txt keeps bots focused on your "money pages."

2. Preventing Server Overload

Aggressive bots can slow down your server. Blocking bad bots or limiting crawl rates can keep your site fast for real humans.
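One blunt tool for this is the Crawl-delay directive, which asks a bot to wait a number of seconds between requests. Note that it is non-standard: Bing and Yandex honor it, but Googlebot ignores it entirely.

```text
User-agent: bingbot
Crawl-delay: 10
```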

Code Implementation

Standard Allow All:

text
User-agent: *
Allow: /
Sitemap: https://example.com/sitemap.xml

Block Specific Areas:

text
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /search?q=
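Before deploying rules like these, you can sanity-check them against real URLs with Python's standard-library robots.txt parser (the example.com URLs here are illustrative):

```python
from urllib.robotparser import RobotFileParser

# The "Block Specific Areas" rules from above, parsed in memory.
rules = """\
User-agent: *
Disallow: /admin/
Disallow: /checkout/
Disallow: /search?q=
"""

rp = RobotFileParser()
rp.parse(rules.splitlines())

# Blocked paths:
print(rp.can_fetch("*", "https://example.com/admin/settings"))   # False
print(rp.can_fetch("*", "https://example.com/search?q=shoes"))   # False
# Still crawlable:
print(rp.can_fetch("*", "https://example.com/products/shoes"))   # True
```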

Block Bad Bots:

text
User-agent: GPTBot
Disallow: /

Common Pitfalls & How to Fix

The "Disallow All" Disaster

The Mistake: Leaving Disallow: / in your file after moving from staging to production. This blocks crawling of your entire website and can knock it out of the search results.

The Fix: Always double-check this file immediately after launching a new site.
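The offending file usually looks like this. It is fine on a staging server you want hidden, but fatal if it ships to production:

```text
User-agent: *
Disallow: /
```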

Blocking Resources (CSS/JS)

The Mistake: Disallow: /assets/ or Disallow: /scripts/. Google renders pages like a modern browser; if you block CSS or JS, Google sees a broken, ugly page and may assume it's not mobile-friendly, hurting your rankings.

The Fix: Allow crawlers to fetch the CSS and JavaScript files your pages need in order to render.

Confusing Noindex with Disallow

The Mistake: Trying to remove a page from Google by blocking it in robots.txt.

The Fix: If a page is blocked in robots.txt, Google cannot read the meta noindex tag on that page. To de-index a page, you must allow crawling but add a <meta name="robots" content="noindex"> tag to the HTML.
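For resources where you can't edit the HTML (PDFs, images), the same noindex directive can be sent as an HTTP response header instead. A minimal sketch for nginx (the location pattern is an example; adapt it to your setup):

```text
location ~* \.pdf$ {
    add_header X-Robots-Tag "noindex";
}
```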

How to Audit with Mygom

Mygom verifies:

  1. The file exists at the root (/robots.txt).
  2. It is accessible (returns HTTP 200).
  3. It does not inadvertently block the entire site.
  4. It contains a link to your Sitemap.
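The four checks above can be sketched with the Python standard library. This is an illustration of the idea, not Mygom's actual implementation; for simplicity it ignores User-agent grouping and treats any non-200 response as inaccessible (urlopen raises on HTTP errors):

```python
import urllib.request

def check_rules(body: str) -> dict:
    """Naive line-by-line checks on a robots.txt body (ignores User-agent groups)."""
    lines = [ln.split("#", 1)[0].strip() for ln in body.splitlines()]
    return {
        "blocks_entire_site": any(
            ln.lower().replace(" ", "") == "disallow:/" for ln in lines
        ),
        "has_sitemap": any(ln.lower().startswith("sitemap:") for ln in lines),
    }

def audit_robots(domain: str) -> dict:
    url = domain.rstrip("/") + "/robots.txt"
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            result = {"exists": True, "accessible": resp.status == 200}
            result.update(check_rules(resp.read().decode("utf-8", errors="replace")))
            return result
    except Exception:
        # Covers missing files, HTTP errors, and network failures alike.
        return {"exists": False, "accessible": False}
```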