What Is a Robots.txt File?
A robots.txt file tells search engine crawlers which pages or sections of your website they can and cannot access. It lives at the root of your domain (e.g., https://example.com/robots.txt) and is one of the first files crawlers check before indexing your site.
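Because the file always sits at the root of the scheme and host, you can derive its location from any page URL. A minimal sketch using Python's standard library (the example URL is a placeholder):

```python
from urllib.parse import urlsplit, urlunsplit

def robots_url(page_url: str) -> str:
    # robots.txt lives at the root of the scheme + host,
    # regardless of which page you start from
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))

print(robots_url("https://example.com/blog/post?id=1"))
# https://example.com/robots.txt
```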
Common Robots.txt Rules
- Allow all: `User-agent: *` with `Allow: /` lets all crawlers access everything
- Block a directory: `Disallow: /admin/` prevents crawling of the admin section
- Block all: `Disallow: /` blocks all crawlers from all pages (use with caution)
- Sitemap directive: `Sitemap: https://example.com/sitemap.xml` points crawlers to your sitemap
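Putting these rules together, a small complete robots.txt might look like this (the domain and the `/admin/` path are placeholders):

```
User-agent: *
Disallow: /admin/
Allow: /

Sitemap: https://example.com/sitemap.xml
```

Each `User-agent` line starts a group, and the `Disallow`/`Allow` rules that follow apply to that group until the next `User-agent` line.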
Blocking AI Crawlers
Many website owners now choose to block AI training crawlers from scraping their content. Common AI crawler user agents include GPTBot (OpenAI), CCBot (Common Crawl), and Google-Extended (Google AI training).
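A typical set of blocking rules for the crawlers named above looks like this; each crawler gets its own `User-agent` group with a blanket `Disallow`:

```
User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /
```

Note that robots.txt is advisory: well-behaved crawlers honor it, but nothing technically prevents a crawler from ignoring these rules.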
Best Practices
- Always include a Sitemap directive — it helps crawlers discover your content faster
- Don't use robots.txt to hide sensitive pages — use authentication or `noindex` meta tags instead
- Test your robots.txt using Google Search Console's robots.txt tester
- Keep it simple — complex rules are easy to get wrong
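One way to keep rules simple and catch mistakes early is to test them locally before deploying. Python's standard library ships a robots.txt parser; a minimal sketch that checks a policy parsed from a string (no network access needed):

```python
from urllib.robotparser import RobotFileParser

# a sample policy, parsed from a string rather than fetched over HTTP
rules = """\
User-agent: *
Disallow: /admin/
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

# can_fetch(user_agent, url) answers: may this agent crawl this URL?
print(parser.can_fetch("*", "https://example.com/admin/users"))  # False
print(parser.can_fetch("*", "https://example.com/blog/"))        # True
```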