What Is a Robots.txt File?
A robots.txt file is a plain text file placed in the root directory of your website that tells search engine crawlers which pages and directories they are allowed to crawl and which they should skip. It's the first file Googlebot reads when it visits your site.
The file uses a simple syntax to define rules for specific crawlers (or all crawlers) about what they can and cannot access. When configured correctly, robots.txt improves crawl efficiency, protects sensitive pages, and helps Google focus its attention on your most important content.
Important Distinction: Robots.txt controls crawling, not indexing. A page blocked by robots.txt won't be crawled, but it can still appear in search results if other sites link to it. To prevent indexing, use a noindex meta tag instead.
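For clarity, this is the meta tag that actually prevents indexing. Note that Googlebot can only see it if the page remains crawlable, so don't combine noindex with a Disallow rule for the same URL:

```html
<!-- Place in the <head> of any page you want excluded from search results.
     The page must stay crawlable so Googlebot can read this tag. -->
<meta name="robots" content="noindex">
```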
How Robots.txt Works
When Googlebot (or any crawler) arrives at your domain, it first requests https://yourdomain.com/robots.txt. If the file exists, it reads the rules and decides which pages it's allowed to crawl. If no robots.txt exists, it crawls everything it can find.
Robots.txt uses the Robots Exclusion Protocol — an industry standard supported by Google, Bing, Yahoo, and virtually all legitimate crawlers. Malicious bots ignore robots.txt entirely, so it provides zero security — only crawl control for well-behaved bots.
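As a minimal sketch of this handshake, Python's standard-library parser performs the same check a well-behaved crawler runs before each fetch (yourdomain.com is a placeholder, as above):

```python
from urllib.robotparser import RobotFileParser

# Fetch and parse the site's robots.txt once, up front
parser = RobotFileParser()
parser.set_url("https://yourdomain.com/robots.txt")
parser.read()  # a missing file (404) is treated as "allow everything"

# Check each URL against the rules for this crawler's user-agent
print(parser.can_fetch("Googlebot", "https://yourdomain.com/admin/settings"))
print(parser.can_fetch("Googlebot", "https://yourdomain.com/blog/some-post"))
```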
How to Generate a Robots.txt File with ToolMatrix
Select Crawlers
Choose to apply rules to all bots (User-agent: *) or target specific crawlers like Googlebot, Bingbot, or GPTBot separately.
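For example, a file can combine a generic group with crawler-specific groups. A crawler obeys the most specific group that matches its name and ignores the generic one:

```
# Baseline rules every crawler follows
User-agent: *
Disallow: /admin/

# GPTBot matches this group instead, so it is blocked everywhere
User-agent: GPTBot
Disallow: /
```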
Add Disallow Rules
List the directories or pages you want to block. Examples: /admin/, /login, /cart, /wp-admin/.
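Keep in mind that Disallow values are path prefixes rather than exact matches, which is why the trailing slash matters. A short sketch using the example paths above:

```
User-agent: *
Disallow: /wp-admin/   # blocks /wp-admin/ and everything beneath it
Disallow: /cart        # prefix match: also blocks /cart/, /cart?step=2, /cart-help
```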
Add Allow Rules (Optional)
Override a disallow with a more specific allow rule — useful for blocking a whole directory but allowing one specific page within it.
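A minimal sketch of the override. For Google and Bing, the longest (most specific) matching rule wins, which is why the Allow beats the shorter Disallow here:

```
User-agent: *
Disallow: /private/               # shorter match
Allow: /private/public-page.html  # longer match wins; this page stays crawlable
```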
Add Sitemap Reference
Add your sitemap URL. This tells crawlers exactly where to find your sitemap without needing to submit it separately.
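Sitemap lines sit outside any user-agent group, must be absolute URLs, and can be repeated if you have several sitemaps (the URLs below are placeholders):

```
Sitemap: https://yourdomain.com/sitemap.xml
Sitemap: https://yourdomain.com/blog-sitemap.xml
```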
Download & Upload
Download the generated robots.txt file and upload it to your website's root directory (same level as index.html). Then confirm it loads by visiting https://yourdomain.com/robots.txt in a browser; crawlers only look for the file at that exact location, and a robots.txt in a subdirectory is ignored.
Robots.txt Syntax — Complete Reference
```
# Allow all crawlers everywhere
User-agent: *
Allow: /

# Block specific directories
User-agent: *
Disallow: /admin/
Disallow: /login
Disallow: /cart
Disallow: /checkout

# Block a whole folder but allow one page inside it
User-agent: *
Disallow: /private/
Allow: /private/public-page.html

# Block AI training bots specifically
User-agent: GPTBot
Disallow: /

# Add your sitemap
Sitemap: https://yourdomain.com/sitemap.xml
```
What to Block vs What to Allow
| Page Type | Block or Allow? | Reason |
|---|---|---|
| Homepage & key pages | Allow | Core SEO pages must be crawled |
| Blog posts & product pages | Allow | Content you want ranked |
| Admin & login pages | Block | No SEO value, privacy concern |
| Cart & checkout | Block | No SEO value, dynamic content |
| Search results pages | Block | Duplicate content risk |
| Private user content | Block | Sensitive personal data |
| Print versions of pages | Block | Duplicate content |
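Put together, the table above might translate into a file like the sketch below. The /search path and the ?print= parameter are assumptions that depend on your site's URL structure, and the * wildcard is supported by Google and Bing but is not part of the original standard:

```
User-agent: *
Disallow: /admin/
Disallow: /login
Disallow: /cart
Disallow: /checkout
Disallow: /search       # internal search results pages (path is an assumption)
Disallow: /*?print=     # print versions served via query string (assumption)

Sitemap: https://yourdomain.com/sitemap.xml
```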
Critical Robots.txt Mistakes to Avoid
- Blocking your entire site: `Disallow: /` under `User-agent: *` blocks all crawlers from everything. This is a catastrophic mistake that removes your site from Google entirely (see the sketch after this list).
- Blocking CSS and JavaScript: Googlebot needs these files to render your pages. Blocking `/assets/` or `/js/` can prevent proper rendering and hurt rankings.
- Thinking robots.txt provides security: Malicious bots ignore it. Never use robots.txt to hide sensitive data; use proper authentication instead.
- Forgetting to add your sitemap: Adding a `Sitemap:` directive is a simple win most sites overlook.
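To make the first mistake concrete, here is the one-character difference between blocking everything and allowing everything; an empty Disallow value means no restrictions:

```
# Catastrophic: removes the entire site from well-behaved crawlers
User-agent: *
Disallow: /

# Harmless: an empty Disallow allows everything
User-agent: *
Disallow:
```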
Generate Robots.txt Free
Visual builder with instant preview. Download ready-to-upload robots.txt in seconds.
Generate Robots.txt Now

ToolMatrix Robots.txt Generator
Visual interface — no need to write raw syntax. Add user agents, disallow/allow rules, and sitemap reference through a clean form. Preview the generated file before downloading. Free, no account needed, instant download.