# As a condition of accessing this website, you agree to abide by the following
# content signals:

# (a) If a content-signal = yes, you may collect content for the corresponding
# use.
# (b) If a content-signal = no, you may not collect content for the
# corresponding use.
# (c) If the website operator does not include a content signal for a
# corresponding use, the website operator neither grants nor restricts
# permission via content signal with respect to the corresponding use.

# The content signals and their meanings are:

# search: building a search index and providing search results (e.g., returning
# hyperlinks and short excerpts from your website's contents). Search does not
# include providing AI-generated search summaries.
# ai-input: inputting content into one or more AI models (e.g., retrieval
# augmented generation, grounding, or other real-time taking of content for
# generative AI search answers).
# ai-train: training or fine-tuning AI models.

# ANY RESTRICTIONS EXPRESSED VIA CONTENT SIGNALS ARE EXPRESS RESERVATIONS OF
# RIGHTS UNDER ARTICLE 4 OF THE EUROPEAN UNION DIRECTIVE 2019/790 ON COPYRIGHT
# AND RELATED RIGHTS IN THE DIGITAL SINGLE MARKET.

# BEGIN Cloudflare Managed content

User-Agent: *
Content-signal: search=yes,ai-train=no
Allow: /

User-agent: Amazonbot
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: GPTBot
Disallow: /

User-agent: meta-externalagent
Disallow: /

# END Cloudflare Managed Content

# Robots policy to reduce crawl pressure while allowing good bots reasonable access
Sitemap: https://www.olympiandatabase.com/sitemap/sitemap_index.xml

# Default rules for all crawlers
User-agent: *
# Noisy or resource-intensive paths
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /api/
Disallow: /ajax/
Disallow: /search
Disallow: /search/
Disallow: /track/
# Common heavy query patterns
Disallow: /*?*debug=
Disallow: /*?*utm_
Disallow: /*?*session=
Disallow: /*?*sort=
Disallow: /*?*page=
Disallow: /*?*offset=
# Gentle crawl pace for bots that honor Crawl-delay (Google ignores this)
Crawl-delay: 8

# Googlebot: keep access broad; Google ignores Crawl-delay but we include for clarity
User-agent: Googlebot
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /api/
Disallow: /ajax/
Disallow: /search
Disallow: /search/
Disallow: /track/
Crawl-delay: 2

# Bingbot: allow but limit pace modestly
User-agent: Bingbot
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /api/
Disallow: /ajax/
Disallow: /search
Disallow: /search/
Disallow: /track/
Crawl-delay: 5

# DuckDuckBot: allow but limit pace modestly
User-agent: DuckDuckBot
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /api/
Disallow: /ajax/
Disallow: /search
Disallow: /search/
Disallow: /track/
Crawl-delay: 5

# Qwantbot: slow down significantly
User-agent: Qwantbot
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /api/
Disallow: /ajax/
Disallow: /search
Disallow: /search/
Disallow: /track/
Crawl-delay: 15

# PetalBot: slow down significantly
User-agent: PetalBot
Disallow: /admin/
Disallow: /cgi-bin/
Disallow: /tmp/
Disallow: /api/
Disallow: /ajax/
Disallow: /search
Disallow: /search/
Disallow: /track/
Crawl-delay: 20

# Bytespider: block entirely (reduce pressure). Remove or relax if you prefer to allow it.
User-agent: Bytespider
Disallow: /

# ClaudeBot: slow down and avoid query-string crawling (reduces id-scanning)
User-agent: ClaudeBot
Disallow: /*?*
Crawl-delay: 20

# AhrefsBot: high load with low value; block
User-agent: AhrefsBot
Disallow: /

# DotBot (Moz): slow and avoid query-string crawling
User-agent: DotBot
Disallow: /*?*
Crawl-delay: 15

# DataForSeoBot: high-frequency SEO crawler; block
User-agent: DataForSeoBot
Disallow: /

# ChatGPT-User: allow but be very slow; avoid query-string crawling
User-agent: ChatGPT-User
Disallow: /*?*
Crawl-delay: 30

# Facebook crawler (used for link previews): allow
User-agent: facebookexternalhit
Allow: /

# Meta external agent alias observed in logs: allow
User-agent: meta-externalagent
Allow: /