mirror of https://github.com/superseriousbusiness/gotosocial.git, synced 2025-10-31 15:42:26 -05:00

	[chore] Update our robots.txt (#3033)
This syncs our copy with the current state of the ai.robots.txt repository. Upstream has tightened their scope to be AI-only, whereas before it included a bunch of SEO and "web intelligence" marketing stuff. I've kept those but moved them into their own section.
parent c2738474d5
commit 4604224c4d
1 changed file with 15 additions and 9 deletions

@@ -34,34 +34,40 @@ const (
User-agent: AdsBot-Google
User-agent: Amazonbot
User-agent: anthropic-ai
User-agent: Applebot
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: Applebot-Extended
User-agent: Bytespider
User-agent: CCBot
User-agent: ChatGPT-User
User-agent: ClaudeBot
User-agent: Claude-Web
User-agent: cohere-ai
User-agent: DataForSeoBot
User-agent: Diffbot
User-agent: FacebookBot
User-agent: FriendlyCrawler
User-agent: Google-Extended
User-agent: GoogleOther
User-agent: GPTBot
User-agent: ImagesiftBot
User-agent: magpie-crawler
User-agent: Meltwater
User-agent: img2dataset
User-agent: omgili
User-agent: omgilibot
User-agent: peer39_crawler
User-agent: peer39_crawler/1.0
User-agent: PerplexityBot
User-agent: PiplBot
User-agent: Seekr
User-agent: YouBot
Disallow: /

# Marketing/SEO "intelligence" data scrapers
User-agent: AwarioRssBot
User-agent: AwarioSmartBot
User-agent: DataForSeoBot
User-agent: ImagesiftBot
User-agent: magpie-crawler
User-agent: Meltwater
User-agent: PiplBot
User-agent: scoop.it
User-agent: Seekr
Disallow: /

# Well-known.dev crawler. Indexes stuff under /.well-known.
# https://well-known.dev/about/
User-agent: WellKnownBot
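
The layout in this hunk relies on how robots.txt groups work: consecutive User-agent lines share the rules that follow them, so a single "Disallow: /" covers every agent stacked above it, and the marketing/SEO section is simply a second group. Below is a minimal Go sketch of that grouping logic. It is not GoToSocial's code: parseGroups, blocked, and the shortened sample text are hypothetical names made up for illustration, and the matcher deliberately ignores Allow lines, the "*" wildcard, and longest-match precedence.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// group is one robots.txt record: the agents it names and the path
// prefixes it disallows. (Hypothetical type, for illustration only.)
type group struct {
	agents    []string
	disallows []string
}

// sample is a shortened stand-in for the real list (assumption).
const sample = `User-agent: GPTBot
User-agent: ClaudeBot
Disallow: /

# Marketing/SEO "intelligence" data scrapers
User-agent: AwarioRssBot
User-agent: Meltwater
Disallow: /
`

// parseGroups collects consecutive User-agent lines into one group.
// The first User-agent line seen after a rule starts a new group.
func parseGroups(robots string) []group {
	var groups []group
	var cur *group
	sawRule := false
	sc := bufio.NewScanner(strings.NewReader(robots))
	for sc.Scan() {
		line := sc.Text()
		// Drop comments; blank and comment-only lines are skipped below.
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i]
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		switch key {
		case "user-agent":
			if cur == nil || sawRule {
				groups = append(groups, group{})
				cur = &groups[len(groups)-1]
				sawRule = false
			}
			cur.agents = append(cur.agents, strings.ToLower(val))
		case "disallow":
			if cur != nil {
				sawRule = true
				if val != "" {
					cur.disallows = append(cur.disallows, val)
				}
			}
		}
	}
	return groups
}

// blocked reports whether any group naming agent disallows a prefix of path.
// Agent names are compared case-insensitively.
func blocked(groups []group, agent, path string) bool {
	agent = strings.ToLower(agent)
	for _, g := range groups {
		for _, a := range g.agents {
			if a != agent {
				continue
			}
			for _, d := range g.disallows {
				if strings.HasPrefix(path, d) {
					return true
				}
			}
		}
	}
	return false
}

func main() {
	groups := parseGroups(sample)
	fmt.Println(len(groups))                            // 2
	fmt.Println(blocked(groups, "ClaudeBot", "/"))      // true
	fmt.Println(blocked(groups, "Meltwater", "/users")) // true
	fmt.Println(blocked(groups, "SomeOtherBot", "/"))   // false
}
```

Splitting the list into two groups does not change which agents are blocked; it only separates the AI crawlers tracked upstream from the marketing/SEO scrapers that are kept locally.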