update docs with scraper deterrence information, and update old robots information

kim 2025-04-24 13:16:28 +01:00
commit 72d21082e9
2 changed files with 12 additions and 3 deletions


@@ -10,8 +10,7 @@ You can allow or disallow crawlers from collecting stats about your instance from
 The AI scrapers come from a [community maintained repository][airobots]. It's manually kept in sync for the time being. If you know of any missing robots, please send them a PR!
-A number of AI scrapers are known to ignore entries in `robots.txt` even if it explicitly matches their User-Agent. This means the `robots.txt` file is not a foolproof way of ensuring AI scrapers don't grab your content.
-If you want to block these things fully, you'll need to block based on the User-Agent header in a reverse proxy until GoToSocial can filter requests by User-Agent header.
+A number of AI scrapers are known to ignore entries in `robots.txt` even if it explicitly matches their User-Agent. This means the `robots.txt` file is not a foolproof way of ensuring AI scrapers don't grab your content. In addition to
+this, you might want to look into blocking User-Agents via [request header filtering](request_filtering_modes.md), and enabling a proof-of-work [scraper deterrence](scraper_deterrence.md).
 [airobots]: https://github.com/ai-robots-txt/ai.robots.txt/
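For concreteness, here is a minimal sketch of what User-Agent header filtering can look like in front of an instance, written as plain Go middleware. This is not GoToSocial's actual implementation; the `blockedAgents` list, handler, and port are placeholders for illustration.

```go
package main

import (
	"net/http"
	"strings"
)

// Hypothetical blocklist of User-Agent substrings; a real deployment
// would source these from the ai.robots.txt repository linked above.
var blockedAgents = []string{"GPTBot", "CCBot", "Bytespider"}

// denyScrapers rejects any request whose User-Agent contains a blocked
// substring with 403, before the request reaches the application.
func denyScrapers(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ua := r.Header.Get("User-Agent")
		for _, bad := range blockedAgents {
			if strings.Contains(ua, bad) {
				http.Error(w, "forbidden", http.StatusForbidden)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok\n"))
	})
	http.ListenAndServe(":8080", denyScrapers(mux))
}
```

The same substring matching can be done in a reverse proxy such as nginx or Caddy instead, which keeps the blocklist out of the application entirely.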
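And a sketch of the proof-of-work idea behind scraper deterrence, assuming a simple hashcash-style scheme (SHA-256 with a leading-zero-bits difficulty). The challenge string and difficulty below are illustrative, not GoToSocial's actual parameters: the point is only that each page fetch costs the client some CPU, which is negligible for one visitor but expensive for a scraper fetching thousands of pages.

```go
package main

import (
	"crypto/sha256"
	"encoding/binary"
	"fmt"
	"math/bits"
)

// leadingZeroBits counts the leading zero bits of a SHA-256 digest.
func leadingZeroBits(sum [32]byte) int {
	n := 0
	for _, b := range sum {
		if b != 0 {
			return n + bits.LeadingZeros8(b)
		}
		n += 8
	}
	return n
}

// solve brute-forces a nonce so that SHA-256(challenge || nonce) has at
// least `difficulty` leading zero bits. This is the work the client
// performs; the server verifies the result with a single hash.
func solve(challenge []byte, difficulty int) uint64 {
	data := make([]byte, len(challenge)+8)
	copy(data, challenge)
	for nonce := uint64(0); ; nonce++ {
		binary.BigEndian.PutUint64(data[len(challenge):], nonce)
		if leadingZeroBits(sha256.Sum256(data)) >= difficulty {
			return nonce
		}
	}
}

func main() {
	// A server would issue a fresh per-request challenge, then serve
	// the page only once the client returns a valid nonce.
	challenge := []byte("per-request-server-nonce")
	const difficulty = 16 // ~65k hashes on average; value is illustrative
	nonce := solve(challenge, difficulty)
	fmt.Printf("solved difficulty %d with nonce %d\n", difficulty, nonce)
}
```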