mirror of https://github.com/superseriousbusiness/gotosocial.git synced 2025-10-28 16:22:24 -05:00

kim 6801ce299a [chore] remove nollamas middleware for now (after discussions with a security advisor) (#4433)

i'll keep this on a separate branch for now while i experiment with other possible alternatives, but for now both our hacky implementation especially, and more popular ones (like anubis) aren't looking too great on the deterrent front: https://github.com/eternal-flame-AD/pow-buster

Co-authored-by: tobi <tobi.smethurst@protonmail.com>
Reviewed-on: https://codeberg.org/superseriousbusiness/gotosocial/pulls/4433
Co-authored-by: kim <grufwub@gmail.com>
Co-committed-by: kim <grufwub@gmail.com>

2025-09-17 14:16:53 +02:00


Robots.txt

GoToSocial serves a robots.txt file on the host domain. This file contains rules that attempt to block known AI scrapers, as well as some other indexers. It also includes rules that stop search engines from indexing things like API endpoints, since there is no point in having them indexed.

Allow/disallow stats collection

You can allow or disallow crawlers from collecting stats about your instance from the /nodeinfo/2.0 and /nodeinfo/2.1 endpoints by changing the instance-stats-mode setting, which modifies the robots.txt file accordingly. See instance configuration for more details.
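As a rough illustration of how a stats setting could feed into robots.txt, the Go sketch below emits Disallow rules for the two nodeinfo endpoints when stats collection should be blocked. The boolean allowStats and the function statsRules are hypothetical simplifications. The real instance-stats-mode option and its values are described in the instance configuration docs, and GoToSocial's own logic may differ.

```go
package main

import (
	"fmt"
	"strings"
)

// statsRules returns robots.txt lines covering the nodeinfo endpoints.
// allowStats is a hypothetical stand-in for whatever the configured
// instance-stats-mode implies; it is not a real GoToSocial option.
func statsRules(allowStats bool) string {
	if allowStats {
		// Emit nothing, leaving the nodeinfo endpoints crawlable.
		return ""
	}
	var b strings.Builder
	b.WriteString("User-agent: *\n")
	for _, p := range []string{"/nodeinfo/2.0", "/nodeinfo/2.1"} {
		fmt.Fprintf(&b, "Disallow: %s\n", p)
	}
	return b.String()
}

func main() {
	fmt.Print(statsRules(false))
}
```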

AI scrapers

The list of AI scrapers comes from a community-maintained repository and is manually kept in sync for the time being. If you know of any robots that are missing, please send that repository a PR!

A number of AI scrapers are known to ignore robots.txt entries even when an entry explicitly matches their User-Agent. This means robots.txt is not a foolproof way of ensuring AI scrapers don't grab your content. In addition to robots.txt, you might want to look into blocking User-Agents via request header filtering.
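For the header filtering approach, the following Go sketch shows the general idea as plain net/http middleware. The middleware refuses any request whose User-Agent contains a blocked substring. It is a hypothetical example, not how GoToSocial's request header filtering is configured or implemented. The names blockedAgents, blockUserAgents and statusFor are made up for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"strings"
)

// blockedAgents is a hypothetical list of lowercase User-Agent
// substrings to reject.
var blockedAgents = []string{"gptbot", "ccbot"}

// blockUserAgents wraps next and answers 403 Forbidden to any request
// whose User-Agent header contains a blocked substring (case-insensitive).
func blockUserAgents(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		ua := strings.ToLower(r.UserAgent())
		for _, bad := range blockedAgents {
			if strings.Contains(ua, bad) {
				http.Error(w, "forbidden", http.StatusForbidden)
				return
			}
		}
		next.ServeHTTP(w, r)
	})
}

// statusFor sends a GET with the given User-Agent through the
// middleware and returns the resulting HTTP status code.
func statusFor(userAgent string) int {
	ok := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})
	req := httptest.NewRequest(http.MethodGet, "/", nil)
	req.Header.Set("User-Agent", userAgent)
	rec := httptest.NewRecorder()
	blockUserAgents(ok).ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println(statusFor("Mozilla/5.0 (compatible; GPTBot/1.0)")) // 403
	fmt.Println(statusFor("Mozilla/5.0 (X11; Linux x86_64)"))      // 200
}
```

Keep in mind that the User-Agent header is chosen by the client, so this kind of filtering only stops scrapers that identify themselves honestly.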
