[feature] Change instance-stats-randomize to instance-stats-mode with multiple options; implement nodeinfo 2.1 (#3734)

* [feature] Change `instance-stats-randomize` to `instance-stats-mode` with multiple options; implement nodeinfo 2.1

* swaggalaggadingdong
This commit is contained in:
tobi 2025-02-04 16:52:42 +01:00 committed by GitHub
commit 07d2770995
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
18 changed files with 283 additions and 77 deletions

View file

@ -2,6 +2,10 @@
GoToSocial serves a `robots.txt` file on the host domain. This file contains rules that attempt to block known AI scrapers, as well as some other indexers. It also includes some rules to ensure things like API endpoints aren't indexed by search engines since there really isn't any point to them.
## Allow/disallow stats collection
You can allow or disallow crawlers from collecting stats about your instance from the `/nodeinfo/2.0` and `/nodeinfo/2.1` endpoints by changing the setting `instance-stats-mode`, which modifies the `robots.txt` file. See [instance configuration](../configuration/instance.md) for more details.
## AI scrapers
The AI scrapers come from a [community maintained repository][airobots]. It's manually kept in sync for the time being. If you know of any missing robots, please send them a PR!

View file

@ -77,10 +77,20 @@ definitions:
x-go-package: github.com/superseriousbusiness/gotosocial/internal/api/model
NodeInfoSoftware:
properties:
homepage:
description: Homepage for the software. Omitted in version 2.0.
example: https://docs.gotosocial.org
type: string
x-go-name: Homepage
name:
example: gotosocial
type: string
x-go-name: Name
repository:
description: Repository for the software. Omitted in version 2.0.
example: https://codeberg.org/superseriousbusiness/gotosocial
type: string
x-go-name: Repository
version:
example: 0.1.2 1234567
type: string
@ -90,6 +100,10 @@ definitions:
x-go-package: github.com/superseriousbusiness/gotosocial/internal/api/model
NodeInfoUsage:
properties:
localComments:
format: int64
type: integer
x-go-name: LocalComments
localPosts:
format: int64
type: integer
@ -101,6 +115,14 @@ definitions:
x-go-package: github.com/superseriousbusiness/gotosocial/internal/api/model
NodeInfoUsers:
properties:
activeHalfYear:
format: int64
type: integer
x-go-name: ActiveHalfYear
activeMonth:
format: int64
type: integer
x-go-name: ActiveMonth
total:
format: int64
type: integer
@ -12504,12 +12526,19 @@ paths:
summary: Returns code 200 if GoToSocial is "live", ie., able to respond to HTTP requests.
tags:
- health
/nodeinfo/2.0:
/nodeinfo/{schema_version}:
get:
description: 'See: https://nodeinfo.diaspora.software/schema.html'
operationId: nodeInfoGet
parameters:
- description: Schema version of nodeinfo to request. 2.0 and 2.1 are currently supported.
in: path
name: schema_version
required: true
type: string
produces:
- application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.0#"
- application/json; profile="http://nodeinfo.diaspora.software/ns/schema/2.1#"
responses:
"200":
description: ""

View file

@ -139,14 +139,36 @@ instance-subscriptions-process-from: "23:00"
# Default: "24h" (once per day).
instance-subscriptions-process-every: "24h"
# Bool. Set this to true to randomize stats served at
# the /api/v1|v2/instance and /nodeinfo/2.0 endpoints.
# String. Allows you to customize if and how stats are served to
# crawlers at the /api/v1|v2/instance and /nodeinfo endpoints.
#
# This can be useful when you don't want bots to obtain
# reliable information about the amount of users and
# statuses on your instance.
# Note that no matter what you set below, the /api/v1|v2/instance
# endpoints will not be allowed by robots.txt, as these are client
# API endpoints.
#
# Options: [true, false]
# Default: false
instance-stats-randomize: false
# "" / empty string (default mode): Serve accurate stats at instance
# and nodeinfo endpoints, and DISALLOW crawlers from crawling
# those endpoints in robots.txt. This mode is equivalent to politely
# asking crawlers not to crawl, but there's no guarantee they will obey,
# as unfortunately many crawlers don't even check robots.txt.
#
# "zero": Serve zeroed-out stats at instance and nodeinfo endpoints,
# and DISALLOW crawlers from crawling those endpoints in robots.txt.
# This mode prevents even ill-behaved crawlers from gathering stats
# about your instance, as all gathered values will be 0. This is the
# safest way of preserving your instance's privacy in terms of stats.
#
# "serve": Serve accurate stats at instance and nodeinfo endpoints,
# and ALLOW crawlers to crawl those endpoints. This mode is useful
# if you want to contribute to fediverse statistics collection projects.
#
# "baffle": Serve randomized, preposterous stats at instance and nodeinfo
# endpoints, and DISALLOW crawlers from crawling those endpoints in robots.txt.
# This mode can be useful to annoy crawlers that don't respect robots.txt.
# Warning that this may draw the ire of crawler implementers who don't
# respect robots.txt, and may therefore put a target on your instance.
#
# Options: ["", "zero", "serve", "baffle"]
# Default: ""
instance-stats-mode: ""
```