[bug] respect X-Robots-Tag and robots.txt on api/v1/instance and nodeinfo (#3756)

* feat: check X-Robots-Tag; when accessing /api/v1/instance or /nodeinfo endpoints, respect X-Robots-Tag
* chore: go fmt ./...
* Check robots.txt as well, add tests

Co-authored-by: tobi <tobi.smethurst@protonmail.com>

Parent 2c95fd4115, commit d0de3ad492: 20 changed files with 1404 additions and 24 deletions.

vendor/github.com/temoto/robotstxt/README.rst (new vendored file, 115 lines)

What
====

This is an implementation of the robots.txt exclusion protocol for the Go language (golang).

Build
=====

To build the package and run its tests, run `go test` in the source directory.

Contribute
==========

Contributions are warmly welcome.

* If desired, add your name to README.rst, section Who.
* Run `script/test && script/clean && echo ok`.
* You can ignore linter warnings, but everything else must pass.
* Send your change as a pull request, or as a regular patch to the current maintainer (see section Who).

Thank you.

Usage
=====

As usual, no special installation is required. Just add the import::

    import "github.com/temoto/robotstxt"

then run `go get` and you're ready.

1. Parse
^^^^^^^^

First, you need to parse the robots.txt data. You can do this with
`FromBytes(body []byte) (*RobotsData, error)`, or with `FromString` for a `string` body::

    robots, err := robotstxt.FromBytes([]byte("User-agent: *\nDisallow:"))
    // or, equivalently, from a string:
    robots, err = robotstxt.FromString("User-agent: *\nDisallow:")

As of 2012-10-03, `FromBytes` is the most efficient method; everything else
is a wrapper around this core function.

There are a few convenience constructors for various purposes:

* `FromResponse(*http.Response) (*RobotsData, error)` initializes robots data
  from an HTTP response. It *does not* call `response.Body.Close()`, so the caller
  must close the body::

      robots, err := robotstxt.FromResponse(resp)
      resp.Body.Close()
      if err != nil {
          log.Println("Error parsing robots.txt:", err.Error())
      }

* `FromStatusAndBytes(statusCode int, body []byte) (*RobotsData, error)` or
  `FromStatusAndString`, if you prefer to read the body bytes (or string) yourself.
  Passing the status code applies the following logic, in line with Google's
  interpretation of robots.txt files:

  * status 2xx -> parse the body with `FromBytes` and apply the rules listed there.
  * status 4xx -> allow all (even 401/403, as recommended by Google).
  * other (5xx) -> disallow all; this is treated as a temporary unavailability.
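The status-code rules above can be expressed as a small decision function. The sketch below is a stdlib-only illustration, not the robotstxt code: `policyForStatus` and `fetchStatusAndBytes` are hypothetical helpers. Codes that are neither 2xx nor 4xx map to "disallow all", which follows the "other (5xx)" bullet literally. This sketch does not cover how the library itself treats unexpected codes such as 3xx.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
)

// policy is what a crawler should do with a fetched robots.txt.
type policy int

const (
	parseBody   policy = iota // 2xx: parse the body and apply its rules
	allowAll                  // 4xx (including 401/403): no restrictions
	disallowAll               // anything else: treat as temporarily unavailable
)

func (p policy) String() string {
	switch p {
	case parseBody:
		return "parse body"
	case allowAll:
		return "allow all"
	default:
		return "disallow all"
	}
}

// policyForStatus is a hypothetical helper (not part of robotstxt) that
// mirrors the 2xx / 4xx / other mapping described for FromStatusAndBytes.
func policyForStatus(code int) policy {
	switch {
	case code >= 200 && code < 300:
		return parseBody
	case code >= 400 && code < 500:
		return allowAll
	default:
		return disallowAll
	}
}

// fetchStatusAndBytes is a hypothetical helper for the "read it yourself"
// path: it returns the status code and body, which could then be passed to
// robotstxt.FromStatusAndBytes. Unlike FromResponse, it closes the body itself.
func fetchStatusAndBytes(url string) (int, []byte, error) {
	resp, err := http.Get(url)
	if err != nil {
		return 0, nil, err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		return 0, nil, err
	}
	return resp.StatusCode, body, nil
}

func main() {
	for _, code := range []int{200, 404, 403, 503} {
		fmt.Println(code, "->", policyForStatus(code))
	}
}
```

In real code you would hand the status and body from a helper like `fetchStatusAndBytes` to `robotstxt.FromStatusAndBytes` rather than deciding the policy yourself. The sketch only makes the mapping explicit.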

2. Query
^^^^^^^^

Parsing robots.txt content builds a kind of logic database, which you can
query with `(r *RobotsData) TestAgent(url, agent string) (bool)`.
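The "logic database" idea can be made concrete with a minimal, stdlib-only sketch. It is not the robotstxt implementation, and `database`, `parse` and `testAgent` are hypothetical names. The sketch ignores wildcards, `Crawl-delay` and `Sitemap`. Its selection rule is an assumption modeled on Google's interpretation: pick the longest matching agent prefix, with `*` as fallback, then apply the longest matching path prefix.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// rule is one Allow/Disallow line in this simplified model.
type rule struct {
	prefix string
	allow  bool
}

// group holds the rules shared by consecutive User-agent lines.
type group struct {
	agents []string // lower-cased User-agent values
	rules  []rule
}

// database is a hypothetical, simplified stand-in for *RobotsData.
// Wildcards ("*" and "$" in paths), Crawl-delay and Sitemap are ignored.
type database struct {
	groups []*group
}

// parse builds the rule database from robots.txt text.
func parse(text string) *database {
	db := &database{}
	var cur *group
	lastWasAgent := false
	sc := bufio.NewScanner(strings.NewReader(text))
	for sc.Scan() {
		line := sc.Text()
		if i := strings.IndexByte(line, '#'); i >= 0 {
			line = line[:i] // strip comments
		}
		key, val, ok := strings.Cut(line, ":")
		if !ok {
			continue // blank or malformed line
		}
		key = strings.ToLower(strings.TrimSpace(key))
		val = strings.TrimSpace(val)
		if key == "user-agent" {
			// Consecutive User-agent lines share one group.
			if cur == nil || !lastWasAgent {
				cur = &group{}
				db.groups = append(db.groups, cur)
			}
			cur.agents = append(cur.agents, strings.ToLower(val))
			lastWasAgent = true
			continue
		}
		lastWasAgent = false
		// An empty value (e.g. "Disallow:") restricts nothing, so it adds no rule.
		if (key == "allow" || key == "disallow") && cur != nil && val != "" {
			cur.rules = append(cur.rules, rule{prefix: val, allow: key == "allow"})
		}
	}
	return db
}

// testAgent scans all groups on every call. It picks the group whose agent
// token is the longest case-insensitive prefix of agent ("*" as fallback),
// then applies the longest matching path prefix. No match means allowed.
func (db *database) testAgent(path, agent string) bool {
	agent = strings.ToLower(agent)
	var best *group
	bestLen := -1
	for _, g := range db.groups {
		for _, a := range g.agents {
			n := -1
			if a == "*" {
				n = 0
			} else if a != "" && strings.HasPrefix(agent, a) {
				n = len(a)
			}
			if n > bestLen {
				best, bestLen = g, n
			}
		}
	}
	if best == nil {
		return true
	}
	allowed, matchLen := true, -1
	for _, r := range best.rules {
		if strings.HasPrefix(path, r.prefix) && len(r.prefix) > matchLen {
			allowed, matchLen = r.allow, len(r.prefix)
		}
	}
	return allowed
}

func main() {
	db := parse("User-agent: *\nDisallow: /private/\n\nUser-agent: FooBot\nDisallow: /\nAllow: /public/\n")
	fmt.Println(db.testAgent("/private/x", "SomeBot"))     // false
	fmt.Println(db.testAgent("/index.html", "SomeBot"))    // true
	fmt.Println(db.testAgent("/index.html", "FooBot/2.1")) // false
	fmt.Println(db.testAgent("/public/a", "foobot"))       // true
}
```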

Passing the agent explicitly is useful if you want to query for different agents.
For single-agent users there is a more efficient option: `RobotsData.FindGroup(userAgent string)`
returns a structure with a `.Test(path string)` method and a `.CrawlDelay time.Duration` field.

A simple query with an explicit user agent; each call scans all rules.

::

    allow := robots.TestAgent("/", "FooBot")

Or, for better performance, query several paths against the same user agent.

::

    // FindGroup resolves the agent once; each Test call reports whether a path is allowed.
    group := robots.FindGroup("BarBot")
    group.Test("/")
    group.Test("/download.mp3")
    group.Test("/news/article-2012-1")

Who
===

Honorable contributors (in no particular order):

* Ilya Grigorik (igrigorik)
* Martin Angers (PuerkitoBio)
* Micha Gorelick (mynameisfiber)

Initial commit and other work: Sergey Shepelev temotor@gmail.com

Flair
=====

.. image:: https://travis-ci.org/temoto/robotstxt.svg?branch=master
    :target: https://travis-ci.org/temoto/robotstxt

.. image:: https://codecov.io/gh/temoto/robotstxt/branch/master/graph/badge.svg
    :target: https://codecov.io/gh/temoto/robotstxt

.. image:: https://goreportcard.com/badge/github.com/temoto/robotstxt
    :target: https://goreportcard.com/report/github.com/temoto/robotstxt