[chore] Use bulk updates + fewer loops in status rethreading migration (#4459)

This pull request tries to optimize our status rethreading migration by using bulk updates + avoiding unnecessary writes, and doing the migration in one top-level loop and one stragglers loop, without the extra loop to copy thread_id over.

On my machine it runs at about 2400 rows per second on Postgres, now, and about 9000 rows per second on SQLite.

Tried *many* different ways of doing this, with and without temporary indexes, with different batch and transaction sizes, etc., and this seems to be just about the most performant way of getting stuff done.

With the changes, a few minutes have been shaved off migration time testing on my development machine. *Hopefully* this will translate to more time shaved off when running on a vps with slower read/write speed and less processor power.

SQLite before:

```
real	20m58,446s
user	16m26,635s
sys	5m53,648s
```

SQLite after:

```
real	14m25,435s
user	12m47,449s
sys	2m27,898s
```

Postgres before:

```
real	28m25,307s
user	3m40,005s
sys	4m45,018s
```

Postgres after:

```
real	22m31,999s
user	3m46,674s
sys	4m39,592s
```

Reviewed-on: https://codeberg.org/superseriousbusiness/gotosocial/pulls/4459
Co-authored-by: tobi <tobi.smethurst@protonmail.com>
Co-committed-by: tobi <tobi.smethurst@protonmail.com>
This commit is contained in:
tobi 2025-10-03 12:28:55 +02:00 committed by kim
commit e7cd8bb43e
7 changed files with 429 additions and 271 deletions

View file

@ -33,6 +33,8 @@ These contribution guidelines were adapted from / inspired by those of Gitea (ht
- [Federation](#federation)
- [Updating Swagger docs](#updating-swagger-docs)
- [CI/CD configuration](#ci-cd-configuration)
- [Other Useful Stuff](#other-useful-stuff)
- [Running migrations on a Postgres DB backup locally](#running-migrations-on-a-postgres-db-backup-locally)
## Introduction
@ -525,3 +527,38 @@ The `woodpecker` pipeline files are in the `.woodpecker` directory of this repos
The Woodpecker instance for GoToSocial is [here](https://woodpecker.superseriousbusiness.org/repos/2).
Documentation for Woodpecker is [here](https://woodpecker-ci.org/docs/intro).
## Other Useful Stuff
Various bits and bobs.
### Running migrations on a Postgres DB backup locally
It may be useful when testing or debugging migrations to be able to run them against a copy of a real instance's Postgres database locally.
Basic steps for this:
First dump the Postgres database on the remote machine, and copy the dump over to your development machine.
Now create a local Postgres container and mount the dump into it with, for example:
```bash
docker run -it --name postgres --network host -e POSTGRES_PASSWORD=postgres -v /path/to/db_dump:/db_dump postgres
```
In a separate terminal window, execute a command inside the running container to load the dump into the "postgres" database:
```bash
docker exec -it --user postgres postgres psql -X -f /db_dump postgres
```
With the Postgres container still running, run GoToSocial and point it towards the container. Use the appropriate `GTS_HOST` (and `GTS_ACCOUNT_DOMAIN`) values for the instance you dumped:
```bash
GTS_HOST=example.org \
GTS_DB_TYPE=postgres \
GTS_DB_POSTGRES_CONNECTION_STRING=postgres://postgres:postgres@localhost:5432/postgres \
./gotosocial migrations run
```
When you're done messing around, don't forget to remove any containers that you started up, and remove any lingering volumes with `docker volume prune`, else you might end up filling your disk with unused temporary volumes.