I have a table `matches` and a table `table2` such that `matches.key` and `table2.key` have a many-to-many relationship.
```
matches
-------
key (bigint), other columns...
---
1
2
1
```

```
table2
-------
key (bigint), createdAt (date), other columns...
---
1
2
2
1
```
I want to delete all "orphan" records in `table2`: records whose `key` does not exist in `matches` AND that were created more than 5 hours ago.
```sql
explain (analyse, buffers)
delete from table2 as mo
where not exists (select null from matches pf where pf.key = mo.key)
  and mo."createdAt" < now() - interval '5 hours';
```
I'm running the delete query every 5 seconds. I can change that interval if it's worth it.
It works, but it's slow (600k records in `table2`, 1k records in `matches`):
```
Delete on table2 mo  (cost=127.40..33648.30 rows=1 width=12) (actual time=248.302..248.305 rows=0 loops=1)
  Buffers: shared hit=9435 read=11203
  I/O Timings: read=23.365
  ->  Hash Anti Join  (cost=127.40..33648.30 rows=1 width=12) (actual time=248.300..248.302 rows=0 loops=1)
        Hash Cond: (mo."key" = pf."key")
        Buffers: shared hit=9435 read=11203
        I/O Timings: read=23.365
        ->  Seq Scan on table2 mo  (cost=0.00..30930.79 rows=296013 width=14) (actual time=0.037..196.845 rows=296970 loops=1)
              Filter: ("createdAt" < (now() - '05:00:00'::interval))
              Rows Removed by Filter: 297302
              Buffers: shared hit=9318 read=11203
              I/O Timings: read=23.365
        ->  Hash  (cost=121.62..121.62 rows=462 width=14) (actual time=0.461..0.462 rows=458 loops=1)
              Buckets: 1024  Batches: 1  Memory Usage: 30kB
              Buffers: shared hit=117
              ->  Seq Scan on matches pf  (cost=0.00..121.62 rows=462 width=14) (actual time=0.046..0.343 rows=458 loops=1)
                    Buffers: shared hit=117
Planning:
  Buffers: shared hit=10 read=2
  I/O Timings: read=0.044
Planning Time: 0.702 ms
Execution Time: 248.396 ms
```
The `matches` table will be filled with up to 1k records over its lifetime. The `table2` table will receive up to 20 million records on Saturday (throughout the day); on all other days it receives at most 2 million new records.

To measure the performance of my query, I created a small script that inserts "old" and "new" records. "Old" records are expected to be deleted on every run; "new" records are expected to stay. The script inserts 1k "old" and 1k "new" records each second (sum = 2k).

I expect the duration of the query to increase as more "new" records accumulate in `table2`, but the initial duration is already slow and the rate of increase is too high:
Prometheus measurements:

- Query duration while `table2` is small: 0.06 seconds.
- Query duration as `table2` grows: 5-7+ seconds, and it's increasing...

Indexes:

- `table2` table: a multi-column index on (in this order): `key`, `createdAt`.
- `table2` table: an index on `key`.
- `matches` table: one of the single-column indexes is on `key`.

`key` is `bigint`; `createdAt` is `timestamp with time zone`.

PostgreSQL version: 13.2
What can I do to improve the initial query duration and decrease the increase-rate?
First, something seems suspicious with the design: a many-to-many relationship on the key into the `matches` table is unusual. That suggests there might be a better way to solve your overall problem, but you only explain the query that you have, not what you are doing overall. For instance, you might want a trigger to do the deletes.
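If deletes from `matches` are what orphan rows in `table2`, a trigger could remove the orphans at the moment they appear instead of polling every 5 seconds. A sketch, assuming the table and column names from the question (the function and trigger names are made up, and this ignores the 5-hour grace period, which a much less frequent cleanup job could still enforce):

```sql
-- Hypothetical trigger: after a matches row is deleted, remove table2
-- rows with that key, unless other matches rows still reference it
-- (the relationship is many-to-many, so duplicates are possible).
CREATE FUNCTION delete_table2_orphans() RETURNS trigger AS $$
BEGIN
    DELETE FROM table2 mo
    WHERE mo.key = OLD.key
      AND NOT EXISTS (SELECT FROM matches pf WHERE pf.key = OLD.key);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER matches_delete_orphans
AFTER DELETE ON matches
FOR EACH ROW EXECUTE FUNCTION delete_table2_orphans();
```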
In any case, one index that might help is an index on `table2(createdAt)`. You seem to have a pretty high insert volume if you need to run this every 5 seconds; that suggests that load on the server might also be an issue.
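Such an index would let the planner skip the seq scan over rows newer than 5 hours, which your plan shows removing ~297k rows by filter. A sketch (the index name is arbitrary):

```sql
-- CONCURRENTLY avoids blocking the heavy insert traffic while building.
CREATE INDEX CONCURRENTLY table2_createdat_idx ON table2 ("createdAt");
```

Note that your existing multi-column index on (`key`, `createdAt`) cannot serve this filter efficiently, because `createdAt` is not the leading column.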
If most rows are protected from deletion by being too new, then you can quickly rule those rows out by an index, only scanning the rows older than 5 hours. But if most rows are protected by having matches, there is no way to quickly rule those out with your current design. Each protected row (older than 5 hours if an index on that is used) will need to be visited and assessed every 5 seconds.
Assuming you want to use this general design at all, perhaps you could partition the data. You could have one partition for vulnerable rows (with no matches) and another for matched rows. Then you could have a trigger that moves rows to the vulnerable partition upon deletion of rows from the `matches` table (if there are no remaining matches).
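The partitioning idea could look roughly like this. This is only a sketch: the `matched` flag column, the partition names, and the logic that maintains the flag (triggers on `matches`, as described above) are all assumptions, not part of your current schema:

```sql
-- Hypothetical layout: partition table2 by a "matched" flag that
-- triggers on the matches table keep up to date.
CREATE TABLE table2 (
    key         bigint NOT NULL,
    "createdAt" timestamptz NOT NULL,
    matched     boolean NOT NULL DEFAULT false
    -- other columns...
) PARTITION BY LIST (matched);

CREATE TABLE table2_vulnerable PARTITION OF table2 FOR VALUES IN (false);
CREATE TABLE table2_matched    PARTITION OF table2 FOR VALUES IN (true);

-- The cleanup then only scans the (small) vulnerable partition:
DELETE FROM table2_vulnerable
WHERE "createdAt" < now() - interval '5 hours';
```

Since PostgreSQL 11, an `UPDATE table2 SET matched = ...` moves a row between partitions automatically, so the maintaining trigger can be a plain `UPDATE`.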