
PostgreSQL copy big data from table to table

I have an architecture where data is sometimes copied into a temp table, and then, on command, rows matching a condition must be copied into one of several other tables; before that, counts, deletes and updates run based on object_id, which is the same in all tables. The longest operation is the copy: it takes up to 10 minutes for 300,000 rows. For example:

    INSERT INTO t1 (t1_f1, t1_f2, name, value)
    SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value
    FROM main_loadingprocessobjects
    WHERE loading_process_id = 695;

Can I speed up the process? Or is this bad architecture that I need to change?

One more thing: the heap table can contain a lot of data, and a copy can involve several million rows. Some fields (those used for counting or filtering) are indexed both in the heap table and in the other tables.
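As a side note, DISTINCT ON without an ORDER BY picks an arbitrary row per object_id. A minimal sketch of a deterministic version (the id tie-breaker column is a hypothetical assumption, not from the question):

    INSERT INTO t1 (t1_f1, t1_f2, name, value)
    SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value
    FROM main_loadingprocessobjects
    WHERE loading_process_id = 695
    ORDER BY object_id, id;  -- "id" is a hypothetical tie-breaker column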


And this is the plan for a smaller data set:

Insert on main_like  (cost=2993.63..3115.51 rows=6094 width=797) (actual time=6143.194..6143.194 rows=0 loops=1)
  ->  Subquery Scan on "*SELECT*"  (cost=2993.63..3115.51 rows=6094 width=797) (actual time=55.995..125.081 rows=6094 loops=1)
        ->  Unique  (cost=2993.63..3024.10 rows=6094 width=796) (actual time=55.909..79.237 rows=6094 loops=1)
              ->  Sort  (cost=2993.63..3008.86 rows=6094 width=796) (actual time=55.904..69.195 rows=6094 loops=1)
                    Sort Key: main_loadingprocessobjects.object_id
                    Sort Method: quicksort  Memory: 3321kB
                    ->  Seq Scan on main_loadingprocessobjects  (cost=0.00..465.02 rows=6094 width=796) (actual time=0.578..8.285 rows=6094 loops=1)
                          Filter: (loading_process_id = 695)
                          Rows Removed by Filter: 1428
Planning time: 0.394 ms
Execution time: 6143.631 ms

EXPLAIN output without the INSERT:

Unique  (cost=2993.63..3024.10 rows=6094 width=796) (actual time=48.915..52.902 rows=6094 loops=1)
  ->  Sort  (cost=2993.63..3008.86 rows=6094 width=796) (actual time=48.911..49.959 rows=6094 loops=1)
        Sort Key: object_id
        Sort Method: quicksort  Memory: 3321kB
        ->  Seq Scan on main_loadingprocessobjects  (cost=0.00..465.02 rows=6094 width=796) (actual time=0.401..5.516 rows=6094 loops=1)
              Filter: (loading_process_id = 695)
              Rows Removed by Filter: 1428
Planning time: 0.214 ms
Execution time: 53.694 ms

main_loadingprocessobjects is the heap table; main_like is t1.

There are several points you might consider for this issue:

  1. A COPY statement in PostgreSQL is faster than an INSERT INTO ... SELECT statement.
  2. Create a composite index matching the filter columns of the query, e.g. (type, category) for:

         SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value
         FROM main_loadingprocessobjects
         WHERE type = 'ti' AND category = 'added';

  3. A GROUP BY statement is often faster than DISTINCT.
  4. Increase temp_buffers in postgresql.conf if you make heavy use of temp tables.
  5. Try a CTE (Common Table Expression) instead of a temp table.
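For point 3, note that a GROUP BY rewrite of the DISTINCT ON query needs aggregates to produce one value per object_id. A sketch (column names taken from the question; the choice of min() as the aggregate is an assumption):

    INSERT INTO t1 (t1_f1, t1_f2, name, value)
    SELECT min(t1_f1), min(t1_f2), min(name), min(value)
    FROM main_loadingprocessobjects
    WHERE loading_process_id = 695
    GROUP BY object_id;

Unlike DISTINCT ON, this does not guarantee that all four values come from the same source row, so it is only equivalent when that does not matter.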

I hope these points help you in your future development.
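For point 5, a CTE lets you stage the filtered rows inside a single statement instead of materializing a temp table first. A sketch, assuming the same source and target tables as in the question:

    WITH staged AS (
        SELECT DISTINCT ON (object_id) t1_f1, t1_f2, name, value
        FROM main_loadingprocessobjects
        WHERE loading_process_id = 695
    )
    INSERT INTO t1 (t1_f1, t1_f2, name, value)
    SELECT t1_f1, t1_f2, name, value
    FROM staged;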
