Improving PostgreSQL Function Performance

I have a query which effectively updates the IDs of all rows in a table. This is used so that the table can be combined into a table in another database without causing collisions between the IDs. The issue is that this can take several minutes per table. As there are several tables which all use this function, it can often take up to 20-30 minutes.

This query has gone through several iterations now, and this is basically the best I can manage. My SQL skills are admittedly rather limited. The function also removes any 'gaps' in the ID sequence as it goes, although this isn't strictly required.

The code is as shown here:

CREATE OR REPLACE FUNCTION prep_key_ids(_table text, _offset bigint) RETURNS void AS
$BODY$
DECLARE
    old_id bigint;
    table_exists boolean;
    new_id bigint;
    min_id bigint;
    max_id bigint;
    index bigint;
    low_id bigint;
    high_id bigint;
    row_count bigint;
BEGIN
    SELECT EXISTS(SELECT 1 FROM information_schema.table_constraints WHERE table_name=_table) INTO table_exists;
    IF table_exists THEN
        EXECUTE 'SELECT MIN(id), MAX(id), COUNT(*) FROM ' || _table || ';' INTO min_id, max_id, row_count;

        IF row_count <= 0 THEN
            RETURN;
        END IF;

        IF min_id > _offset THEN
            -- minimum id greater than the start of our desired offset, we can move each id without there being a conflict
            new_id = _offset + 1;
            FOR old_id IN EXECUTE 'SELECT id FROM ' || _table || ' ORDER BY id ASC;' LOOP
                EXECUTE 'UPDATE ' || _table || ' SET id=' || new_id || ' WHERE id=' || old_id || ';';
                new_id = new_id + 1;
            END LOOP;
        ELSIF max_id <= _offset + row_count THEN
            -- maximum id is less than the end point of our desired offset, we can move the ends without there being a conflict
            new_id = _offset + row_count;
            FOR old_id IN EXECUTE 'SELECT id FROM ' || _table || ' ORDER BY id DESC;' LOOP
                EXECUTE 'UPDATE ' || _table || ' SET id=' || new_id || ' WHERE id=' || old_id || ';';
                new_id = new_id - 1;
            END LOOP;
        ELSE
            -- there exist ids before our desired start and after our desired end
            -- find the pivot point where we can set ids without there being a conflict

            EXECUTE 'WITH tb AS ( SELECT row_number() OVER (ORDER BY id ASC) - 1 AS index, id, lead(id) over(ORDER BY id ASC) AS lead_id FROM ' || _table || ' ORDER BY id ASC ) '
                    'SELECT index, id, lead_id FROM tb WHERE tb.id <= ' || _offset + 1 || ' + tb.index AND tb.lead_id >= ' || _offset + 1 || ' + tb.index + 1 LIMIT 1;'
                    INTO index, low_id, high_id;
                    -- NOTE: 'index' is index for low_id, index + 1 gives index for high_id

            -- update ids from pivot point down to start of offset
            new_id = _offset + 1 + index;
            FOR old_id IN EXECUTE 'SELECT id FROM ' || _table || ' WHERE id <= ' || low_id || ' ORDER BY id DESC;' LOOP
                EXECUTE 'UPDATE ' || _table || ' SET id=' || new_id || ' WHERE id=' || old_id || ';';
                new_id = new_id - 1;
            END LOOP;

            -- update ids from pivot point up to the end of the offset
            new_id = _offset + 1 + index + 1;
            FOR old_id IN EXECUTE 'SELECT id FROM ' || _table || ' WHERE id >= ' || high_id || ' ORDER BY id ASC;' LOOP
                EXECUTE 'UPDATE ' || _table || ' SET id=' || new_id || ' WHERE id=' || old_id || ';';
                new_id = new_id + 1;
            END LOOP;

        END IF;
    END IF;
END;
$BODY$ LANGUAGE plpgsql;
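
For comparison, the first branch (where min_id is greater than _offset) could in principle be collapsed into a single set-based UPDATE rather than one EXECUTE per row. This is only a sketch, with the table name and offset hard-coded to match the EXPLAIN example below, and it assumes the primary key is declared DEFERRABLE (or that the old and new id ranges do not overlap), since a non-deferrable unique constraint is checked per row and can fail on transient duplicates:

-- Sketch only: set-based equivalent of the min_id > _offset branch,
-- renumbering rows to _offset + 1, _offset + 2, ... in ascending id order.
-- A dynamic version inside the function would build this with format().
WITH numbered AS (
    SELECT id, row_number() OVER (ORDER BY id) AS rn
    FROM imported_fields
)
UPDATE imported_fields AS t
SET id = 1 + n.rn           -- 1 here stands in for _offset
FROM numbered AS n
WHERE t.id = n.id;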

Output from running EXPLAIN (analyze, buffers, verbose) EXECUTE prep_key_ids( 'imported_fields', 1 ) is:

"Result  (cost=0.00..0.26 rows=1 width=4) (actual time=337592.604..337592.605 rows=1 loops=1)"
"  Output: prep_key_ids('imported_fields'::text, '1'::bigint)"
"  Buffers: shared hit=131862084 read=4409621 dirtied=3013612 written=2828226"
"Planning time: 0.013 ms"
"Execution time: 337592.620 ms"

And the output from EXPLAIN (analyze, buffers, verbose) UPDATE imported_fields SET id=595 WHERE id=594 is:

"Update on public.imported_fields  (cost=0.28..8.29 rows=1 width=52) (actual time=0.115..0.115 rows=0 loops=1)"
"  Buffers: shared hit=8 read=3 dirtied=4"
"  ->  Index Scan using imported_fields_id_idx on public.imported_fields  (cost=0.28..8.29 rows=1 width=52) (actual time=0.008..0.009 rows=1 loops=1)"
"        Output: '595'::bigint, exf_import, name, import_field_type, valid_text_timestamp, ctid"
"        Index Cond: (imported_fields.id = 594)"
"        Buffers: shared hit=4"
"Planning time: 0.272 ms"
"Trigger RI_ConstraintTrigger_a_2290766 for constraint production_field_values_imported_field_fkey on imported_fields: time=0.152 calls=1"
"Trigger RI_ConstraintTrigger_a_2290771 for constraint text_field_values_imported_field_fkey on imported_fields: time=1564.663 calls=1"
"Trigger RI_ConstraintTrigger_a_2290776 for constraint field_definitions_imported_field_fkey on imported_fields: time=0.082 calls=1"
"Trigger RI_ConstraintTrigger_a_2290781 for constraint added_dependencies_domain_fkey on imported_fields: time=0.021 calls=1"
"Trigger RI_ConstraintTrigger_a_2290786 for constraint added_dependencies_criterion_fkey on imported_fields: time=0.013 calls=1"
"Trigger RI_ConstraintTrigger_a_2290791 for constraint guidance_formula_set_entries_rank_field_fkey on imported_fields: time=0.049 calls=1"
"Trigger RI_ConstraintTrigger_a_2290796 for constraint guidance_formula_set_entries_mine_area_field_fkey on imported_fields: time=0.019 calls=1"
"Trigger RI_ConstraintTrigger_a_2290806 for constraint attain_run_settings_start_date_field_fkey on imported_fields: time=0.033 calls=1"
"Trigger RI_ConstraintTrigger_a_2290811 for constraint rm_o_attain_config_datefield_fkey on imported_fields: time=0.029 calls=1"
"Trigger RI_ConstraintTrigger_a_2291411 for constraint activity_filter_operation_field_lookups_field_fkey on imported_fields: time=0.498 calls=1"
"Trigger RI_ConstraintTrigger_a_2292706 for constraint grade_distributions_confidence_field_fkey on imported_fields: time=0.020 calls=1"
"Trigger RI_ConstraintTrigger_a_2292995 for constraint saved_realization_sets_product_field_fkey on imported_fields: time=0.017 calls=1"
"Trigger RI_ConstraintTrigger_a_2293204 for constraint saved_grade_realization_sets_product_field_fkey on imported_fields: time=0.017 calls=1"
"Trigger RI_ConstraintTrigger_a_2293575 for constraint ventilation_advanced_scenarios_text_field_id_fkey on imported_fields: time=0.016 calls=1"
"Trigger RI_ConstraintTrigger_a_2294065 for constraint geosequencing_stability_settings_text_field_definition_fkey on imported_fields: time=0.015 calls=1"
"Trigger RI_ConstraintTrigger_a_2294090 for constraint geosequencing_scenario_subtask_configu_subtask_group_field_fkey on imported_fields: time=0.016 calls=1"
"Trigger RI_ConstraintTrigger_a_2294095 for constraint geosequencing_scenario_subtask_configur_subtask_type_field_fkey on imported_fields: time=0.011 calls=1"
"Trigger RI_ConstraintTrigger_a_2294120 for constraint geosequencing_scenario_subtask_filter_operati_filter_field_fkey on imported_fields: time=0.015 calls=1"
"Trigger RI_ConstraintTrigger_a_2294727 for constraint run_settings_pin_marker_field_fkey on imported_fields: time=0.053 calls=1"
"Trigger RI_ConstraintTrigger_a_2294944 for constraint formula_used_fields_field_id_fkey on imported_fields: time=0.030 calls=1"
"Trigger RI_ConstraintTrigger_a_2295066 for constraint cumulative_production_expenditures_production_field_fkey on imported_fields: time=0.028 calls=1"
"Trigger RI_ConstraintTrigger_a_2295078 for constraint run_settings_target_field_fkey on imported_fields: time=0.024 calls=1"
"Trigger RI_ConstraintTrigger_c_2290773 for constraint text_field_values_imported_field_fkey on text_field_values: time=222.517 calls=38655"
"Execution time: 1790.278 ms"

For updating this table, the biggest time drain is the foreign key on the linked text_field_values table, which already has an index on the imported_field column. Not sure what else to do, since the indexes are already there. The text_field_values table currently has some 4 million-odd rows (but there can be many more than that).
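
For reference, one way to confirm that the index really covers the referencing column is to query pg_indexes. This is a diagnostic sketch only, using the table and column names from the trigger output above:

-- Diagnostic sketch: list any indexes on text_field_values that mention
-- the imported_field column checked by the slow RI trigger above.
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'text_field_values'
  AND indexdef ILIKE '%imported_field%';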

This is too long for a comment.

Changing ids in a table seems quite drastic. If you need to distinguish the ids, why not add a fixed number or prefix with the table name:

select 100000000 + id, . . .
from table1;

select 200000000 + id, . . .
from table2;

or:

select 'table1' || id, . . .
from table1;

select 'table2' || id, . . .
from table2;
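
For example, the offset could be applied while copying into the combined database rather than by renumbering ids in place. A rough sketch, assuming the combined data lives in a schema named combined (an illustrative name) and using the column names visible in the EXPLAIN output above; any referencing tables, such as text_field_values.imported_field, would need the same offset applied:

-- Sketch only: apply the offset at copy time instead of rewriting ids in place.
-- 'combined' is an assumed schema name for the target of the merge.
INSERT INTO combined.imported_fields (id, exf_import, name, import_field_type, valid_text_timestamp)
SELECT 100000000 + id, exf_import, name, import_field_type, valid_text_timestamp
FROM imported_fields;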
