
Batch upsert multiple rows with psycopg2

I need to upsert (INSERT ... ON CONFLICT DO UPDATE) multiple rows at once into a PostgreSQL database using psycopg2. Essentially, I have a list of tuples representing "rows", and I need to insert them into the database, or update the existing rows if there is a conflict. Every row needs to be handled, and for a row that is updated rather than inserted, possibly every column needs to change.

I've tried two main approaches, using psycopg2's cursor.execute() and execute_values() functions. First, I did the following:

upsert_statement = 'INSERT INTO table (col1, col2, col3) VALUES %s ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = ROW (excluded.*) WHERE table IS DISTINCT FROM excluded'

psycopg2.extras.execute_values(cursor, upsert_statement, values)

I create an SQL statement and run it with execute_values() (where the values passed to it is a list of tuples); on a conflict, the column values should be updated to those in excluded. However, I sometimes get the error SyntaxError: number of columns does not match number of values, even though I know for a fact that the number of columns and the number of values are the same.

So, I tried using only execute():

upsert_statement = f'INSERT INTO table (col1, col2, col3) VALUES (value1, value2, value3), (value4, value5, value6)... ON CONFLICT (col1) DO UPDATE SET (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)...'

cursor.execute(upsert_statement)

Here, I do the batch upsert as part of the SQL, and so don't have to use execute_values(). However, I get a SyntaxError after the DO UPDATE SET, because I don't think it's valid to have (col1, col2, col3) = (value1, value2, value3), (value4, value5, value6)... there.

What am I doing wrong? How can I bulk upsert multiple rows using psycopg2?

(I should note that in reality, (col1, col2, col3) and (value1, value2, value3) are dynamic, and change frequently.)

You need to use the EXCLUDED table instead of value literals in your ON CONFLICT clause. It's a special table holding the values proposed for insertion. You also don't need to re-set the conflicting columns, only the rest.

INSERT INTO table (col1, col2, col3) 
VALUES 
    (value1, value2, value3), 
    (value4, value5, value6)
ON CONFLICT (col1) DO UPDATE 
SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3);
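Since the question mentions that the column list is dynamic, the same statement can be generated from a list of column names. The following is only a sketch under assumptions: build_upsert, quote_ident and upsert_rows are hypothetical helper names, not part of psycopg2, and the conflict target is assumed to be a subset of the inserted columns. The statement keeps a single %s after VALUES, which is what psycopg2.extras.execute_values() expects.

```python
def quote_ident(name: str) -> str:
    """Double-quote an SQL identifier, doubling any embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_upsert(table, columns, conflict_columns):
    """Build INSERT ... ON CONFLICT DO UPDATE with one %s after VALUES."""
    # Only the non-conflicting columns need to be re-set from EXCLUDED.
    update_columns = [c for c in columns if c not in conflict_columns]
    if not update_columns:
        raise ValueError("no columns left to update")
    column_list = ", ".join(quote_ident(c) for c in columns)
    conflict_list = ", ".join(quote_ident(c) for c in conflict_columns)
    assignments = ", ".join(
        f"{quote_ident(c)} = EXCLUDED.{quote_ident(c)}" for c in update_columns
    )
    return (
        f"INSERT INTO {quote_ident(table)} ({column_list}) VALUES %s "
        f"ON CONFLICT ({conflict_list}) DO UPDATE SET {assignments}"
    )


def upsert_rows(cursor, table, columns, conflict_columns, rows):
    """Run the generated statement for a list of tuples (needs psycopg2)."""
    # Imported here so the statement builder can be used without psycopg2.
    from psycopg2.extras import execute_values

    execute_values(cursor, build_upsert(table, columns, conflict_columns), rows)
```

With an open cursor, a call such as upsert_rows(cursor, "table", ["col1", "col2", "col3"], ["col1"], values) would then send the rows. Note that quoted identifiers are case-sensitive in PostgreSQL; psycopg2's own sql.Identifier is an alternative to the hand-written quoting used here.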

For readability, you can format your in-line SQL if you triple-quote your f-strings. I'm not sure whether, and which, IDEs can detect that it's in-line SQL in Python and switch syntax highlighting, but I find indentation helpful enough.

upsert_statement = f"""
    INSERT INTO table (col1, col2, col3) 
    VALUES 
        ({value1}, {value2}, {value3}), 
        ({value4}, {value5}, {value6})
    ON CONFLICT (col1) DO UPDATE 
    SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3)"""
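One caveat with the f-string above: the values are pasted into the SQL text, so quoting and escaping are left to the caller. A possible variation, sketched here under assumptions, is to generate only the %s placeholders with the f-string and hand the values to cursor.execute() as parameters. build_values_clause and my_table are hypothetical names used for illustration.

```python
def build_values_clause(rows):
    """Return a placeholder string like "(%s, %s), (%s, %s)" plus flat params."""
    if not rows:
        raise ValueError("rows must not be empty")
    width = len(rows[0])
    # Every tuple must supply one value per column.
    if any(len(row) != width for row in rows):
        raise ValueError("all rows must have the same number of values")
    row_placeholder = "(" + ", ".join(["%s"] * width) + ")"
    placeholders = ", ".join([row_placeholder] * len(rows))
    params = [value for row in rows for value in row]
    return placeholders, params


rows = [(1, "third", "first"), (3, "fourth", "third")]
placeholders, params = build_values_clause(rows)

# Only placeholders are interpolated; the values travel separately.
upsert_statement = f"""
    INSERT INTO my_table (col1, col2, col3)
    VALUES {placeholders}
    ON CONFLICT (col1) DO UPDATE
    SET (col2, col3) = (EXCLUDED.col2, EXCLUDED.col3)"""

# With an open psycopg2 cursor:
# cursor.execute(upsert_statement, params)
```

The driver then adapts and quotes each value itself, so strings containing quotes or None values do not need special handling in the SQL text.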

Here's a simple test:

drop table if exists test_70066823 cascade;
create table test_70066823 (
    id integer primary key, 
    text_column_1 text, 
    text_column_2 text);
insert into test_70066823 select 1,'first','first';
insert into test_70066823 select 2,'second','second';
select * from test_70066823;
-- id | text_column_1 | text_column_2
------+---------------+---------------
--  1 | first         | first
--  2 | second        | second
--(2 rows)


insert into test_70066823
values
        (1, 'third','first'),
        (3, 'fourth','third'),
        (4, 'fifth','fourth'),
        (2, 'sixth','second')
on conflict (id) do update 
set text_column_1=EXCLUDED.text_column_1,
    text_column_2=EXCLUDED.text_column_2;

select * from test_70066823;
-- id | text_column_1 | text_column_2
------+---------------+---------------
--  1 | third         | first
--  3 | fourth        | third
--  4 | fifth         | fourth
--  2 | sixth         | second
--(4 rows)

You can refer to this for improved insert performance. Inserts with a simple string-based execute or executemany are the two slowest approaches mentioned there.
