
Update postgresql table with data from DataFrame

I have my base data in a PostgreSQL table; let this be 'basedata':

basedata:

id,name,age,height
1,john,17,185
2,nick,24,174
3,sarah,19,165

This is a PostgreSQL table with the primary key 'id'.

Now I will receive a pandas DataFrame with new or updated data for the respective people, for example:

new_data:

id,name,age,height
17,harry,26,177
23,mary,14,145
2,nick,25,174
3,sarah,19,165

The logic should be:

new id -> insert into the database
id already exists -> do nothing if every field is the same (like for sarah)
id already exists -> update the differing fields

The result should be:

basedata:

id,name,age,height
1,john,17,185
2,nick,25,174
3,sarah,19,165
17,harry,26,177
23,mary,14,145

I am struggling with how to do this with Python and psycopg2 in the best way.

Do I need to iterate over the DataFrame and check every row against the database, or is there a more elegant way to do this? And what is the best way to iterate through the DataFrame?

You can do this at the SQL level instead of iterating over the DataFrame. (I can't provide an exact code-level solution since you haven't provided a code snippet.)

Assuming the table was created like this (UNIQUE is redundant on a PRIMARY KEY column):

CREATE TABLE basedata (
   id INTEGER PRIMARY KEY,
   name VARCHAR NOT NULL,
   age INTEGER NOT NULL,
   height INTEGER NOT NULL
);

Initial data (the existing basedata rows)

INSERT INTO basedata (id,name, age, height)
VALUES
   (1, 'john', 17, 185),
   (2, 'nick', 24, 174),
   (3, 'sarah', 19, 165);

Upsert (insert new rows, update changed ones)

INSERT INTO basedata (id, name, age, height)
VALUES
   (17, 'harry', 26, 177),
   (23, 'mary', 14, 145),
   (2, 'nick', 25, 174),
   (3, 'sarah', 19, 165)
ON CONFLICT (id)
DO UPDATE SET
   name = EXCLUDED.name,
   age = EXCLUDED.age,
   height = EXCLUDED.height;

Note that ON CONFLICT (id) DO NOTHING would only cover the insert case and would skip nick's changed age; DO UPDATE also handles changed rows. Rows whose values are identical, like sarah's, are simply rewritten with the same values.

For more details, see the PostgreSQL documentation on upsert with the INSERT ... ON CONFLICT statement.
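Since the question mentions psycopg2, here is a minimal sketch of how an ON CONFLICT upsert can be driven directly from the DataFrame with psycopg2.extras.execute_values, so you don't have to hand-write the VALUES list. The helper name build_upsert and the connection URI are illustrative, not part of any library:

```python
import pandas as pd
# For the actual database round trip you would also need:
# import psycopg2
# from psycopg2.extras import execute_values

def build_upsert(df, table, key):
    """Build an INSERT ... ON CONFLICT statement plus row tuples for execute_values."""
    cols = list(df.columns)
    updates = ", ".join(f"{c} = EXCLUDED.{c}" for c in cols if c != key)
    sql = (
        f"INSERT INTO {table} ({', '.join(cols)}) VALUES %s "
        f"ON CONFLICT ({key}) DO UPDATE SET {updates}"
    )
    rows = [tuple(r) for r in df.itertuples(index=False)]
    return sql, rows

# The DataFrame from the question
new_data = pd.DataFrame(
    {"id": [17, 23, 2, 3],
     "name": ["harry", "mary", "nick", "sarah"],
     "age": [26, 14, 25, 19],
     "height": [177, 145, 174, 165]}
)

sql, rows = build_upsert(new_data, "basedata", "id")

# Hypothetical connection; execute_values sends all rows in one round trip:
# with psycopg2.connect("postgresql://...") as conn, conn.cursor() as cur:
#     execute_values(cur, sql, rows)
```

This avoids any per-row Python loop against the database: PostgreSQL itself decides, row by row, whether to insert or update.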

Using sqlalchemy, and assuming a DataFrame new_data, the flow would be as follows:

import pandas as pd
from sqlalchemy import create_engine

engine = create_engine(my_postgresql_db_uri)
con = engine.connect()

table_name = 'basedata'

base_data = pd.read_sql(table_name, con)

# new_data goes first so that, for duplicate ids, the updated row wins
data = pd.concat([new_data, base_data], ignore_index=True).drop_duplicates(subset='id', keep='first')

data.to_sql(table_name, con, if_exists='replace', index=False)

This is untested and not really optimized: you have to read the whole table every time you want to update it, because pandas.to_sql has no 'INSERT OR UPDATE' mode.
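If you do want to stay in pandas but only touch the rows that actually changed, the insert/update/skip split from the question's logic can be computed with a merge. A sketch, assuming both frames share the id/name/age/height columns:

```python
import pandas as pd

base = pd.DataFrame(
    {"id": [1, 2, 3],
     "name": ["john", "nick", "sarah"],
     "age": [17, 24, 19],
     "height": [185, 174, 165]}
)
new = pd.DataFrame(
    {"id": [17, 23, 2, 3],
     "name": ["harry", "mary", "nick", "sarah"],
     "age": [26, 14, 25, 19],
     "height": [177, 145, 174, 165]}
)

# ids not in the table yet -> INSERT
to_insert = new[~new["id"].isin(base["id"])]

# merge on ALL columns: rows that match exactly get "_merge" == "both",
# rows that are new or differ in some field get "left_only"
merged = new.merge(base, on=list(new.columns), how="left", indicator=True)
differing = merged[merged["_merge"] == "left_only"].drop(columns="_merge")

# of the differing rows, those whose id already exists -> UPDATE
to_update = differing[differing["id"].isin(base["id"])]

# identical rows (like sarah's) fall into neither frame and are skipped
```

The two resulting frames can then be fed to a plain INSERT and a plain UPDATE respectively, so unchanged rows never hit the database.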
