I have my base data in a PostgreSQL table, let this be 'basedata':
basedata:
id,name,age,height
1,john,17,185
2,nick,24,174
3,sarah,19,165
This is a PostgreSQL table with the primary key 'id'.
Now I will receive a pandas DataFrame with new or updated data on the respective persons, for example:
new_data:
id,name,age,height
17,harry,26,177
23,mary,14,145
2,nick,25,174
3,sarah,19,165
The logic should be:
new id -> insert into database
id already exists -> do nothing if every field is the same (like for sarah)
id already exists -> update differing fields
The result should be:
basedata:
id,name,age,height
1,john,17,185
2,nick,25,174
3,sarah,19,165
17,harry,26,177
23,mary,14,145
I am struggling to do this in the best way with Python and psycopg2.
Do I need to iterate over the DataFrame and check every row against the database, or is there a more elegant way to do this? And what is the best way to iterate through the DataFrame?
You can do this at the SQL level instead of iterating over the DataFrame. (I can't provide an exact code-level solution since you haven't provided a code snippet.)
CREATE TABLE basedata (
id INTEGER PRIMARY KEY,
name VARCHAR NOT NULL,
age INTEGER NOT NULL,
height INTEGER NOT NULL
);
New Data
INSERT INTO basedata (id,name, age, height)
VALUES
(1, 'john', 17, 185),
(2, 'nick', 24, 174),
(3, 'sarah', 19, 165);
Update Data
INSERT INTO basedata (id, name, age, height)
VALUES
(17, 'harry', 26, 177),
(23, 'mary', 14, 145),
(2, 'nick', 25, 174),
(3, 'sarah', 19, 165)
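ON CONFLICT (id)
DO UPDATE SET name = EXCLUDED.name, age = EXCLUDED.age, height = EXCLUDED.height
WHERE (basedata.name, basedata.age, basedata.height)
      IS DISTINCT FROM (EXCLUDED.name, EXCLUDED.age, EXCLUDED.height);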
For more details, see: PostgreSQL Upsert Using INSERT ON CONFLICT statement
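Since the question asks about psycopg2, here is a minimal sketch of sending the whole DataFrame through a single INSERT ... ON CONFLICT ... DO UPDATE statement, so no row-by-row checking is needed. The helper names build_upsert_sql, to_rows and upsert_dataframe are hypothetical, and the sketch assumes the DataFrame columns match the table columns exactly, that there is at least one non-key column, and that the table and column names are trusted (they are interpolated into the SQL text).

```python
def build_upsert_sql(table, columns, key="id"):
    """Build an upsert statement with a single %s placeholder for execute_values."""
    others = [c for c in columns if c != key]
    col_list = ", ".join(columns)
    set_list = ", ".join(f"{c} = EXCLUDED.{c}" for c in others)
    old_row = ", ".join(f"{table}.{c}" for c in others)
    new_row = ", ".join(f"EXCLUDED.{c}" for c in others)
    # The WHERE clause skips rows whose values are unchanged (like sarah).
    return (
        f"INSERT INTO {table} ({col_list}) VALUES %s "
        f"ON CONFLICT ({key}) DO UPDATE SET {set_list} "
        f"WHERE ({old_row}) IS DISTINCT FROM ({new_row})"
    )


def to_rows(df):
    """Turn DataFrame rows into tuples of plain Python values.

    psycopg2 cannot adapt numpy scalars such as numpy.int64, so any value
    that offers .item() is converted to its Python equivalent.
    """
    return [
        tuple(v.item() if hasattr(v, "item") else v for v in row)
        for row in df.itertuples(index=False, name=None)
    ]


def upsert_dataframe(conn, df, table="basedata", key="id"):
    """Insert new ids and update changed rows in one round trip."""
    # Imported here so the helpers above can be used without psycopg2 installed.
    from psycopg2.extras import execute_values

    sql = build_upsert_sql(table, list(df.columns), key)
    # "with conn" commits on success and rolls back on an exception.
    with conn, conn.cursor() as cur:
        execute_values(cur, sql, to_rows(df))
```

Usage would look like upsert_dataframe(psycopg2.connect(dsn), new_data), where dsn is your own connection string. If the table or column names could come from user input, building the statement with psycopg2.sql.Identifier would be the safer choice.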
Using sqlalchemy, and assuming a DataFrame new_data, the flow would be as follows:
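import pandas as pd
from sqlalchemy import create_engine
engine = create_engine(my_postgresql_db_uri)
table_name = 'basedata'
with engine.begin() as con:  # commits on success, rolls back on error
    base_data = pd.read_sql(table_name, con)
    # new_data goes last so that keep='last' prefers the updated row for each id
    data = pd.concat([base_data, new_data], ignore_index=True)
    data = data.drop_duplicates(subset='id', keep='last')
    # if_exists='replace' drops and recreates the table, so the primary key is lost
    data.to_sql(table_name, con, if_exists='replace', index=False)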
This is untested and not really optimized: you have to read the whole table every time you want to update it, because pandas.to_sql has no built-in 'INSERT OR UPDATE' mode.
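If rewriting the whole table is too heavy, one alternative is to work out in pandas which rows are new and which are changed, and send only those to the database. The following is a sketch using the sample data from the question; the variable names to_insert and to_update are illustrative, and it assumes both DataFrames have the same columns and dtypes.

```python
import pandas as pd

basedata = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["john", "nick", "sarah"],
    "age": [17, 24, 19],
    "height": [185, 174, 165],
})
new_data = pd.DataFrame({
    "id": [17, 23, 2, 3],
    "name": ["harry", "mary", "nick", "sarah"],
    "age": [26, 14, 25, 19],
    "height": [177, 145, 174, 165],
})

# Left-merge on all shared columns: a row of new_data that has no identical
# row in basedata is flagged "left_only" in the _merge indicator column.
merged = new_data.merge(basedata, how="left", indicator=True)
changed_or_new = merged.loc[merged["_merge"] == "left_only", new_data.columns]

# Split by whether the id is already known.
known = changed_or_new["id"].isin(basedata["id"])
to_insert = changed_or_new[~known]   # new ids: harry, mary
to_update = changed_or_new[known]    # existing id with different values: nick

print(to_insert)
print(to_update)
```

Unchanged rows such as sarah drop out in the merge step, so only the rows that actually differ need to be written.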