
pandas dataframe to_sql for replace and add new using sqlalchemy

I'm trying to update existing rows and add new rows from a pandas DataFrame to a table in a SQL database.

I have 2 queries: the first imports all the data (more than 100,000 rows) into a DataFrame and writes it to the SQL table with this code:

df.to_sql(table_name, con=engine, if_exists='replace', index=False)

The second does the same import and write, but only pulls data from a specific period into the DataFrame before writing it to the same SQL table. The code used is the same:

df.to_sql(table_name, con=engine, if_exists='replace', index=False)

My issue is: when I use the second query, it erases all the existing rows in the SQL table that are not part of the partial import.

Could someone give me some advice?

For info, my database is on Azure.

Thanks, and happy new year.

if_exists='replace' is not a row-wise operation, so it does not check whether each row already exists and replace only that specific row. It checks whether the whole table is already there; if it finds the table, it drops the old table and inserts your new one.

Quoted from the docs:

replace: Drop the table before inserting new values.

What I think you should do is use if_exists='append' and then check for duplicate rows and remove them. For now, that would be the safest approach.
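A minimal sketch of that approach, using an in-memory SQLite engine for the demo (on Azure you would use your mssql+pyodbc connection string instead) and assuming a hypothetical unique key column named id:

```python
import pandas as pd
from sqlalchemy import create_engine

engine = create_engine("sqlite://")  # demo engine; swap in your Azure connection string

# Existing table: the full import (your first query).
full = pd.DataFrame({"id": [1, 2, 3], "value": ["a", "b", "c"]})
full.to_sql("my_table", con=engine, if_exists="replace", index=False)

# Partial import (your second query): id 2 is updated, id 4 is new.
partial = pd.DataFrame({"id": [2, 4], "value": ["b2", "d"]})

# Append instead of replace, so the existing rows survive...
partial.to_sql("my_table", con=engine, if_exists="append", index=False)

# ...then drop duplicates on the key, keeping the newly appended copy.
merged = pd.read_sql_table("my_table", con=engine)
merged = merged.drop_duplicates(subset=["id"], keep="last").sort_values("id")
merged.to_sql("my_table", con=engine, if_exists="replace", index=False)
```

Note that the final replace rewrites the whole table, so this is only practical while the table comfortably fits in memory.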

The method you are looking for is called upsert and is being worked on at the moment. It will only insert records which do not "clash", and you can prioritise the new or old records. See the GitHub ticket.
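Until that lands in pandas, a common manual upsert is to delete the rows whose keys are about to be re-inserted and then append, inside one transaction. A sketch under the same assumptions as above (SQLite demo engine, hypothetical integer key column id):

```python
import pandas as pd
from sqlalchemy import create_engine, text

engine = create_engine("sqlite://")  # demo engine; use your Azure connection string

# Seed the table with existing rows.
pd.DataFrame({"id": [1, 2], "value": ["old1", "old2"]}).to_sql(
    "my_table", con=engine, if_exists="replace", index=False)

# New batch: id 2 should be updated, id 3 inserted.
new_rows = pd.DataFrame({"id": [2, 3], "value": ["new2", "new3"]})

# Manual upsert: delete clashing keys, then append, in one transaction.
with engine.begin() as conn:
    keys = ", ".join(str(int(k)) for k in new_rows["id"])
    conn.execute(text(f"DELETE FROM my_table WHERE id IN ({keys})"))
    new_rows.to_sql("my_table", con=conn, if_exists="append", index=False)
```

The transaction (engine.begin()) matters: if the append fails, the delete is rolled back and no existing rows are lost. The string-built IN clause is only safe here because the keys are cast to int; for non-numeric keys use bound parameters instead.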
