
Copy a database through updates/inserts and deletes in postgresql.

I want to keep a database constantly updated with information that I scrape from an API. The data I get may be incomplete, but I should have most of it. So far I have a try/except clause where I try inserting a row into my database and, on exception, update the existing row instead. The main problem is that I never delete any rows. I want to have a copy of the server's data at any given time, or at least stay close to it, so I need some way to keep track of the rows that should be deleted over time; I also want to make sure it's not just the scraper giving me incomplete data. I'm using Python and psycopg2.

I think this is a common problem, but I can't find a better solution than creating a new database, updating it a couple of times with what I currently have, and then replacing the old database with the new one. Any suggestions? I also don't like that I expect the exception clause to get triggered often here...
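Roughly, the insert/update part currently looks like this (the items table, its columns, and the connection string are just placeholders):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string

    def insert_or_update(row_id, payload):
        with conn.cursor() as cur:
            try:
                cur.execute(
                    "INSERT INTO items (id, payload) VALUES (%s, %s)",
                    (row_id, payload),
                )
            except psycopg2.IntegrityError:
                # The row already exists: reset the aborted transaction,
                # then update it instead.
                conn.rollback()
                cur.execute(
                    "UPDATE items SET payload = %s WHERE id = %s",
                    (payload, row_id),
                )
            conn.commit()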

Thanks in advance!

The lack of an upsert (the equivalent of MySQL's INSERT ... ON DUPLICATE KEY UPDATE) has been a thorn in PostgreSQL's side for a long time. Generally, your approach is the best way to do it. However, there is an issue in that it's not atomic: between the time your exception is thrown and the time you run the update, the row may have been updated by another process. Often this leads people to build immutable rows, but that's another topic.
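One common pre-9.5 workaround for that race is to retry in a loop, so a concurrent insert or delete just causes another attempt rather than a failure. A minimal sketch, again assuming a made-up items table:

    import psycopg2

    def merge_row(conn, row_id, payload):
        # Loop until either the UPDATE or the INSERT wins.
        while True:
            with conn.cursor() as cur:
                cur.execute(
                    "UPDATE items SET payload = %s WHERE id = %s",
                    (payload, row_id),
                )
                if cur.rowcount > 0:
                    conn.commit()
                    return
                try:
                    cur.execute(
                        "INSERT INTO items (id, payload) VALUES (%s, %s)",
                        (row_id, payload),
                    )
                    conn.commit()
                    return
                except psycopg2.IntegrityError:
                    # Another process inserted the row between our UPDATE and
                    # our INSERT; roll back and retry the UPDATE.
                    conn.rollback()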

It appears that as of Postgres 9.5 there is an upsert clause: INSERT ... ON CONFLICT ... DO ... (documentation here).
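With that available, the whole insert-or-update collapses into a single atomic statement. A sketch with psycopg2 (table, column, and connection details are made up):

    import psycopg2

    conn = psycopg2.connect("dbname=mydb user=me")  # placeholder connection string

    def upsert_row(row_id, payload):
        with conn.cursor() as cur:
            cur.execute(
                """
                INSERT INTO items (id, payload)
                VALUES (%s, %s)
                ON CONFLICT (id) DO UPDATE
                SET payload = EXCLUDED.payload
                """,
                (row_id, payload),
            )
        conn.commit()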

In doing other research, it appears there's a far more comprehensive answer here: https://stackoverflow.com/a/17267423/1327710 .
