

Upsert multiple rows in PostgreSQL with psycopg2 and error logging

I'm writing an application that connects to a database and upserts multiple rows. It creates a SAVEPOINT for every row, so if there is a mistake I can roll back without breaking the transaction, and it commits every 500 rows.

The problem is that it works extremely slowly over a remote database connection (a PostgreSQL database on a DigitalOcean droplet): it took about 35 minutes to process 1000 rows, whereas the same run took only 7 seconds against a local database (which is also not especially fast, but acceptable).

I found a post about upserting with a single cursor.execute(), like here, but how should I catch errors if I use this trick? Or what else should I do to make it work faster? Here is my code:

self.connection = psycopg2.connect(self.connection_settings)
self.cursor = self.connection.cursor()
for record in dbf_file:
    self.cursor.execute("SAVEPOINT savepoint;")
    try:
        self.send_record(record, where_to_save=database)
        self.count += 1
        self.batch_count += 1
        if self.batch_count >= BATCH_COUNT_MAX:
            self.connection.commit()
            self.cursor.close()
            self.cursor = self.connection.cursor()
            self.batch_count = 0
    except Exception:
        self.cursor.execute("ROLLBACK TO SAVEPOINT savepoint;")
        self.save_error(traceback.format_exc())
        self.error_count += 1
        self.batch_count += 1

        if self.batch_count == BATCH_COUNT_MAX:
            self.connection.commit()
            self.cursor.close()
            self.cursor = self.connection.cursor()
            self.batch_count = 0
else:
    # After the loop finishes, commit any rows left over in the final, partial batch
    if self.batch_count != 0:
        self.connection.commit()
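
For reference, the "one cursor.execute()" style of upsert mentioned above is typically written with psycopg2.extras.execute_values and an INSERT ... ON CONFLICT statement. The following is only a minimal sketch: the table records(id integer primary key, name text), the connection string, and the sample rows are illustrative assumptions, not taken from the original post.

import psycopg2
from psycopg2 import extras

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

rows = [(1, "foo"), (2, "bar")]

# execute_values expands the single VALUES %s placeholder into all the rows
# and sends them in batches (page_size rows per statement, default 100),
# so far fewer network round trips are needed than with one execute per row.
extras.execute_values(
    cur,
    """
    INSERT INTO records (id, name)
    VALUES %s
    ON CONFLICT (id) DO UPDATE SET name = EXCLUDED.name
    """,
    rows,
)
conn.commit()

The trade-off this sketch illustrates is exactly the one the question raises: a single bad row makes the whole statement fail, so per-row error logging is no longer possible within one execute.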

Given you are already using files, I'd suggest:

  1. Make those files CSV (500 or however many rows each)
  2. Upload them to the server (scp/ftp/rsync)
  3. Use psycopg's copy_from() (see the sketch below)

This way you will likely eliminate unneeded network overhead.
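
A minimal sketch of step 3, assuming a hypothetical tab-separated file records.tsv and a table records(id integer, name text); the file name, separator, and column list are assumptions to adapt to your own data.

import psycopg2

conn = psycopg2.connect("dbname=test")
cur = conn.cursor()

# copy_from() streams the whole file through a single COPY command,
# avoiding the per-row round trips of individual INSERT statements.
with open("records.tsv") as f:
    cur.copy_from(f, "records", sep="\t", columns=("id", "name"))

conn.commit()

Note that COPY is all-or-nothing: one malformed row aborts the whole command. If per-row error logging is still required, a common pattern (an assumption here, not spelled out in the answer) is to COPY into an unconstrained staging table first and then upsert from it with a single INSERT ... ON CONFLICT statement.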
