
Transaction rollback

I have a large list made up of 53,000,000 smaller lists. I want to insert each of these smaller lists as a row into a database, in batches of 1,000,000: each time the script connects to the db it submits 1,000,000 rows, disconnects, and then connects again to submit the next 1,000,000 rows.

My problem is that if an error happens partway through, for example after submitting 50,000,000 rows, I need to delete all the rows in the db and start submitting everything again from the beginning.

I was thinking I could use rollback() to remove the 50,000,000 rows added so far, but since I am using a loop, I do not know how to roll back all 50,000,000 rows that were submitted in batches.

Does anyone have a suggestion?

Here is my script; "results" is the list with 53,000,000 smaller lists as elements.

batch = []
counter = 0
BATCH_SIZE =1000000
cursor_count = 0

def prepare_names(names):
    return [w.replace("'", '') for w in names]

for i in range(len(results)):
    if counter < BATCH_SIZE:
        batch.append(prepare_names([results[i][0], results[i][1], results[i][2]]))  # batch => [[accession_number, taxon_id, taxonomy], ...]
        counter += 1
    else:
        batch.append(prepare_names([results[i][0], results[i][1], results[i][2]]))

        values = (", ").join([f"('{d[0]}', '{d[1]}', '{d[2]}')" for d in batch])
        sql = f"INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) VALUES {values}"

        try:
            cursor.execute(sql)
            db.commit()
        except Exception as exception:
            print(exception)
            print(f"Problem with query: {sql}")

        print(cursor.rowcount, "Records Inserted")
        cursor_count += cursor.rowcount
        counter = 0
        batch = []
else:
    if batch:
        values = (", ").join([f"('{d[0]}', '{d[1]}', '{d[2]}')" for d in batch])
        sql = f"INSERT INTO accession_taxonomy(accession_number, taxon_id, taxonomy) VALUES {values}"

        try:
            cursor.execute(sql)
            db.commit()
        except Exception as exception:
            print(exception)
            print(f"Problem with query: {sql}")

        print(cursor.rowcount, "Records Inserted")
        cursor_count += cursor.rowcount

print("Total Number Of %s Rows Has Been Added." %(cursor_count))
db.close()

There is no rollback after a commit.

Consider this:

1st Attempt 1M rows : committed
2nd Attempt 1M rows : committed
3rd Attempt 1M rows : error

You can only roll back the 3rd attempt. The 1st and 2nd are already committed.
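A minimal sketch of that behaviour, assuming an in-memory SQLite database as a stand-in for the real server (the question does not say which database or driver is in use). Single rows stand in for the 1M-row batches, and the `count_rows` helper is hypothetical.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the real database.
db = sqlite3.connect(":memory:")
cursor = db.cursor()
cursor.execute(
    "CREATE TABLE accession_taxonomy "
    "(accession_number TEXT, taxon_id TEXT, taxonomy TEXT)"
)
db.commit()


def count_rows():
    """Hypothetical helper: number of rows currently in the table."""
    return cursor.execute("SELECT COUNT(*) FROM accession_taxonomy").fetchone()[0]


insert_sql = "INSERT INTO accession_taxonomy VALUES (?, ?, ?)"

# 1st and 2nd attempts: inserted and committed.
cursor.execute(insert_sql, ("A1", "1", "taxon one"))
db.commit()
cursor.execute(insert_sql, ("A2", "2", "taxon two"))
db.commit()

# 3rd attempt: inserted, then rolled back before any commit.
cursor.execute(insert_sql, ("A3", "3", "taxon three"))
db.rollback()

rows_after_rollback = count_rows()
print(rows_after_rollback)  # 2: only the uncommitted 3rd attempt was undone
```

The two committed rows survive the rollback, which is why a per-batch commit cannot be undone later with rollback().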

Workaround: modify your accession_taxonomy table and add a field called something like insertHash. Each run of your batch insert process writes one unique value into this field, for example today's date (todaysDate). If any of your insert steps fails, you can then run:

Delete T from accession_taxonomy T Where T.insertHash ='TheValueUSet'

So essentially it becomes like this:

1st Attempt 1M rows : committed
2nd Attempt 1M rows : committed
3rd Attempt 1M rows : error
Delete AllRows where insertHash = 'TheValueUSet'
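The workaround could look roughly like the sketch below. It assumes an in-memory SQLite database as a stand-in for the real server, so the cleanup uses a plain `DELETE ... WHERE` rather than the aliased form above. The function name `load_with_run_tag`, the UUID used as the tag, and the NOT NULL constraint that triggers the failure are all illustrative assumptions.

```python
import sqlite3
import uuid


def load_with_run_tag(db, batches, run_tag):
    """Commit each batch tagged with run_tag; on failure, delete every
    row carrying that tag. Hypothetical helper, not a library API."""
    cursor = db.cursor()
    sql = (
        "INSERT INTO accession_taxonomy "
        "(accession_number, taxon_id, taxonomy, insertHash) VALUES (?, ?, ?, ?)"
    )
    try:
        for batch in batches:
            cursor.executemany(sql, [(*row, run_tag) for row in batch])
            db.commit()  # each batch is committed on its own
    except Exception as exception:
        print(exception)
        db.rollback()  # undo the uncommitted part of the failing batch
        # Remove the batches of this run that were already committed.
        cursor.execute(
            "DELETE FROM accession_taxonomy WHERE insertHash = ?", (run_tag,)
        )
        db.commit()
        return False
    return True


# Assumption: in-memory SQLite stands in for the real database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE accession_taxonomy (accession_number TEXT NOT NULL, "
    "taxon_id TEXT, taxonomy TEXT, insertHash TEXT)"
)
# A row from an earlier run, which must survive the cleanup.
db.execute(
    "INSERT INTO accession_taxonomy VALUES ('OLD1', '0', 'earlier run', 'previous-run')"
)
db.commit()

good = [("A1", "1", "taxon one"), ("A2", "2", "taxon two")]
bad = [("A3", "3", "taxon three"), (None, "4", "missing accession")]

ok = load_with_run_tag(db, [good, bad], run_tag=str(uuid.uuid4()))
remaining = db.execute("SELECT accession_number FROM accession_taxonomy").fetchall()
print(ok, remaining)
```

Because the tag is unique per run, the cleanup removes only the rows from the failed run and leaves rows from earlier runs untouched.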

Having said that, are you sure you want to send 1M rows at a time? Have you checked whether your server can accept a packet that large?
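If a 1M-row statement turns out to be too large, one option is to slice the data into smaller chunks before inserting. The sketch below is only an illustration: `chunked` is a hypothetical helper, and the chunk size of 4 is chosen to keep the example readable, not as a recommendation.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    iterator = iter(rows)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


# Ten rows split into chunks of four: the last chunk holds the remainder.
sizes = [len(chunk) for chunk in chunked(range(10), 4)]
print(sizes)  # [4, 4, 2]
```

Each chunk could then be passed to a parameterized insert, which also avoids building the VALUES list by string formatting.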

I would use some flags to make sure that

  • something was inserted
  • nothing went wrong

Then use those flags to choose whether to commit or roll back, such as:

nothing_wrong_happened = True
something_was_inserted = False

for i in range(len(results)):

    # Your code that generates the query

        try:
            cursor.execute(sql)
            something_was_inserted = True  # <-- you inserted something
        except Exception as exception:
            nothing_wrong_happened = False # <-- Something bad happened
            print(exception)
            print(f"Problem with query: {sql}")

        # the rest of your code
else:

    # Your code that generates the query

        try:
            cursor.execute(sql)
            something_was_inserted = True  # <-- you inserted something
        except Exception as exception:
            nothing_wrong_happened = False # <-- Something bad happened
            print(exception)
            print(f"Problem with query: {sql}")

        # the rest of your code

# The loop is now over
if (something_was_inserted):
    if (nothing_wrong_happened):
        db.commit()   # commit everything
    else:
        db.rollback() # rollback everything
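The same idea as a self-contained sketch, assuming an in-memory SQLite database as a stand-in for the real server. Unlike the outline above, this version stops at the first failing batch instead of continuing through the loop; `load_all_or_nothing` is a hypothetical name, and the NOT NULL constraint exists only to trigger an error.

```python
import sqlite3


def load_all_or_nothing(db, batches):
    """Insert every batch inside one transaction and commit once at the end.
    Returns the number of rows committed (0 if anything failed)."""
    cursor = db.cursor()
    sql = (
        "INSERT INTO accession_taxonomy "
        "(accession_number, taxon_id, taxonomy) VALUES (?, ?, ?)"
    )
    inserted = 0
    try:
        for batch in batches:
            cursor.executemany(sql, batch)  # no commit inside the loop
            inserted += len(batch)
    except Exception as exception:
        print(exception)
        db.rollback()  # nothing was committed, so this undoes every batch
        return 0
    if inserted:
        db.commit()  # commit everything in one go
    return inserted


def count_rows(db):
    """Hypothetical helper: number of rows currently in the table."""
    return db.execute("SELECT COUNT(*) FROM accession_taxonomy").fetchone()[0]


# Assumption: in-memory SQLite stands in for the real database.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE accession_taxonomy "
    "(accession_number TEXT NOT NULL, taxon_id TEXT, taxonomy TEXT)"
)
db.commit()

good = [("A1", "1", "taxon one"), ("A2", "2", "taxon two")]
bad = [(None, "3", "missing accession")]
more = [("A3", "3", "taxon three")]

first = load_all_or_nothing(db, [good, bad])
count_after_failure = count_rows(db)
second = load_all_or_nothing(db, [good, more])
count_after_success = count_rows(db)
print(first, count_after_failure, second, count_after_success)
```

Whether a single transaction covering all 53,000,000 rows is practical depends on the database and its configuration, so this trades simpler cleanup for a much longer open transaction.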
