
Does the bulk write operation in PynamoDB utilize a multi-threading strategy?

I'm writing entries into a DynamoDB table:

import time
...

for item in my_big_map.items():
    Ddb_model(column1=item[0], column2=item[1], column_timestamp=time.time()).save()

I suspect this is slow, so I was thinking about using a multi-threading strategy such as concurrent.futures to write each entry to the table:

import concurrent.futures

def write_one_entry(item):
    Ddb_model(column1=item[0], column2=item[1], column_timestamp=time.time()).save()

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(write_one_entry, my_big_map.items())

However, I've found this way of doing batch writes in PynamoDB's documentation. It looks like a handy way to accelerate write operations.
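For reference, the documented batch write pattern looks roughly like this when applied to the model above (a minimal sketch, not the exact snippet from the docs):

with Ddb_model.batch_write() as batch:
    # Items queued here are sent in batches rather than one request per save().
    for item in my_big_map.items():
        batch.save(Ddb_model(column1=item[0], column2=item[1], column_timestamp=time.time()))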

Does it also use a multi-threading strategy?

Is the PynamoDB implementation better than using concurrent.futures to do bulk writes?

I suspect this is slow

Correct, you're not taking advantage of the BatchWriteItem API, which allows you to write up to 16 MB of data (or a maximum of 25 put/delete requests) in a single call.

It is essentially a bulk of PutItem and/or DeleteItem requests (note that you cannot update an item via BatchWriteItem, however). Not using this API means that you are losing out on the performance and network improvements that come from AWS combining the write operations in one go.
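To illustrate the shape of the underlying API, a single BatchWriteItem call carries a list of put/delete requests per table. A rough sketch with boto3's low-level client (the table and attribute names here are made up for illustration):

import boto3

client = boto3.client("dynamodb")

# One network call carrying up to 25 put/delete requests.
client.batch_write_item(
    RequestItems={
        "my_table": [
            {"PutRequest": {"Item": {"column1": {"S": "key-1"}, "column2": {"S": "value-1"}}}},
            {"PutRequest": {"Item": {"column1": {"S": "key-2"}, "column2": {"S": "value-2"}}}},
            {"DeleteRequest": {"Key": {"column1": {"S": "key-3"}}}},
        ]
    }
)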


Does it also use a multi-threading strategy?

No, it doesn't particularly need to - an interface to the bulk API is all that is needed.

The main speed improvement will come from batch processing on AWS's side, not locally.


Is the PynamoDB implementation better than using concurrent.futures to do bulk writes?

Yes, because what matters for the maximum benefit is that the bulk API is actually used, not how the data is iterated.

Your concurrent.futures implementation will be faster than your original code but still doesn't take advantage of the BatchWriteItem API. You are speeding up how you're calling AWS, but you're still sending one request per item in my_big_map.items(), and that is what takes up the most time.

Looking at the source code, PynamoDB appears to use the bulk API regardless of whether you use context managers or iterators, so you will be better off using the PynamoDB implementation, which will also handle pagination of items etc. for you under the hood.


The important part is that you use the BatchWriteItem API, which will give you the speed improvement you are looking for.

PynamoDB's batch writing will let you do this (as will AWS's Boto3).
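If you were using Boto3 directly, its table resource exposes a similar helper that buffers items into BatchWriteItem calls and retries unprocessed items for you (a rough sketch; the table name and attribute types are assumptions):

import time
import boto3

table = boto3.resource("dynamodb").Table("my_table")  # assumed table name

# batch_writer flushes queued items as BatchWriteItem calls under the hood.
with table.batch_writer() as batch:
    for key, value in my_big_map.items():
        batch.put_item(Item={"column1": key, "column2": value, "column_timestamp": int(time.time())})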
