Does the bulk write operation in PynamoDB utilize a multi-threading strategy?
I'm writing entries into a DynamoDB table:
import time
...
for item in my_big_map.items():
    Ddb_model(column1=item[0], column2=item[1], column_timestamp=time.time()).save()
I suspect this is slow, so I was thinking about using a multi-threading strategy such as concurrent.futures to write each entry to the table:
def write_one_entry(item):
    Ddb_model(column1=item[0], column2=item[1], column_timestamp=time.time()).save()

with concurrent.futures.ThreadPoolExecutor() as executor:
    executor.map(write_one_entry, my_big_map.items())
However, I've found this way of doing batch writes in PynamoDB's documentation. It looks like a handy way to accelerate write operations.

Does it also use a multi-threading strategy?

Is the PynamoDB implementation better than using concurrent.futures to do bulk writes?
I suspect this is slow
Correct, you're not taking advantage of the BatchWriteItem API, which allows you to write up to 16 MB of data (or a maximum of 25 put/delete requests) in a single call. It is essentially a batch of PutItem and/or DeleteItem requests (note, however, that you cannot update an item via BatchWriteItem). Not using this API means that you are losing out on the performance and network improvements that come from AWS combining the write operations in one go.
Does it also use a multi-threading strategy?

No, it doesn't particularly need to - an interface to the bulk API is all that is needed. The main speed improvement comes from batch processing on AWS's side, not locally.
Is the PynamoDB implementation better than using concurrent.futures to do bulk writes?

Yes, because what matters for the maximum benefit is that the bulk API is actually used, not how the data is iterated.
Your concurrent.futures implementation will be faster than your original code, but it still doesn't take advantage of the BatchWriteItem API. You are speeding up how you're calling AWS, but you're still sending one request per item in my_big_map.items(), and that is what takes up the most time.
Taking a look at the source code, PynamoDB seems to use the bulk API regardless of whether you use context managers or iterators, so you will be better off using the PynamoDB implementation, which also handles pagination of items etc. for you under the hood.
The important part is that you use the BatchWriteItem API, which will give you the speed improvement you are looking for. PynamoDB's batch writing will let you do this (as will AWS's Boto3).
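For illustration, the call pattern looks roughly like the sketch below. Since running the real thing needs a configured DynamoDB table, the `Ddb_model` here is a tiny in-memory stand-in that only mimics the shape of PynamoDB's `Model.batch_write()` context manager; with actual PynamoDB, `batch_write()` buffers `save()`/`delete()` calls and flushes them via BatchWriteItem in chunks of up to 25 requests:

```python
import time
from contextlib import contextmanager

class Ddb_model:
    """In-memory stand-in for the PynamoDB model in the question (illustration only)."""
    saved = []  # records what would be sent to DynamoDB via BatchWriteItem

    def __init__(self, column1, column2, column_timestamp):
        self.column1 = column1
        self.column2 = column2
        self.column_timestamp = column_timestamp

    @classmethod
    @contextmanager
    def batch_write(cls):
        # Real PynamoDB buffers save()/delete() calls here and flushes them
        # to DynamoDB in batches of up to 25 requests on exit/overflow.
        class Batch:
            def save(self, item):
                cls.saved.append(item)
        yield Batch()

my_big_map = {"a": 1, "b": 2}

# The pattern itself: one context manager, many buffered save() calls.
with Ddb_model.batch_write() as batch:
    for key, value in my_big_map.items():
        batch.save(Ddb_model(column1=key, column2=value,
                             column_timestamp=time.time()))

print(len(Ddb_model.saved))  # 2
```

The loop body stays almost identical to your original code; only the `.save()` call moves onto the batch object, which is what lets the library coalesce requests.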