How do I delete items from a DynamoDB table wherever an attribute is missing, regardless of key?

Is it possible to delete items from a DynamoDB table without specifying partition or sort keys? I have numerous entries in a table with different partition and sort keys, and I want to delete all the items where a certain attribute does not exist.

AWS CLI or boto3/Python solutions are welcome.

To delete a large number of items from the table, you need to query or scan first and then delete the items using the BatchWriteItem or DeleteItem operation.
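A minimal sketch of the "scan first" step in Python, assuming `client` is a low-level boto3 DynamoDB client (`boto3.client("dynamodb")`); the function name `scan_keys_missing_attribute` is hypothetical. It filters with `attribute_not_exists`, projects only the key attributes, and follows `LastEvaluatedKey` across pages. Keys come back in DynamoDB's typed JSON format, which DeleteItem and BatchWriteItem accept unchanged.

```python
from typing import Any, Dict, Iterator, List


def scan_keys_missing_attribute(
    client: Any, table_name: str, attribute: str, key_names: List[str]
) -> Iterator[Dict[str, Any]]:
    """Yield the primary key of every item that lacks `attribute`.

    `client` is assumed to be a low-level boto3 DynamoDB client.
    """
    # Placeholders avoid clashes with DynamoDB reserved words.
    names = {"#missing": attribute}
    projection = []
    for i, key in enumerate(key_names):
        names[f"#k{i}"] = key
        projection.append(f"#k{i}")

    kwargs: Dict[str, Any] = {
        "TableName": table_name,
        "FilterExpression": "attribute_not_exists(#missing)",
        "ProjectionExpression": ", ".join(projection),
        "ExpressionAttributeNames": names,
    }
    while True:
        page = client.scan(**kwargs)
        for item in page.get("Items", []):
            yield item  # only the key attributes were projected
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            break
        # Resume the scan where the previous page stopped.
        kwargs["ExclusiveStartKey"] = last_key
```

For example, `list(scan_keys_missing_attribute(boto3.client("dynamodb"), "MyTable", "status", ["pk", "sk"]))`, where the table, attribute and key names are placeholders. The filter is applied after items are read, so the scan still reads the whole table.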

Query and BatchWriteItem are better in terms of performance and cost, so if this is a job that happens frequently, it's better to add a global secondary index that lets you query for the items to delete. (Note that a GSI is sparse: items lacking the index's key attribute do not appear in it, so an index keyed on the missing attribute itself would contain only the items you want to keep.) However, you need to call BatchWriteItem iteratively for a large number of items, since Query returns paginated results.
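A rough sketch of handling Query pagination while deleting page by page, assuming a low-level boto3 client and a GSI the caller already has; `delete_query_results` is an illustrative name, and all keyword arguments are passed straight to `client.query`. Each page is split into BatchWriteItem calls of at most 25 deletes, and `UnprocessedItems` are resent with a simple exponential backoff.

```python
import time
from typing import Any, List


def delete_query_results(client: Any, key_names: List[str], **query_kwargs: Any) -> int:
    """Page through a Query and delete each page's items with BatchWriteItem.

    query_kwargs go to client.query unchanged (TableName, IndexName,
    KeyConditionExpression, ...). Returns the number of delete requests completed.
    """
    table_name = query_kwargs["TableName"]
    deleted = 0
    while True:
        page = client.query(**query_kwargs)
        # Keep only the table's primary-key attributes; index items can carry more.
        keys = [{name: item[name] for name in key_names} for item in page.get("Items", [])]
        # BatchWriteItem accepts at most 25 put/delete requests per call.
        for start in range(0, len(keys), 25):
            chunk = keys[start:start + 25]
            pending = {table_name: [{"DeleteRequest": {"Key": k}} for k in chunk]}
            attempt = 0
            while pending:
                response = client.batch_write_item(RequestItems=pending)
                pending = response.get("UnprocessedItems") or {}
                if pending:
                    # Back off before resending unprocessed requests.
                    time.sleep(min(0.05 * 2 ** attempt, 5.0))
                    attempt += 1
            deleted += len(chunk)
        last_key = page.get("LastEvaluatedKey")
        if not last_key:
            return deleted
        query_kwargs["ExclusiveStartKey"] = last_key


# Hypothetical usage (index and attribute names are placeholders):
# delete_query_results(
#     boto3.client("dynamodb"), ["pk", "sk"],
#     TableName="MyTable", IndexName="cleanup-index",
#     KeyConditionExpression="cleanup_status = :s",
#     ExpressionAttributeValues={":s": {"S": "pending"}},
# )
```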

Otherwise, you can scan and call DeleteItem iteratively.
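A minimal sketch of the scan-plus-DeleteItem route, again assuming a low-level boto3 client; `scan_and_delete` is a hypothetical name. The `ConditionExpression` on each delete is an addition, not part of the answer: it skips any item that gained the attribute between the scan and the delete, relying on the client's modeled `ConditionalCheckFailedException`.

```python
from typing import Any, Dict, List


def scan_and_delete(client: Any, table_name: str, attribute: str, key_names: List[str]) -> int:
    """Scan for items lacking `attribute` and delete them one at a time."""
    names = {"#missing": attribute}
    names.update({f"#k{i}": k for i, k in enumerate(key_names)})
    scan_kwargs: Dict[str, Any] = {
        "TableName": table_name,
        "FilterExpression": "attribute_not_exists(#missing)",
        "ProjectionExpression": ", ".join(f"#k{i}" for i in range(len(key_names))),
        "ExpressionAttributeNames": names,
    }
    deleted = 0
    while True:
        page = client.scan(**scan_kwargs)
        for key in page.get("Items", []):
            try:
                client.delete_item(
                    TableName=table_name,
                    Key=key,
                    # Re-check at delete time in case the attribute was added after the scan.
                    ConditionExpression="attribute_not_exists(#missing)",
                    ExpressionAttributeNames={"#missing": attribute},
                )
                deleted += 1
            except client.exceptions.ConditionalCheckFailedException:
                pass  # the item now has the attribute, so keep it
        if "LastEvaluatedKey" not in page:
            return deleted
        scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
```

This issues one request per item, which is simple but slower than batching.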

Check this Stack Overflow question for more insight.

It's worth trying the EMR Hive integration with DynamoDB. It allows you to write SQL queries against DynamoDB. Hive supports the DELETE statement, and Amazon has implemented a DynamoDB connector. I am not sure whether this would integrate perfectly, but it's worth a try. Here is how to work with DynamoDB using EMR Hive.

Another option is to use a parallel scan: get all items from DynamoDB that match a filter expression (here, `attribute_not_exists` on the attribute in question) and delete each one of them. Here is how to do scans using the boto client.
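A sketch of a parallel scan with one thread per segment, using the `Segment` and `TotalSegments` parameters of Scan. It assumes a single low-level boto3 client shared across threads (boto3 documents low-level clients, unlike resources, as thread-safe); `parallel_scan_delete`, the default segment count and the per-item DeleteItem are illustrative choices.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Dict, List


def _delete_segment(client: Any, table_name: str, attribute: str,
                    key_names: List[str], segment: int, total_segments: int) -> int:
    """Scan one segment for items lacking `attribute` and delete them."""
    names = {"#missing": attribute}
    names.update({f"#k{i}": k for i, k in enumerate(key_names)})
    kwargs: Dict[str, Any] = {
        "TableName": table_name,
        "FilterExpression": "attribute_not_exists(#missing)",
        "ProjectionExpression": ", ".join(f"#k{i}" for i in range(len(key_names))),
        "ExpressionAttributeNames": names,
        "Segment": segment,              # which slice of the table this worker reads
        "TotalSegments": total_segments,
    }
    deleted = 0
    while True:
        page = client.scan(**kwargs)
        for key in page.get("Items", []):
            client.delete_item(TableName=table_name, Key=key)
            deleted += 1
        if "LastEvaluatedKey" not in page:
            return deleted
        kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]


def parallel_scan_delete(client: Any, table_name: str, attribute: str,
                         key_names: List[str], total_segments: int = 4) -> int:
    """Run one worker per segment and return the total number of deletes."""
    with ThreadPoolExecutor(max_workers=total_segments) as pool:
        counts = pool.map(
            lambda seg: _delete_segment(client, table_name, attribute,
                                        key_names, seg, total_segments),
            range(total_segments),
        )
        return sum(counts)
```

More segments finish sooner but also consume read capacity faster, so the segment count is a trade-off against the table's provisioned RCUs.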

To speed up the process, you can batch-delete items using the BatchWriteItem method. Here is how to do this in boto.
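In boto3 specifically, the `Table` resource offers `batch_writer()`, which buffers `delete_item` calls into BatchWriteItem requests and resends unprocessed items. A minimal sketch, assuming `table = boto3.resource("dynamodb").Table("MyTable")` (placeholder table name) and a hypothetical helper name; keys are plain Python values here because the resource layer handles type conversion.

```python
from typing import Any, Dict, List


def batch_delete_missing(table: Any, attribute: str, key_names: List[str]) -> int:
    """Scan a boto3 Table resource and batch-delete items lacking `attribute`."""
    names = {"#missing": attribute}
    names.update({f"#k{i}": k for i, k in enumerate(key_names)})
    scan_kwargs: Dict[str, Any] = {
        "FilterExpression": "attribute_not_exists(#missing)",
        "ProjectionExpression": ", ".join(f"#k{i}" for i in range(len(key_names))),
        "ExpressionAttributeNames": names,
    }
    deleted = 0
    # batch_writer groups delete_item calls into BatchWriteItem requests
    # and flushes any remaining buffered deletes when the block exits.
    with table.batch_writer() as batch:
        while True:
            page = table.scan(**scan_kwargs)
            for item in page.get("Items", []):
                batch.delete_item(Key={k: item[k] for k in key_names})
                deleted += 1
            if "LastEvaluatedKey" not in page:
                break
            scan_kwargs["ExclusiveStartKey"] = page["LastEvaluatedKey"]
    return deleted
```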

Note that BatchWriteItem has the following limitations:

BatchWriteItem can write up to 16 MB of data, which can comprise as many as 25 put or delete requests.
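To respect the 25-request limit with the low-level client, the keys can be chunked before calling BatchWriteItem. A minimal sketch with a hypothetical helper that only builds the `RequestItems` payloads; sending them, and resending any `UnprocessedItems` from each response, is left to the caller.

```python
from typing import Any, Dict, Iterator, List

MAX_BATCH_REQUESTS = 25  # BatchWriteItem limit on put/delete requests per call


def delete_batches(
    table_name: str, keys: List[Dict[str, Any]]
) -> Iterator[Dict[str, List[Dict[str, Any]]]]:
    """Split keys into RequestItems payloads of at most 25 DeleteRequests each."""
    for start in range(0, len(keys), MAX_BATCH_REQUESTS):
        chunk = keys[start:start + MAX_BATCH_REQUESTS]
        yield {table_name: [{"DeleteRequest": {"Key": key}} for key in chunk]}
```

Each payload would then go to `client.batch_write_item(RequestItems=payload)`. A single batch also must not contain two requests for the same key, which holds naturally for keys collected from one scan.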

Keep in mind that scans are expensive: when you scan, you consume RCUs for every item DynamoDB reads from your table, not just for the items it returns, because the filter expression is applied after the read. So you either need to read the data slowly or provision very high RCUs for the table.
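A back-of-the-envelope calculation makes the cost concrete. This sketch assumes the documented unit sizes (one RCU per 4 KB for a strongly consistent read, half that for an eventually consistent read, which is the Scan default) and treats the scanned data as one lump; DynamoDB actually rounds per request, so the result is an approximation. The function names are hypothetical.

```python
import math


def scan_read_units(bytes_scanned: int, consistent_read: bool = False) -> float:
    """Approximate RCUs to scan `bytes_scanned` bytes of item data.

    bytes_scanned is everything the scan reads, including filtered-out items.
    """
    units = math.ceil(bytes_scanned / 4096)  # one unit per 4 KB read
    return units if consistent_read else units / 2


def seconds_to_scan(bytes_scanned: int, rcu_per_second: float,
                    consistent_read: bool = False) -> float:
    """Time to scan at a sustained read rate, ignoring burst capacity."""
    return scan_read_units(bytes_scanned, consistent_read) / rcu_per_second
```

For a 10 GiB table, `scan_read_units(10 * 1024**3)` gives 1,310,720 RCUs, which at a sustained 1,000 RCU/s is roughly 22 minutes of reading.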

It's OK to do this operation infrequently, but you can't do it as part of a web-server request if your table is of any decent size.

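Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.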

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM