DynamoDB Python Query with Pagination (not scan)
I am using the following code to run a DynamoDB query and paginate through the results:
    import decimal
    import json

    import boto3
    from boto3.dynamodb.conditions import Key
    from botocore.config import Config


    class DecimalEncoder(json.JSONEncoder):
        # DynamoDB returns numbers as decimal.Decimal, which json cannot serialize.
        def default(self, o):
            if isinstance(o, decimal.Decimal):
                return str(o)
            return super(DecimalEncoder, self).default(o)


    def run(date: int, start_epoch: int, end_epoch: int):
        dynamodb = boto3.resource('dynamodb',
                                  region_name='REGION',
                                  config=Config(proxies={'https': 'PROXYIP'}))
        table = dynamodb.Table('XYZ')

        response = table.query(
            # ProjectionExpression="#yr, title, info.genres, info.actors[0]",  # THIS IS A SELECT STATEMENT
            # ExpressionAttributeNames={"#yr": "year"},  # SELECT STATEMENT RENAME
            KeyConditionExpression=Key('date').eq(date) & Key('uid').between(start_epoch, end_epoch)
        )
        for i in response[u'Items']:
            print(json.dumps(i, cls=DecimalEncoder))

        while 'LastEvaluatedKey' in response:
            response = table.scan(  ## IS THIS INEFFICIENT CODE?
                # ProjectionExpression=pe,
                # FilterExpression=fe,
                # ExpressionAttributeNames=ean,
                ExclusiveStartKey=response['LastEvaluatedKey']
            )
            for i in response['Items']:
                print(json.dumps(i, cls=DecimalEncoder))
Although this code works, it is incredibly slow, and I fear that 'response = table.scan' is the cause. I am under the impression that queries are much faster than scans (as scans require an entire iteration of the table). Is this code causing a complete iteration of the database table?
This might be a separate question, but is there a more efficient way (with code examples) of doing this? I've attempted using Boto3's pagination, but I could not get that working with queries either.
Unfortunately, yes, a "Scan" operation reads the entire table. You didn't say what your table's partition key is, but if it is the date, then what you are really doing here is reading a single partition, and that is exactly what a "Query" operation does much more efficiently, because it can jump directly to the required partition instead of scanning the entire table looking for it.
Even with Query, you still need to page through the results just as you did, because the partition may still contain many items (a single Query call returns at most 1 MB of data). But at least you won't be scanning the entire table.
By the way, scanning the entire table will cost you a lot of read operations. You can ask AWS how many reads you were charged for, which can help you catch cases where you're reading too much, beyond the obvious slowness that you noticed.
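One way to ask for that number from code is the ReturnConsumedCapacity parameter, which both query and scan accept. The following is only a minimal sketch: the helper name query_with_read_cost is made up for illustration, and table is assumed to be a boto3 Table resource (or anything with a compatible query method).

```python
def query_with_read_cost(table, **query_kwargs):
    """Run a paginated Query; return (items, total read capacity units consumed).

    `table` is assumed to behave like a boto3 DynamoDB Table resource.
    The helper name is hypothetical, not part of boto3.
    """
    items = []
    consumed = 0.0
    # Ask DynamoDB to report the capacity consumed by each request.
    kwargs = dict(query_kwargs, ReturnConsumedCapacity='TOTAL')
    while True:
        response = table.query(**kwargs)
        items.extend(response['Items'])
        consumed += response.get('ConsumedCapacity', {}).get('CapacityUnits', 0.0)
        last_key = response.get('LastEvaluatedKey')
        if not last_key:
            break  # no more pages
        kwargs['ExclusiveStartKey'] = last_key
    return items, consumed
```

Comparing the number returned here with the same measurement on a scan-based loop should make the difference in read cost visible.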
The answer provided by Nadav Har'El was key to resolving this. I was misusing the DynamoDB pagination code examples: I ran an initial DynamoDB query, but then used scan to paginate!
The correct way was to use query both for the initial request and for pagination:
    import decimal
    import json

    import boto3
    from boto3.dynamodb.conditions import Key
    from botocore.config import Config


    class DecimalEncoder(json.JSONEncoder):
        # DynamoDB returns numbers as decimal.Decimal, which json cannot serialize.
        def default(self, o):
            if isinstance(o, decimal.Decimal):
                return str(o)
            return super(DecimalEncoder, self).default(o)


    def run(date: int, start_epoch: int, end_epoch: int):
        dynamodb = boto3.resource('dynamodb',
                                  region_name='REGION',
                                  config=Config(proxies={'https': 'PROXYIP'}))
        table = dynamodb.Table('XYZ')

        # First page: query a single partition, restricted to a sort-key range.
        response = table.query(
            KeyConditionExpression=Key('date').eq(date) & Key('uid').between(start_epoch, end_epoch)
        )
        for i in response[u'Items']:
            print(json.dumps(i, cls=DecimalEncoder))

        # Later pages: repeat the same query, resuming from the last evaluated key.
        while 'LastEvaluatedKey' in response:
            response = table.query(
                KeyConditionExpression=Key('date').eq(date) & Key('uid').between(start_epoch, end_epoch),
                ExclusiveStartKey=response['LastEvaluatedKey']
            )
            for i in response['Items']:
                print(json.dumps(i, cls=DecimalEncoder))
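The query call appears twice in the code above, once for the first page and once inside the loop. A possible way to avoid the duplication is a small generator that owns the paging loop. This is just a sketch: iter_query_items is a hypothetical helper name, and table is assumed to be a boto3 Table resource.

```python
def iter_query_items(table, **query_kwargs):
    """Yield every item of a Query, following LastEvaluatedKey page by page.

    `table` is assumed to behave like a boto3 DynamoDB Table resource.
    The helper name is hypothetical, not part of boto3.
    """
    kwargs = dict(query_kwargs)  # copy so the caller's arguments are not mutated
    while True:
        response = table.query(**kwargs)
        yield from response['Items']
        last_key = response.get('LastEvaluatedKey')
        if last_key is None:
            break  # DynamoDB omits the key on the final page
        kwargs['ExclusiveStartKey'] = last_key
```

With this helper, the body of run could shrink to a single loop over iter_query_items(table, KeyConditionExpression=Key('date').eq(date) & Key('uid').between(start_epoch, end_epoch)), printing each item with json.dumps(item, cls=DecimalEncoder) as before.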
I have still marked Nadav Har'El's response as correct, as it was his answer that led to this code example.
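As for Boto3's built-in pagination mentioned in the question: paginators live on the low-level client rather than on the Table resource, so the key condition has to be written as an expression string with typed values. The sketch below assumes that both date and uid are stored as DynamoDB numbers (type N), which the question does not confirm, and the function name is made up for illustration.

```python
def iter_query_items_paginated(client, table_name, date, start_epoch, end_epoch):
    """Yield items of a Query using the client-level 'query' paginator.

    `client` is assumed to be boto3.client('dynamodb'). Assumes the 'date' and
    'uid' keys are numeric (type N); adjust the type tags if they are strings.
    """
    paginator = client.get_paginator('query')
    pages = paginator.paginate(
        TableName=table_name,
        # Placeholders are used for the attribute names because
        # 'date' is a DynamoDB reserved word.
        KeyConditionExpression='#d = :d AND #u BETWEEN :lo AND :hi',
        ExpressionAttributeNames={'#d': 'date', '#u': 'uid'},
        ExpressionAttributeValues={
            ':d': {'N': str(date)},
            ':lo': {'N': str(start_epoch)},
            ':hi': {'N': str(end_epoch)},
        },
    )
    for page in pages:
        # Items come back in typed form, e.g. {'uid': {'N': '123'}}.
        yield from page['Items']
```

The paginator handles LastEvaluatedKey and ExclusiveStartKey internally, at the price of working with the client's typed attribute format instead of plain Python values.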