Scan a large (10 GB) Amazon DynamoDB table
The following code works for me, but a single API request takes 19 minutes to return a result. An optimized version would be appreciated. I would rather not use segments (parallel scan), because then I would have to do thread management.
    import json

    import boto3
    from boto3.dynamodb.conditions import Attr

    # DecimalEncoder (a json.JSONEncoder subclass that handles Decimal values)
    # is defined elsewhere; its definition is not shown in the question.

    dynamodb = boto3.resource('dynamodb', region_name='us-west-2', endpoint_url="http://localhost:8000")
    table = dynamodb.Table('Movies')

    # Filter on a plain attribute; a scan filter is applied after items are read.
    fe = Attr('year').between(1950, 1959)
    pe = "#yr, title, info.rating"
    # Expression Attribute Names for Projection Expression only.
    ean = {"#yr": "year"}

    response = table.scan(
        FilterExpression=fe,
        ProjectionExpression=pe,
        ExpressionAttributeNames=ean
    )
    for i in response['Items']:
        print(json.dumps(i, cls=DecimalEncoder))

    # As long as LastEvaluatedKey is in the response, there are still items left to scan.
    while 'LastEvaluatedKey' in response:
        response = table.scan(
            ProjectionExpression=pe,
            FilterExpression=fe,
            ExpressionAttributeNames=ean,
            ExclusiveStartKey=response['LastEvaluatedKey']
        )
        for i in response['Items']:
            print(json.dumps(i, cls=DecimalEncoder))
Because it searches across all partitions, the scan operation can be very slow. You won't be able to "tune" this query the way you might if you were working with a relational database.
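To put rough numbers on that: a single Scan call reads at most 1 MB of data, and the filter is applied only after the data has been read. The back-of-the-envelope sketch below uses the 10 GB and 19 minutes from the question and assumes (my assumption, not something measured) that every page is a full 1 MB and that the calls run strictly one after another.

```python
# Figures taken from the question.
TABLE_SIZE_MB = 10 * 1024   # 10 GB table
TOTAL_SECONDS = 19 * 60     # observed run time of the full scan

# DynamoDB limit: one Scan call reads at most 1 MB before returning a page.
PAGE_LIMIT_MB = 1

# Assumption: every page is full, so this is the number of sequential calls.
pages = TABLE_SIZE_MB // PAGE_LIMIT_MB
seconds_per_page = TOTAL_SECONDS / pages

print(f"about {pages} sequential Scan calls")
print(f"about {seconds_per_page * 1000:.0f} ms per call")
```

So each individual call is probably not slow at all; it is the sheer number of round trips that adds up, which is why changing the loop itself is unlikely to help much.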
To help you best, I need to know more about your access pattern (get movies by year?) and what your table currently looks like (partition key, sort key, other attributes, etc.).
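As an illustration of why the key schema matters: if `year` happened to be the table's partition key (an assumption here, since the question does not say), the decade could be fetched with one Query per year instead of a full Scan. A minimal sketch follows; `query_years` is a hypothetical helper name, and `table` is expected to be a boto3 `Table` resource like the one in the question.

```python
def query_years(table, first_year, last_year):
    """Yield items for each year in [first_year, last_year] using Query.

    Assumes 'year' is the partition key of the table (hypothetical).
    """
    for year in range(first_year, last_year + 1):
        kwargs = {
            # 'year' is a DynamoDB reserved word, hence the #yr placeholder.
            "KeyConditionExpression": "#yr = :y",
            "ExpressionAttributeNames": {"#yr": "year"},
            "ExpressionAttributeValues": {":y": year},
            "ProjectionExpression": "#yr, title, info.rating",
        }
        while True:
            page = table.query(**kwargs)
            yield from page.get("Items", [])
            last_key = page.get("LastEvaluatedKey")
            if last_key is None:
                break  # no more pages for this year
            # Continue this year's query where the previous page stopped.
            kwargs["ExclusiveStartKey"] = last_key
```

With the `table` object from the question, this would be used as `for item in query_years(table, 1950, 1959): ...`. Each Query reads only the items under one partition key value, so the amount of data read depends on how many movies fall in those years rather than on the size of the whole table.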
Unfortunately, scan is slow by nature. There is no way to optimize it at the code level; the only real option is to redesign the table so that it supports this access pattern.
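One possible shape for such a redesign, purely as a sketch: store a derived `decade` attribute on every item and query a global secondary index keyed on it. Everything named here is hypothetical (the `decade` attribute, the `decade-index` index name, and both helper functions); the index would have to be created on the table separately, and existing items would need the new attribute written to them.

```python
def add_decade(item):
    """Return a copy of the item with a derived 'decade' attribute."""
    decade = item["year"] // 10 * 10
    return {**item, "decade": decade}


def decade_query_kwargs(decade, index_name="decade-index"):
    """Build keyword arguments for Table.query against the hypothetical index."""
    return {
        "IndexName": index_name,
        "KeyConditionExpression": "#dec = :d",
        "ExpressionAttributeNames": {"#dec": "decade"},
        "ExpressionAttributeValues": {":d": decade},
    }


# Example: the item as it would be written, and the query arguments for the 1950s.
movie = add_decade({"year": 1957, "title": "Example Movie"})
kwargs = decade_query_kwargs(1950)
```

The query would then be `table.query(**kwargs)`, with the same `LastEvaluatedKey` pagination loop as in the question. Whether a single key per decade is a sensible choice depends on how the data is distributed, so treat this as one option to evaluate rather than a recommendation.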