
Scanning a large (10 GB) Amazon DynamoDB table

The following code works for me, but it takes 19 minutes for one API request to return a result. An optimized approach would be appreciated. I would not like to use segments, because then I would have to do the thread management myself.

```python
import json
from decimal import Decimal

import boto3
from boto3.dynamodb.conditions import Key


# Render DynamoDB's Decimal numbers as plain JSON numbers.
class DecimalEncoder(json.JSONEncoder):
    def default(self, o):
        if isinstance(o, Decimal):
            return float(o)
        return super().default(o)


dynamodb = boto3.resource('dynamodb', region_name='us-west-2',
                          endpoint_url="http://localhost:8000")
table = dynamodb.Table('Movies')

fe = Key('year').between(1950, 1959)
pe = "#yr, title, info.rating"
# Expression attribute name needed because "year" is a DynamoDB reserved word.
ean = {"#yr": "year"}

response = table.scan(
    FilterExpression=fe,
    ProjectionExpression=pe,
    ExpressionAttributeNames=ean,
)
for i in response['Items']:
    print(json.dumps(i, cls=DecimalEncoder))

# As long as LastEvaluatedKey is in the response, more pages remain.
while 'LastEvaluatedKey' in response:
    response = table.scan(
        FilterExpression=fe,
        ProjectionExpression=pe,
        ExpressionAttributeNames=ean,
        ExclusiveStartKey=response['LastEvaluatedKey'],
    )
    for i in response['Items']:
        print(json.dumps(i, cls=DecimalEncoder))
```
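As an aside, the manual `LastEvaluatedKey` loop above writes the `scan` call twice. A small generator (a hedged sketch; `scan_all` is my own name, not a boto3 API) removes the duplication and works with any keyword arguments you pass to `scan`:

```python
def scan_all(table, **scan_kwargs):
    """Yield every item from table.scan, transparently following
    LastEvaluatedKey pagination until the table is exhausted."""
    response = table.scan(**scan_kwargs)
    yield from response['Items']
    # DynamoDB returns LastEvaluatedKey while more pages remain.
    while 'LastEvaluatedKey' in response:
        response = table.scan(
            ExclusiveStartKey=response['LastEvaluatedKey'], **scan_kwargs
        )
        yield from response['Items']

# Usage sketch (fe, pe, ean as defined in the question):
# for item in scan_all(table, FilterExpression=fe,
#                      ProjectionExpression=pe,
#                      ExpressionAttributeNames=ean):
#     print(json.dumps(item, cls=DecimalEncoder))
```

This doesn't make the scan any faster; it only tidies the pagination so the item-handling code lives in one place.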

Because it is searching across all partitions, the scan operation can be very slow. You won't be able to "tune" this query the way you might if you were working with a relational database.

In order to best help you, I would need to know more about your access pattern (get movies by year?) and what your table currently looks like (your partition key/sort key, other attributes, etc.).

Unfortunately, scan is slow by nature. There is no way to optimize it at the code level, short of redesigning the table to suit this access pattern.
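To illustrate the redesign idea: if "get movies by year" is the dominant access pattern, putting `year` into a key lets you replace the full scan with targeted `Query` calls. This is a hedged sketch only; it assumes a hypothetical global secondary index named `year-index` whose partition key is `year`, and the helper name is my own:

```python
def movies_for_year(table, year):
    """Query one year's movies via an assumed GSI ("year-index")
    partitioned on year, instead of scanning the whole table."""
    response = table.query(
        IndexName='year-index',
        # "year" is a reserved word, so alias it with #yr.
        KeyConditionExpression='#yr = :y',
        ProjectionExpression='#yr, title, info.rating',
        ExpressionAttributeNames={'#yr': 'year'},
        ExpressionAttributeValues={':y': year},
    )
    return response['Items']

# Covering 1950-1959 then becomes ten cheap queries rather than one
# full-table scan:
# items = [i for y in range(1950, 1960) for i in movies_for_year(table, y)]
```

A `Query` reads only the single partition for that key, so its cost scales with the matching data rather than with the 10 GB table.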


