
How can I optimize the read from S3?

Original post: 2022-04-28 04:41:59 · Tags: amazon-s3, aws-glue, aws-glue-spark, aws-glue3.0

dyf_pagewise_word_count = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    format="csv",
    connection_options={
        "paths": ["s3://somefile.csv/"],
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": "100000",
    },
    format_options={
        "withHeader": True,
        "separator": ",",
    },
)

Reading from S3 takes 45 seconds. Is there any way to reduce the read time?

1 Answer

If you are using Glue 3.0, you can try the optimizePerformance format option. It batches records to reduce I/O. See the AWS Glue documentation for more details.

dyf_pagewise_word_count = glueContext.create_dynamic_frame.from_options(
    connection_type="s3",
    format="csv",
    connection_options={
        "paths": ["s3://somefile.csv/"],
        "recurse": True,
        "groupFiles": "inPartition",
        "groupSize": "100000",
    },
    format_options={
        "withHeader": True,
        "separator": ",",
        "optimizePerformance": True,
    },
)
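
To see whether the option helps for a given dataset, one approach is to time the same read with and without it. The following is only a minimal sketch: it assumes a GlueContext object already exists in the job, and `csv_read_kwargs` and `timed_count` are hypothetical helper names, not part of the Glue API. `count()` is called so that the data is actually read before the timer stops.

```python
import time


def csv_read_kwargs(paths, optimize_performance):
    """Build keyword arguments for create_dynamic_frame.from_options.

    Hypothetical helper: mirrors the options used in the snippets above.
    """
    format_options = {"withHeader": True, "separator": ","}
    if optimize_performance:
        # Glue 3.0 CSV reader option suggested in the answer.
        format_options["optimizePerformance"] = True
    return {
        "connection_type": "s3",
        "format": "csv",
        "connection_options": {
            "paths": list(paths),
            "recurse": True,
            "groupFiles": "inPartition",
            "groupSize": "100000",
        },
        "format_options": format_options,
    }


def timed_count(glue_context, read_kwargs):
    """Read the frame, force evaluation with count(), return (rows, seconds)."""
    start = time.perf_counter()
    dyf = glue_context.create_dynamic_frame.from_options(**read_kwargs)
    rows = dyf.count()
    return rows, time.perf_counter() - start
```

Inside a job, calling `timed_count(glueContext, csv_read_kwargs(["s3://somefile.csv/"], True))` and then again with `False` gives two timings to compare. A single run can be noisy, so repeating each measurement a few times is safer.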

Also, could you convert the CSV to something like Parquet upstream of the read?
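
As a rough illustration of the Parquet suggestion, a one-off conversion job could read the CSV once and write it back to S3 as Parquet, so that later jobs read the Parquet copy instead. This is a sketch under assumptions: the function names and the `s3://example-bucket/...` paths are hypothetical, and a GlueContext is passed in by the caller.

```python
def convert_csv_to_parquet(glue_context, source_paths, target_path):
    """Read CSV files from S3 and write them back out as Parquet."""
    dyf = glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        format="csv",
        connection_options={"paths": list(source_paths), "recurse": True},
        format_options={"withHeader": True, "separator": ","},
    )
    glue_context.write_dynamic_frame.from_options(
        frame=dyf,
        connection_type="s3",
        connection_options={"path": target_path},
        format="parquet",
    )


def read_parquet(glue_context, parquet_path):
    """Read the converted Parquet data in downstream jobs."""
    return glue_context.create_dynamic_frame.from_options(
        connection_type="s3",
        format="parquet",
        connection_options={"paths": [parquet_path]},
    )
```

For example, `convert_csv_to_parquet(glueContext, ["s3://example-bucket/csv/"], "s3://example-bucket/parquet/")` would run once upstream, and the job from the question would then call `read_parquet(glueContext, "s3://example-bucket/parquet/")`. Whether this pays off depends on how often the data is read compared with how often it changes.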

