Explicitly, how can I get s3fs to open a gz file in AWS Glue/S3?
Using AWS Glue, Amazon S3, and s3fs, I've come up with the following (among other attempts). I've even looked at the examples at https://s3fs.readthedocs.io/en/latest/ but they're not getting me there.
key = 'https://s3.console.aws.amazon.com/s3/buckets/datalake-sbox-landing-55555-us-weast/cloudwatch_logs/prod-coarsair/dt%267D2019-09-11/144221-3837-462d-a2e6-ba300/coarsair-coarsair-144221-3837-462d-a2e6-ba300/000000.gz'
s3 = s3fs.S3FileSystem()
with s3.open(key, 'r') as f:
    args_gz_file = f
Then it should be as easy as typing "args_gz_file" now, right? Nope. Where am I going wrong?!
The samples show:
with fs.open('my-bucket/my-file.txt', 'rb') as f:
This suggests that the first parameter is BUCKET-NAME/KEY.
However, your code sample shows a URL (https://s3.console.aws.amazon.com/s3/buckets/) instead of a bucket name.
Perhaps try:
key = 'datalake-sbox-landing-55555-us-weast/cloudwatch_logs/prod-coarsair/dt%267D2019-09-11/144221-3837-462d-a2e6-ba300/coarsair-coarsair-144221-3837-462d-a2e6-ba300/000000.gz'
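Beyond the path, two other details in the original snippet are likely to cause trouble: the object is gzip-compressed, so it has to be opened in binary mode and decompressed, and the file handle is closed as soon as the with block exits, so the data must be read inside it. The following is a minimal sketch under those assumptions, not a definitive implementation. The helper names read_gzip_text and read_gzip_from_s3 are hypothetical, and the content is assumed to be UTF-8 text.

```python
import gzip
import io


def read_gzip_text(binary_file, encoding="utf-8"):
    """Decompress a gzip stream opened in binary mode and return its text."""
    with gzip.GzipFile(fileobj=binary_file) as gz:
        return gz.read().decode(encoding)


def read_gzip_from_s3(path):
    """Read a gzipped object at 'bucket-name/key' via s3fs (needs AWS credentials)."""
    import s3fs  # imported lazily so read_gzip_text works without s3fs installed

    fs = s3fs.S3FileSystem()
    # Binary mode: the raw bytes are gzip data, not text, until decompressed.
    with fs.open(path, "rb") as f:
        # Read while the file is still open; the handle is closed after the block.
        return read_gzip_text(f)


# Local demonstration with an in-memory gzip payload (no S3 access needed).
payload = gzip.compress(b"hello from a gz file\n")
demo_text = read_gzip_text(io.BytesIO(payload))
print(demo_text)
```

With credentials configured, read_gzip_from_s3(key) would then return the decompressed contents as a string, where key is the bucket-name/key string shown above.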
Side-note: It is recommended to use the official AWS SDK or the AWS CLI to access Amazon S3, rather than using s3fs. Amazon S3 is an object storage service, not a filesystem.
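For comparison, here is a rough sketch of the same read done through the AWS SDK for Python (boto3) instead of s3fs. The function read_gzip_object and the FakeS3Client stand-in are hypothetical names introduced only for illustration; the fake client lets the example run without AWS access, and the bucket and key in the demonstration are placeholders.

```python
import gzip
import io


def read_gzip_object(client, bucket, key, encoding="utf-8"):
    """Fetch one S3 object with a boto3-style client and return its decompressed text."""
    response = client.get_object(Bucket=bucket, Key=key)
    # response["Body"] is a stream; read it fully, then decompress in memory.
    return gzip.decompress(response["Body"].read()).decode(encoding)


class FakeS3Client:
    """Stand-in for boto3.client("s3"), used only for a local demonstration."""

    def __init__(self, objects):
        self._objects = objects

    def get_object(self, Bucket, Key):
        return {"Body": io.BytesIO(self._objects[(Bucket, Key)])}


# Placeholder bucket and key; a real call would pass boto3.client("s3") instead.
fake = FakeS3Client({("my-bucket", "logs/000000.gz"): gzip.compress(b"line one\n")})
sdk_demo_text = read_gzip_object(fake, "my-bucket", "logs/000000.gz")
print(sdk_demo_text)
```

Note that with the SDK the bucket name and the key are passed separately, rather than joined into a single BUCKET-NAME/KEY string as with s3fs.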