
Explicitly, how can I get s3fs to open a gz file in AWS? Glue/S3

Using AWS Glue, Amazon S3, and s3fs, I've come up with the following (among other attempts). I've also looked at the examples at https://s3fs.readthedocs.io/en/latest/, but they're not getting me there.

    key = 'https://s3.console.aws.amazon.com/s3/buckets/datalake-sbox-landing-55555-us-weast/cloudwatch_logs/prod-coarsair/dt%267D2019-09-11/144221-3837-462d-a2e6-ba300/coarsair-coarsair-144221-3837-462d-a2e6-ba300/000000.gz'
    s3 = s3fs.S3FileSystem()
    with s3.open(key, 'r') as f:
        args_gz_file = f

Then it should be as easy as typing "args_gz_file" now, right? Nope. Where am I going wrong?!

The samples show:

with fs.open('my-bucket/my-file.txt', 'rb') as f:

This suggests that the first parameter is BUCKET-NAME/KEY.

However, your code sample shows a URL (https://s3.console.aws.amazon.com/s3/buckets/) instead of a bucket name.
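If you only have the console URL to start from, a small helper can strip it down to the BUCKET-NAME/KEY form. This is a minimal sketch: the function name `console_url_to_s3_path` is hypothetical, and it assumes the URL follows the `/s3/buckets/<bucket>/<key>` layout seen in the question. Percent-encoded sequences in the key are left untouched, so check them against the real object name.

```python
from urllib.parse import urlparse

# Path prefix used by the S3 console URLs shown in the question (assumption).
CONSOLE_PREFIX = "/s3/buckets/"


def console_url_to_s3_path(url: str) -> str:
    """Turn an S3 console URL into a 'bucket/key' string (hypothetical helper)."""
    path = urlparse(url).path
    if not path.startswith(CONSOLE_PREFIX):
        raise ValueError(f"not an S3 console bucket URL: {url!r}")
    # Drop the console prefix and any trailing slash; keep the rest as-is.
    return path[len(CONSOLE_PREFIX):].strip("/")
```

For example, `console_url_to_s3_path('https://s3.console.aws.amazon.com/s3/buckets/my-bucket/my-file.txt')` gives `'my-bucket/my-file.txt'`, which is the form the s3fs samples use.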

Perhaps try:

key = 'datalake-sbox-landing-55555-us-weast/cloudwatch_logs/prod-coarsair/dt%267D2019-09-11/144221-3837-462d-a2e6-ba300/coarsair-coarsair-144221-3837-462d-a2e6-ba300/000000.gz'
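Two other things in the original snippet are likely to cause trouble once the path is fixed: the file is opened in text mode ('r') even though a .gz file is binary, and `args_gz_file = f` keeps a handle that is closed as soon the `with` block ends. A minimal sketch of reading the contents inside the block, assuming the object is gzip-compressed UTF-8 text (the helper names `read_gz_lines` and `read_gz_from_s3` are hypothetical):

```python
import gzip


def read_gz_lines(fileobj, encoding="utf-8"):
    """Decompress a binary file-like object and return its lines as strings."""
    # gzip.open accepts an already-open binary file object as well as a filename.
    with gzip.open(fileobj, mode="rt", encoding=encoding) as gz:
        return [line.rstrip("\n") for line in gz]


def read_gz_from_s3(path):
    """Open 'bucket/key' with s3fs in binary mode and read it while still open."""
    import s3fs  # third-party; imported here so read_gz_lines works without it

    fs = s3fs.S3FileSystem()
    with fs.open(path, "rb") as f:
        # Read everything before the with block closes the handle.
        return read_gz_lines(f)
```

Usage would then be `lines = read_gz_from_s3(key)`, with `key` set to the bucket/key string above.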

Side-note: It is recommended to use the official AWS SDK or the AWS CLI to access Amazon S3, rather than using s3fs. Amazon S3 is an object storage service, not a filesystem.
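With the AWS SDK for Python (boto3), the same read might look like the sketch below. It assumes the object is small enough to decompress in memory and contains UTF-8 text; the function name `read_gz_object` is hypothetical, and the client is passed in so the function itself does not depend on how credentials are configured.

```python
import gzip


def read_gz_object(s3_client, bucket, key, encoding="utf-8"):
    """Fetch a gzip-compressed object and return its decompressed text."""
    # get_object returns a dict whose "Body" is a streaming object with .read().
    response = s3_client.get_object(Bucket=bucket, Key=key)
    compressed = response["Body"].read()
    return gzip.decompress(compressed).decode(encoding)
```

A client can be created with `boto3.client("s3")`; note that the bucket name and the key are passed separately here rather than joined into one string.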


 