

Access publicly available data from S3 bucket link

I am trying to access the data for reproducing the Redshift benchmarks on this page. If you scroll down to the "Run This Benchmark Yourself" section, the author says the data can be accessed at the following S3 bucket, replacing the items in [] with the format and data size we are interested in:

s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix]
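A minimal bash sketch of expanding that template into one concrete prefix per format, for a suffix supplied by the caller. It writes the scheme as s3:// because that is the form the AWS CLI accepts. The function name benchmark_prefixes is hypothetical, and tiny is the only suffix value confirmed elsewhere on this page:

```shell
#!/usr/bin/env bash
# Hypothetical helper: expand
#   s3n://big-data-benchmark/pavlo/[text|text-deflate|sequence|sequence-snappy]/[suffix]
# into one s3:// prefix per format for the given suffix (e.g. "tiny").
benchmark_prefixes() {
  local suffix="$1"
  local fmt
  for fmt in text text-deflate sequence sequence-snappy; do
    printf 's3://big-data-benchmark/pavlo/%s/%s/\n' "$fmt" "$suffix"
  done
}
```

For example, benchmark_prefixes tiny prints four prefixes, starting with s3://big-data-benchmark/pavlo/text/tiny/, each of which can be passed to aws s3 ls.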

Based on the above, I tried downloading the data using a link like this:

http://s3.amazonaws.com/big-data-benchmark/pavlo/text/tiny/

But it does not work. Can someone provide guidance on how to get these datasets?

If I remove the "n" from s3n:// (i.e. use s3://), I can list the directory:

    $ aws s3 ls s3://big-data-benchmark/pavlo/text/tiny/
    PRE crawl/
    PRE rankings/
    PRE uservisits/
    2013-05-03 10:13:42          0 crawl_$folder$
    2013-05-09 07:23:17          0 rankings_$folder$
    2013-05-09 07:22:36          0 uservisits_$folder$
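In this listing, the PRE lines are common prefixes (S3's stand-in for subdirectories), while the zero-byte *_$folder$ entries appear to be the placeholder objects Hadoop's s3n filesystem creates to mark directories. A small sketch that keeps only the prefixes worth descending into; list_subprefixes is a hypothetical helper, and it assumes the default text output of aws s3 ls shown above:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `aws s3 ls` text output on stdin and print only
# the "PRE <prefix>/" entries; dated object lines such as the zero-byte
# *_$folder$ markers are skipped because their first field is a date.
list_subprefixes() {
  awk '$1 == "PRE" { print $2 }'
}
```

Piping the listing above through it (aws s3 ls s3://big-data-benchmark/pavlo/text/tiny/ | list_subprefixes) prints crawl/, rankings/ and uservisits/, one per line.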

From there I can get individual paths, e.g.

s3://big-data-benchmark/pavlo/text/tiny/crawl/part-00000

whose HTTPS URL would be:

https://s3.amazonaws.com/big-data-benchmark/pavlo/text/tiny/crawl/part-00000
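The mapping from an S3 URI to this path-style HTTPS URL is mechanical: drop the scheme and prepend https://s3.amazonaws.com/. A bash sketch; s3_to_https is a hypothetical helper, and it assumes the object is publicly readable so a plain unauthenticated HTTP GET works:

```shell
#!/usr/bin/env bash
# Hypothetical helper: convert s3://bucket/key or s3n://bucket/key into the
# path-style URL https://s3.amazonaws.com/bucket/key.
s3_to_https() {
  local uri="$1"
  local path="${uri#*://}"   # strip the "s3://" or "s3n://" scheme
  printf 'https://s3.amazonaws.com/%s\n' "$path"
}
```

For example, curl -O "$(s3_to_https s3://big-data-benchmark/pavlo/text/tiny/crawl/part-00000)" would save the object as part-00000 in the current directory.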

Good luck!
