
rclone - How do I list which directory has the latest files in an AWS S3 bucket?

I am currently using rclone to access AWS S3 data, and since I don't use either one much, I am not an expert.

I am accessing the public bucket unidata-nexrad-level2-chunks, and there are 1000 folders I am looking at. To see these, I am using the Windows command prompt and entering:

rclone lsf chunks:unidata-nexrad-level2-chunks/KEWX

Only one folder has real-time data being written to it at any time, and that is the one I need to find. How do I determine which one it is? I could run a check to see which folder has the newest data, but how can I do that?

The output from my command looks like this:

1/
10/
11/
12/
13/
14/
15/
16/
17/
18/
19/
2/
20/
21/
22/
23/
... ... ... (to 1000)

What can I do to find where the latest data is being written? Since it is only one folder at a time, I hope it would be simple.

Edit: I realized I need a way to list the latest file (along with its folder number) without listing every single file and timestamp in all 999 directories. I am starting a bounty, and the correct answer that allows me to do this without slogging through all of them will be awarded the bounty. If it takes 20 minutes to list all contents from all 999 folders, it's useless, as the next folder will be active by that time.

If you want to know the specific folder with the very latest file, you should write your own script that retrieves a list of ALL objects, then figures out which one is the latest and which folder it is in. Here's a Python script that does it:

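import boto3

# Uses the default boto3 credentials; lists every object under the KEWX/ prefix
s3_resource = boto3.resource('s3')
bucket = s3_resource.Bucket('unidata-nexrad-level2-chunks')
objects = bucket.objects.filter(Prefix='KEWX/')

# Pair each key with its timestamp so that sorting orders by last-modified time
date_key_list = [(obj.last_modified, obj.key) for obj in objects]
print(len(date_key_list))  # How many objects?

# Newest first; the first entry is the most recently written object
date_key_list.sort(reverse=True)
print(date_key_list[0][1])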

Output:

43727
KEWX/125/20200912-071306-065-I

It takes a while to go through those 43,700 objects!
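The script prints the full key of the newest object, while the question asks for the folder number. As a minimal sketch, assuming keys always follow the KEWX/<folder>/<file> layout seen in the output above, the folder can be taken from the second path component. The helper name newest_folder is hypothetical, and the sample timestamps are made up for illustration; in practice the list would be the date_key_list built by the script.

```python
from datetime import datetime, timezone


def newest_folder(date_key_list):
    """Return (folder, key, last_modified) for the most recently modified object.

    date_key_list holds (last_modified, key) tuples, as built in the script above.
    """
    # Tuples compare by their first element, so max() picks the newest timestamp
    last_modified, key = max(date_key_list)
    # Assumed key layout: 'KEWX/<folder>/<file>', so the folder is component 1
    folder = key.split('/')[1]
    return folder, key, last_modified


# Hypothetical sample data standing in for a real bucket listing
sample = [
    (datetime(2020, 9, 12, 7, 2, 41, tzinfo=timezone.utc), 'KEWX/124/20200912-070241-030-E'),
    (datetime(2020, 9, 12, 7, 13, 6, tzinfo=timezone.utc), 'KEWX/125/20200912-071306-065-I'),
    (datetime(2020, 9, 11, 22, 40, 0, tzinfo=timezone.utc), 'KEWX/9/20200911-224000-001-S'),
]

folder, key, when = newest_folder(sample)
print(folder, key)  # prints: 125 KEWX/125/20200912-071306-065-I
```

Using max() instead of a full sort avoids ordering all 43,000+ entries when only the newest one is needed, though the listing itself remains the slow part.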
