Move all versions of a given file in an S3 bucket from one folder to another folder
I have set up an S3 bucket with versioning enabled.
An external process writes JSON files (each JSON file corresponds to a single Student entity) to the S3 bucket.
I have decided on the following folder structure for the bucket:
s3://student-data/new/ <-- THIS WILL CONTAIN ALL THE UNPROCESSED JSON FILES
s3://student-data/processed/ <-- THIS WILL CONTAIN ALL THE PROCESSED JSON FILES.
Now, I have a cron job that runs periodically, once every 6 hours.
New JSON files are written to the new folder by the external process.
I would like the cron job to process all the JSON files in the new folder, including all of their versions, and once processing is over, move every file with all of its existing versions from the new folder to the processed folder.
So far I am able to fetch the current version of a JSON file written to the new folder and move it to the processed folder after processing.
But I can't figure out how to move a file with all of its versions from new to processed, so that in the future I don't have to process the same version of a file twice.
Objects in Amazon S3 cannot be 'moved'. Rather, they need to be copied to a new key, and then the original object should be deleted.
This process is more difficult with multiple versions of an object. You would need to copy and delete each version individually, from oldest to newest, to create new versions in the target path. It is not possible to process all versions of an object simultaneously.
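The copy-then-delete loop described above could look roughly like the following. This is only a sketch, not a tested production script: it assumes Python with boto3, that source and target live in the same bucket, and that the prefixes are the new/ and processed/ folders from the question. The function name move_all_versions is made up for illustration, and delete markers are not handled.

```python
from collections import defaultdict


def move_all_versions(s3, bucket, src_prefix="new/", dst_prefix="processed/"):
    """Replay every version under src_prefix into dst_prefix, oldest first.

    `s3` is expected to be a boto3 S3 client, e.g. boto3.client("s3").
    Returns the number of versions moved.
    """
    # Group all object versions by key (delete markers are ignored here).
    versions_by_key = defaultdict(list)
    paginator = s3.get_paginator("list_object_versions")
    for page in paginator.paginate(Bucket=bucket, Prefix=src_prefix):
        for version in page.get("Versions", []):
            versions_by_key[version["Key"]].append(version)

    moved = 0
    for key, versions in versions_by_key.items():
        dst_key = dst_prefix + key[len(src_prefix):]
        # Oldest first, so the newest source version ends up as the
        # current version at the destination.
        for version in sorted(versions, key=lambda v: v["LastModified"]):
            version_id = version["VersionId"]
            source = {"Bucket": bucket, "Key": key, "VersionId": version_id}
            s3.copy_object(Bucket=bucket, Key=dst_key, CopySource=source)
            # Deleting with an explicit VersionId removes that version
            # permanently instead of adding a delete marker.
            s3.delete_object(Bucket=bucket, Key=key, VersionId=version_id)
            moved += 1
    return moved
```

It would be called as move_all_versions(boto3.client("s3"), "student-data") once the cron job has finished processing. Note that each copy gets a new version ID in the target path, so the original version IDs are not preserved. Because every processed version is deleted from new/, whatever the next run finds there should be unprocessed.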
Versioning is typically used to retain data that is overwritten. You might want to consider whether versioning is required in your situation, since it complicates the process considerably.