Difference between two versions of a file in S3

I have a bucket in S3 with versioning enabled. A file is periodically uploaded to it, replacing the previous contents. Each record in the file has a unique identifier, and sometimes a newly uploaded file no longer contains records that were in the existing version; those records need to be retained. My goal is to have a file that contains everything in the new file plus any records from the old file that are missing from it.

I have a small Python script that does the job, and I could run it from an S3 trigger as well, but is there an AWS implementation for this? Something like an S3 -> XXXX service that would report the changes between the files (though not line by line) and perhaps create a new file.
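For illustration, a change report keyed on the unique identifier (rather than a line-by-line diff) reduces to set operations on the identifiers. A minimal sketch, assuming tab-separated rows whose first field is the identifier; `parse_records` and `diff_by_key` are hypothetical helpers, not an AWS or pandas API:

```python
import csv
import io


def parse_records(text: str) -> dict[str, list[str]]:
    """Map each row's first field (assumed to be the unique identifier) to the whole row."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return {row[0]: row for row in reader if row}


def diff_by_key(old_text: str, new_text: str) -> tuple[list[str], list[str], list[str]]:
    """Compare two versions record by record (by identifier), not line by line."""
    old = parse_records(old_text)
    new = parse_records(new_text)
    added = sorted(new.keys() - old.keys())    # ids only in the new version
    removed = sorted(old.keys() - new.keys())  # ids the new version dropped
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return added, removed, changed


old_version = "1\tapple\n2\tbanana\n3\tcherry\n"
new_version = "1\tapple\n2\tblueberry\n4\tdate\n"
print(diff_by_key(old_version, new_version))  # (['4'], ['3'], ['2'])
```

The `removed` list is exactly the set of records that would need to be carried over into the merged file.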

My Python code looks something like this:

    import pandas as pd

    old_file = 'file1.1.txt'
    new_file = 'file1.2.txt'
    output_file = 'output_pd.txt'

    # Read both versions as tab-separated files with no header row
    old_df = pd.read_csv(old_file, sep="\t", header=None)
    new_df = pd.read_csv(new_file, sep="\t", header=None)

    # Rows of the old file whose unique identifier (column 0) is missing from the new file
    missing_values = old_df[~old_df.iloc[:, 0].isin(new_df.iloc[:, 0])]

    # Append the missing rows to the new data
    # (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
    final_df = pd.concat([new_df, missing_values], ignore_index=True)

    # Write the merged result with the same tab separator used for reading,
    # so it can be read back the same way on the next run
    final_df.to_csv(output_file, sep="\t", index=False, header=False)

But I'm looking for a native AWS solution or best practice.

> but is there any AWS implementation for this issue?

No, there is no native AWS implementation for comparing the contents of files. You have to implement it yourself, as you have done. You can host your code as a Lambda function that is triggered automatically by S3 uploads.
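A minimal sketch of that setup, assuming a Python Lambda function subscribed to `s3:ObjectCreated:*` events on the versioned bucket (with a prefix filter on the incoming files), tab-separated files with the identifier in the first field, and a hypothetical `merged/` output prefix. It lists the object's versions with `list_object_versions`, reads the newest and previous versions with `get_object(..., VersionId=...)`, and writes the result with `put_object`. A compact standard-library version of the question's merge is inlined so the function does not need pandas, which is not bundled with the Lambda Python runtime. The extra `s3` parameter is only there so the handler can be exercised with a stub client; Lambda itself invokes the handler with `(event, context)`.

```python
import csv
import io
import urllib.parse

# Hypothetical output prefix. Writing outside the prefix the trigger watches
# keeps the function from re-triggering on its own output.
OUTPUT_PREFIX = "merged/"


def merge_versions(old_text: str, new_text: str) -> str:
    """New rows, followed by old rows whose identifier (first field) is absent from the new version."""
    old_rows = [r for r in csv.reader(io.StringIO(old_text), delimiter="\t") if r]
    new_rows = [r for r in csv.reader(io.StringIO(new_text), delimiter="\t") if r]
    new_ids = {r[0] for r in new_rows}
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerows(new_rows + [r for r in old_rows if r[0] not in new_ids])
    return out.getvalue()


def latest_two_versions(s3, bucket: str, key: str):
    """Return (newest_version_id, previous_version_id or None) for one object key."""
    resp = s3.list_object_versions(Bucket=bucket, Prefix=key)
    # Prefix can match other keys too, so keep only exact matches
    versions = [v for v in resp.get("Versions", []) if v["Key"] == key]
    if not versions:
        raise ValueError(f"no versions found for s3://{bucket}/{key}")
    versions.sort(key=lambda v: v["LastModified"], reverse=True)
    previous = versions[1]["VersionId"] if len(versions) > 1 else None
    return versions[0]["VersionId"], previous


def lambda_handler(event, context, s3=None):
    if s3 is None:
        import boto3  # boto3 ships with the Lambda Python runtime
        s3 = boto3.client("s3")

    # This sketch handles only the first record of the notification
    record = event["Records"][0]["s3"]
    bucket = record["bucket"]["name"]
    # Keys in S3 event notifications are URL-encoded
    key = urllib.parse.unquote_plus(record["object"]["key"])

    current_id, previous_id = latest_two_versions(s3, bucket, key)

    def read_version(version_id: str) -> str:
        obj = s3.get_object(Bucket=bucket, Key=key, VersionId=version_id)
        return obj["Body"].read().decode("utf-8")

    new_text = read_version(current_id)
    # First upload of this key: there is no older version to carry rows from
    old_text = read_version(previous_id) if previous_id else ""

    out_key = OUTPUT_PREFIX + key
    merged = merge_versions(old_text, new_text)
    s3.put_object(Bucket=bucket, Key=out_key, Body=merged.encode("utf-8"))
    return {"bucket": bucket, "key": out_key}
```

Two design points. Writing to the same key, or to any prefix the trigger also watches, would invoke the function again on its own output, which is why the sketch writes elsewhere. And because it compares only the two most recent raw versions, a record missing from two consecutive uploads would be dropped; if records must be retained indefinitely, merging each upload against the previous merged output instead of the previous raw version is one way to handle that.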
