
Get unique files from s3 bucket

I am working on a .NET C# project. My code fetches files from an S3 bucket. The bucket contains some duplicate files, but their names are different. I want to fetch only the unique files from the bucket. I am using this query to fetch all files from the bucket:

ListObjectsV2Request listRequest = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };
var listResult = await client.ListObjectsV2Async(listRequest);
var obj = listResult.S3Objects.Where(x => x.Key.EndsWith(".pdf") && x.Size > 0)
                    .OrderByDescending(x => x.LastModified)
                    .ToList();

How can I fetch only files with unique content, so that duplicates are avoided?

Should I just read all the files one by one and remove the duplicate ones? Or is there an easier way to avoid duplicates?
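For reference, the "read every file one by one" approach would look roughly like the sketch below: download each object, hash its content, and keep only the first object seen for each distinct hash. This is only a minimal sketch that reuses the client, awsBucketName and fullPath variables from the snippet above; for large buckets it downloads every object, so it is much more expensive than comparing object metadata.

// Requires: using Amazon.S3; using Amazon.S3.Model; using System.Security.Cryptography;
// (Convert.ToHexString assumes .NET 5 or later.)
var listRequest = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };
var listResult = await client.ListObjectsV2Async(listRequest);

var seenHashes = new HashSet<string>();
var uniqueFiles = new List<S3Object>();

using var md5 = MD5.Create();
foreach (var s3Object in listResult.S3Objects.Where(x => x.Key.EndsWith(".pdf") && x.Size > 0))
{
    // Download the object and hash its content stream.
    using var response = await client.GetObjectAsync(awsBucketName, s3Object.Key);
    var hash = Convert.ToHexString(md5.ComputeHash(response.ResponseStream));

    // Keep this object only if its content has not been seen before.
    if (seenHashes.Add(hash))
        uniqueFiles.Add(s3Object);
}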

You can group by the ETag: files with the same content have the same ETag, so grouping by ETag collects all duplicates together and you can then pick just one object from each group.

ListObjectsV2Request listRequest = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };
var listResult = await client.ListObjectsV2Async(listRequest);

// Keep only non-empty PDFs, group duplicates by ETag (same content => same ETag),
// and take the most recently modified object from each group.
var obj = listResult.S3Objects.Where(x => x.Key.EndsWith(".pdf") && x.Size > 0)
                    .OrderByDescending(x => x.LastModified)
                    .GroupBy(x => x.ETag)
                    .Select(x => x.First())
                    .ToList();
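One caveat: the ETag equals the MD5 of the content only for objects uploaded in a single part without SSE-KMS encryption; objects uploaded via multipart upload get a different ETag format, so two identical files can still end up with different ETags. Also, ListObjectsV2 returns at most 1,000 keys per call, so for larger buckets you need to page through the results before grouping. A rough sketch of that pagination, using the same client, awsBucketName and fullPath as above:

// Collect all matching keys page by page, then group duplicates by ETag as above.
var allObjects = new List<S3Object>();
var request = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };

ListObjectsV2Response response;
do
{
    response = await client.ListObjectsV2Async(request);
    allObjects.AddRange(response.S3Objects);

    // Continue from where the previous page left off.
    request.ContinuationToken = response.NextContinuationToken;
} while (response.IsTruncated == true);

var uniquePdfs = allObjects
    .Where(x => x.Key.EndsWith(".pdf") && x.Size > 0)
    .OrderByDescending(x => x.LastModified)   // newest first within each ETag group
    .GroupBy(x => x.ETag)
    .Select(g => g.First())                   // keep the most recent copy of each content
    .ToList();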
