
Get unique files from s3 bucket

I am working on a .NET C# project. My code fetches files from an S3 bucket. The bucket contains some duplicate files, but their names are different. I want to fetch only the unique files from the bucket. I am using this query to fetch all files from the bucket:

ListObjectsV2Request listRequest = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };
var listResult = await client.ListObjectsV2Async(listRequest);
var obj = listResult.S3Objects.Where(x => x.Key.EndsWith(".pdf") && x.Size > 0)
                    .OrderByDescending(x => x.LastModified)
                    .ToList();

How can I fetch only files with unique content, so that duplicates are avoided?

Should I just read all the files one by one and remove the duplicate ones? Or is there an easier way to avoid duplicates?
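For reference, the "read every file one by one" approach would look roughly like the sketch below: download each object, hash its content, and keep only the first object seen for each distinct hash. This is only a minimal sketch that reuses the client, awsBucketName and fullPath variables from the snippet above; for large buckets it downloads every object, so it is much more expensive than comparing object metadata.

// Requires: using Amazon.S3; using Amazon.S3.Model; using System.Security.Cryptography;
// (Convert.ToHexString assumes .NET 5 or later.)
var listRequest = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };
var listResult = await client.ListObjectsV2Async(listRequest);

var seenHashes = new HashSet<string>();
var uniqueFiles = new List<S3Object>();

using var md5 = MD5.Create();
foreach (var s3Object in listResult.S3Objects.Where(x => x.Key.EndsWith(".pdf") && x.Size > 0))
{
    // Download the object and hash its content stream.
    using var response = await client.GetObjectAsync(awsBucketName, s3Object.Key);
    var hash = Convert.ToHexString(md5.ComputeHash(response.ResponseStream));

    // Keep this object only if its content has not been seen before.
    if (seenHashes.Add(hash))
        uniqueFiles.Add(s3Object);
}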

You can group by the ETag: files with the same content have the same ETag, so grouping by ETag collects all duplicates together and you can then pick just one object from each group.

ListObjectsV2Request listRequest = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };
var listResult = await client.ListObjectsV2Async(listRequest);

// Keep only non-empty PDFs, group duplicates by ETag (same content => same ETag),
// and take the most recently modified object from each group.
var obj = listResult.S3Objects.Where(x => x.Key.EndsWith(".pdf") && x.Size > 0)
                    .OrderByDescending(x => x.LastModified)
                    .GroupBy(x => x.ETag)
                    .Select(x => x.First())
                    .ToList();
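One caveat: the ETag equals the MD5 of the content only for objects uploaded in a single part without SSE-KMS encryption; objects uploaded via multipart upload get a different ETag format, so two identical files can still end up with different ETags. Also, ListObjectsV2 returns at most 1,000 keys per call, so for larger buckets you need to page through the results before grouping. A rough sketch of that pagination, using the same client, awsBucketName and fullPath as above:

// Collect all matching keys page by page, then group duplicates by ETag as above.
var allObjects = new List<S3Object>();
var request = new ListObjectsV2Request { BucketName = awsBucketName, Prefix = fullPath };

ListObjectsV2Response response;
do
{
    response = await client.ListObjectsV2Async(request);
    allObjects.AddRange(response.S3Objects);

    // Continue from where the previous page left off.
    request.ContinuationToken = response.NextContinuationToken;
} while (response.IsTruncated == true);

var uniquePdfs = allObjects
    .Where(x => x.Key.EndsWith(".pdf") && x.Size > 0)
    .OrderByDescending(x => x.LastModified)   // newest first within each ETag group
    .GroupBy(x => x.ETag)
    .Select(g => g.First())                   // keep the most recent copy of each content
    .ToList();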
