How to list all AWS S3 objects in a bucket using Java
What is the simplest way to get a list of all items in an S3 bucket using Java?
List<S3ObjectSummary> s3objects = s3.listObjects(bucketName,prefix).getObjectSummaries();
This returns at most 1,000 items.
This may be a workaround, but it solved my problem:
ObjectListing listing = s3.listObjects( bucketName, prefix );
List<S3ObjectSummary> summaries = listing.getObjectSummaries();
while (listing.isTruncated()) {
listing = s3.listNextBatchOfObjects (listing);
summaries.addAll (listing.getObjectSummaries());
}
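To see why the single call stops early while the loop does not, here is a minimal, self-contained sketch of the same control flow. It does not use the AWS SDK: FakeBucket is a hypothetical in-memory stand-in that, like S3, returns at most a fixed number of keys per call together with a truncated flag, and the loop uses the last key of each page as the marker for the next request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Self-contained sketch of the "fetch the next batch while truncated" loop.
// FakeBucket is a hypothetical in-memory stand-in for S3; no AWS classes are used.
class TruncatedListingDemo {

    // One page of results: the keys plus a flag saying whether more keys remain.
    static final class Page {
        final List<String> keys;
        final boolean truncated;

        Page(List<String> keys, boolean truncated) {
            this.keys = keys;
            this.truncated = truncated;
        }
    }

    static final class FakeBucket {
        private final TreeSet<String> keys = new TreeSet<>();
        private final int maxKeysPerPage;
        int requests = 0; // number of simulated service calls

        FakeBucket(List<String> keys, int maxKeysPerPage) {
            this.keys.addAll(keys);
            this.maxKeysPerPage = maxKeysPerPage;
        }

        // Returns up to maxKeysPerPage keys sorting strictly after 'marker' (null = from the start).
        Page list(String marker) {
            requests++;
            NavigableSet<String> tail = (marker == null) ? keys : keys.tailSet(marker, false);
            List<String> page = new ArrayList<>();
            for (String key : tail) {
                if (page.size() == maxKeysPerPage) {
                    return new Page(page, true); // at least one more key exists
                }
                page.add(key);
            }
            return new Page(page, false);
        }
    }

    // First page, then one follow-up request per truncated page, using the last key as the marker.
    static List<String> listAll(FakeBucket bucket) {
        Page page = bucket.list(null);
        List<String> all = new ArrayList<>(page.keys);
        while (page.truncated) {
            String lastKey = page.keys.get(page.keys.size() - 1);
            page = bucket.list(lastKey);
            all.addAll(page.keys);
        }
        return all;
    }

    static List<String> sampleKeys(int count) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            keys.add(String.format("key-%04d", i)); // zero-padded so string order = numeric order
        }
        return keys;
    }

    public static void main(String[] args) {
        FakeBucket bucket = new FakeBucket(sampleKeys(2500), 1000);
        System.out.println("single call: " + bucket.list(null).keys.size() + " keys"); // 1000
        System.out.println("loop: " + listAll(bucket).size() + " keys");               // 2500
    }
}
```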
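For those reading this in 2018 or later: there are two newer APIs that take care of pagination for you, one in the AWS SDK for Java 1.x and one in 2.x.
The 1.x Java SDK has an API that lets you iterate over the objects in an S3 bucket without dealing with pagination: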
AmazonS3 s3 = AmazonS3ClientBuilder.standard().build();
S3Objects.inBucket(s3, "the-bucket").forEach((S3ObjectSummary objectSummary) -> {
// TODO: Consume `objectSummary` the way you need
System.out.println(objectSummary.getKey());
});
This iteration is lazy: the list of S3ObjectSummary objects is fetched on demand, one page at a time. The page size can be controlled with the withBatchSize(int) method.
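A rough sketch of what "lazy, one page at a time" means, assuming nothing about the SDK's internals: the hypothetical LazyKeys class below only performs a (simulated) page fetch when its iterator runs out of items, and its batchSize plays the role of withBatchSize(int).

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch of lazy, page-at-a-time iteration in the spirit of S3Objects.inBucket(...).withBatchSize(n).
// LazyKeys is hypothetical: its "service call" just slices an in-memory list.
class LazyPagingDemo {

    static final class LazyKeys implements Iterable<String> {
        private final List<String> source; // stands in for the bucket's keys
        private final int batchSize;        // page size, like withBatchSize(int)
        int pagesFetched = 0;               // counts simulated service calls

        LazyKeys(List<String> source, int batchSize) {
            this.source = source;
            this.batchSize = batchSize;
        }

        // Simulated service call returning the page that starts at 'offset'.
        private List<String> fetchPage(int offset) {
            pagesFetched++;
            int end = Math.min(offset + batchSize, source.size());
            return new ArrayList<>(source.subList(offset, end));
        }

        @Override
        public Iterator<String> iterator() {
            return new Iterator<String>() {
                private List<String> page = new ArrayList<>();
                private int indexInPage = 0;
                private int nextOffset = 0;

                @Override
                public boolean hasNext() {
                    if (indexInPage < page.size()) {
                        return true; // items left in the current page
                    }
                    if (nextOffset >= source.size()) {
                        return false; // no pages left
                    }
                    page = fetchPage(nextOffset); // fetch the next page only when needed
                    nextOffset += page.size();
                    indexInPage = 0;
                    return !page.isEmpty();
                }

                @Override
                public String next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return page.get(indexInPage++);
                }
            };
        }
    }

    public static void main(String[] args) {
        List<String> keys = new ArrayList<>();
        for (int i = 0; i < 25; i++) {
            keys.add("key-" + i);
        }
        LazyKeys lazy = new LazyKeys(keys, 10);
        System.out.println("pages before iterating: " + lazy.pagesFetched); // 0
        int count = 0;
        for (String key : lazy) {
            count++;
        }
        System.out.println(count + " keys in " + lazy.pagesFetched + " pages"); // 25 keys in 3 pages
    }
}
```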
The API has changed, so here is the SDK 2.x version:
S3Client client = S3Client.builder().region(Region.US_EAST_1).build();
ListObjectsV2Request request = ListObjectsV2Request.builder().bucket("the-bucket").prefix("the-prefix").build();
ListObjectsV2Iterable response = client.listObjectsV2Paginator(request);
for (ListObjectsV2Response page : response) {
page.contents().forEach((S3Object object) -> {
// TODO: Consume `object` the way you need
System.out.println(object.key());
});
}
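When you call the operation, an instance of ListObjectsV2Iterable is returned. At this point no service calls have been made yet, so there is no guarantee that the request is valid. As you iterate over the iterable, the SDK lazily loads response pages by calling the service, until there are no pages left or your iteration stops. If your request contains errors, you will only see the failure after you start iterating over the iterable.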
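The deferred-failure behaviour can be illustrated without AWS. In this sketch the hypothetical DeferredPaginator stores the request when it is created and only "calls the service" (and therefore only validates the bucket name, using a made-up rule) once iteration begins.

```java
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Sketch of a paginator-style Iterable whose constructor makes no service call, so an invalid
// request only fails once iteration starts. DeferredPaginator and its validation rule are
// hypothetical; they only mimic the behaviour described for ListObjectsV2Iterable.
class DeferredPaginatorDemo {

    static final class DeferredPaginator implements Iterable<List<String>> {
        private final String bucket;
        private final List<List<String>> servicePages; // fake "service" data, one entry per page
        int calls = 0;                                  // counts simulated service calls

        DeferredPaginator(String bucket, List<List<String>> servicePages) {
            this.bucket = bucket;             // nothing is validated or fetched here
            this.servicePages = servicePages;
        }

        // Simulated service call: this is where a bad request would be rejected.
        private List<String> call(int pageIndex) {
            calls++;
            if (bucket == null || bucket.isEmpty()) {
                throw new IllegalArgumentException("Invalid bucket name");
            }
            return servicePages.get(pageIndex);
        }

        @Override
        public Iterator<List<String>> iterator() {
            return new Iterator<List<String>>() {
                private int nextPage = 0;

                @Override
                public boolean hasNext() {
                    return nextPage < servicePages.size();
                }

                @Override
                public List<String> next() {
                    if (!hasNext()) {
                        throw new NoSuchElementException();
                    }
                    return call(nextPage++);
                }
            };
        }
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(List.of("a", "b"), List.of("c"));
        DeferredPaginator bad = new DeferredPaginator("", pages);
        System.out.println("created, calls so far: " + bad.calls); // 0, no failure yet
        try {
            for (List<String> page : bad) {
                System.out.println(page);
            }
        } catch (IllegalArgumentException e) {
            System.out.println("failed only while iterating: " + e.getMessage());
        }
    }
}
```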
This comes straight from the AWS documentation:
AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
ListObjectsRequest listObjectsRequest = new ListObjectsRequest()
.withBucketName(bucketName)
.withPrefix("m");
ObjectListing objectListing;
do {
objectListing = s3client.listObjects(listObjectsRequest);
for (S3ObjectSummary objectSummary :
objectListing.getObjectSummaries()) {
System.out.println( " - " + objectSummary.getKey() + " " +
"(size = " + objectSummary.getSize() +
")");
}
listObjectsRequest.setMarker(objectListing.getNextMarker());
} while (objectListing.isTruncated());
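This loop relies on getNextMarker() being non-null whenever the listing is truncated. At the S3 API level, NextMarker is only returned when a delimiter is specified; for a truncated response without it, the documented alternative is to use the last key of the page as the marker. The hypothetical helper below sketches that rule, which is what you would need if you build a marker-based loop yourself.

```java
import java.util.List;

// Sketch of choosing the marker for the next V1-style ListObjects request.
// nextMarker(...) is a hypothetical helper, not an AWS SDK method. It encodes the S3 rule that
// NextMarker is only returned when a delimiter is specified, so for a truncated response
// without it, the last key of the current page serves as the next marker.
class NextMarkerDemo {

    static String nextMarker(boolean truncated, String nextMarkerFromResponse, List<String> keysInPage) {
        if (!truncated) {
            return null; // listing is complete, no further request needed
        }
        if (nextMarkerFromResponse != null) {
            return nextMarkerFromResponse; // S3 supplied it (a delimiter was requested)
        }
        if (keysInPage.isEmpty()) {
            throw new IllegalStateException("Truncated page without keys or NextMarker");
        }
        return keysInPage.get(keysInPage.size() - 1); // fall back to the last key
    }

    public static void main(String[] args) {
        System.out.println(nextMarker(true, null, List.of("m1", "m2", "m3"))); // m3
        System.out.println(nextMarker(false, null, List.of("m4")));            // null
    }
}
```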
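I'm processing a large number of objects generated by our system: we changed the format in which the data is stored and needed to check every file, identify the ones still in the old format, and convert them. There are other ways to do this, but this one relates to your question.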
ObjectListing list = amazonS3Client.listObjects(contentBucketName, contentKeyPrefix);
while (true) {
List<S3ObjectSummary> summaries = list.getObjectSummaries();
for (S3ObjectSummary summary : summaries) {
String summaryKey = summary.getKey();
/* Retrieve object */
/* Process it */
}
// Check truncation on the batch just processed; fetching first and testing afterwards skips the last batch
if (!list.isTruncated()) {
break;
}
list = amazonS3Client.listNextBatchOfObjects(list);
}
Listing keys using the AWS SDK for Java
http://docs.aws.amazon.com/AmazonS3/latest/dev/ListingObjectKeysUsingJava.html
import java.io.IOException;
import com.amazonaws.AmazonClientException;
import com.amazonaws.AmazonServiceException;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3Client;
import com.amazonaws.services.s3.model.ListObjectsRequest;
import com.amazonaws.services.s3.model.ListObjectsV2Request;
import com.amazonaws.services.s3.model.ListObjectsV2Result;
import com.amazonaws.services.s3.model.ObjectListing;
import com.amazonaws.services.s3.model.S3ObjectSummary;
public class ListKeys {
private static String bucketName = "***bucket name***";
public static void main(String[] args) throws IOException {
AmazonS3 s3client = new AmazonS3Client(new ProfileCredentialsProvider());
try {
System.out.println("Listing objects");
final ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(bucketName);
ListObjectsV2Result result;
do {
result = s3client.listObjectsV2(req);
for (S3ObjectSummary objectSummary :
result.getObjectSummaries()) {
System.out.println(" - " + objectSummary.getKey() + " " +
"(size = " + objectSummary.getSize() +
")");
}
System.out.println("Next Continuation Token : " + result.getNextContinuationToken());
req.setContinuationToken(result.getNextContinuationToken());
} while (result.isTruncated());
} catch (AmazonServiceException ase) {
System.out.println("Caught an AmazonServiceException, " +
"which means your request made it " +
"to Amazon S3, but was rejected with an error response " +
"for some reason.");
System.out.println("Error Message: " + ase.getMessage());
System.out.println("HTTP Status Code: " + ase.getStatusCode());
System.out.println("AWS Error Code: " + ase.getErrorCode());
System.out.println("Error Type: " + ase.getErrorType());
System.out.println("Request ID: " + ase.getRequestId());
} catch (AmazonClientException ace) {
System.out.println("Caught an AmazonClientException, " +
"which means the client encountered " +
"an internal error while trying to communicate" +
" with S3, " +
"such as not being able to access the network.");
System.out.println("Error Message: " + ace.getMessage());
}
}
}
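ListObjectsV2 replaces the marker with an opaque continuation token that is simply passed back on the next request. The sketch below mirrors the do/while loop above against a hypothetical FakeV2Service; the token format it uses (Base64 of an offset) is invented for the example, and real S3 tokens should never be parsed or constructed by hand.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

// Sketch of ListObjectsV2-style paging: send the previous response's continuation token back
// until the result is no longer truncated. FakeV2Service is hypothetical, and its token format
// (Base64 of an offset) is made up; real S3 tokens are opaque and are passed back unchanged.
class ContinuationTokenDemo {

    static final class Result {
        final List<String> keys;
        final boolean truncated;
        final String nextContinuationToken;

        Result(List<String> keys, boolean truncated, String nextContinuationToken) {
            this.keys = keys;
            this.truncated = truncated;
            this.nextContinuationToken = nextContinuationToken;
        }
    }

    static final class FakeV2Service {
        private final List<String> keys;
        private final int maxKeys;
        int calls = 0;

        FakeV2Service(List<String> keys, int maxKeys) {
            this.keys = keys;
            this.maxKeys = maxKeys;
        }

        Result listObjectsV2(String continuationToken) {
            calls++;
            int start = (continuationToken == null) ? 0
                    : Integer.parseInt(new String(Base64.getDecoder().decode(continuationToken), StandardCharsets.UTF_8));
            int end = Math.min(start + maxKeys, keys.size());
            boolean truncated = end < keys.size();
            String next = truncated
                    ? Base64.getEncoder().encodeToString(Integer.toString(end).getBytes(StandardCharsets.UTF_8))
                    : null;
            return new Result(new ArrayList<>(keys.subList(start, end)), truncated, next);
        }
    }

    // Request, consume, copy the token, repeat while the result is truncated.
    static List<String> listAll(FakeV2Service service) {
        List<String> all = new ArrayList<>();
        String token = null;
        Result result;
        do {
            result = service.listObjectsV2(token);
            all.addAll(result.keys);
            token = result.nextContinuationToken; // passed back unchanged
        } while (result.truncated);
        return all;
    }

    public static void main(String[] args) {
        FakeV2Service service = new FakeV2Service(List.of("a", "b", "c", "d", "e", "f", "g"), 3);
        System.out.println(listAll(service) + " in " + service.calls + " calls"); // [a, b, c, d, e, f, g] in 3 calls
    }
}
```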
A more concise solution for listing S3 objects when the listing may be truncated:
ListObjectsRequest request = new ListObjectsRequest().withBucketName(bucketName);
ObjectListing listing = null;
while((listing == null) || (request.getMarker() != null)) {
listing = s3Client.listObjects(request);
// do stuff with listing
request.setMarker(listing.getNextMarker());
}
Gray, your solution is odd, but you seem like a nice guy.
AmazonS3Client s3Client = new AmazonS3Client(new BasicAWSCredentials( ....
ObjectListing images = s3Client.listObjects(bucketName);
List<S3ObjectSummary> list = images.getObjectSummaries();
for(S3ObjectSummary image: list) {
S3Object obj = s3Client.getObject(bucketName, image.getKey());
writeToFile(obj.getObjectContent());
}
I know this is an old post, but it may still be useful to someone: version 2.1 of the Java/Android SDK provides a method called setMaxKeys, like this:
s3objects.setMaxKeys(arg0)
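Note that setMaxKeys belongs to the request object (for example ListObjectsRequest), and it only sets the page size: S3 returns at most 1,000 keys per ListObjects request, so a larger value does not remove the need for pagination. The small sketch below (hypothetical helper names) works out how many requests a full listing takes.

```java
// Sketch of the arithmetic behind MaxKeys: S3 returns at most 1,000 keys per ListObjects
// request, so MaxKeys can shrink a page but not enlarge it, and listing everything still
// takes ceil(total / pageSize) requests. Both helpers are hypothetical.
class MaxKeysMath {
    static final int S3_PAGE_LIMIT = 1000;

    // Requested MaxKeys (assumed >= 1), capped at the service limit.
    static int effectivePageSize(int requestedMaxKeys) {
        return Math.min(requestedMaxKeys, S3_PAGE_LIMIT);
    }

    static long requestsNeeded(long totalObjects, int requestedMaxKeys) {
        int pageSize = effectivePageSize(requestedMaxKeys);
        long pages = (totalObjects + pageSize - 1) / pageSize; // ceiling division
        return Math.max(1, pages); // even an empty bucket needs one request to find out
    }

    public static void main(String[] args) {
        System.out.println(requestsNeeded(2500, 5000)); // 3
        System.out.println(requestsNeeded(2500, 10));   // 250
    }
}
```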
You have probably found a solution by now, but please mark one of the answers as accepted so it can help others in the future.
This works for me:
Thread thread = new Thread(new Runnable(){
@Override
public void run() {
try {
List<String> listing = getObjectNamesForBucket(bucket, s3Client);
Log.e(TAG, "listing "+ listing);
}
catch (Exception e) {
e.printStackTrace();
Log.e(TAG, "Exception found while listing "+ e);
}
}
});
thread.start();
private List<String> getObjectNamesForBucket(String bucket, AmazonS3 s3Client) {
ObjectListing objects=s3Client.listObjects(bucket);
List<String> objectNames=new ArrayList<String>(objects.getObjectSummaries().size());
Iterator<S3ObjectSummary> oIter=objects.getObjectSummaries().iterator();
while (oIter.hasNext()) {
objectNames.add(oIter.next().getKey());
}
while (objects.isTruncated()) {
objects=s3Client.listNextBatchOfObjects(objects);
oIter=objects.getObjectSummaries().iterator();
while (oIter.hasNext()) {
objectNames.add(oIter.next().getKey());
}
}
return objectNames;
}
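The background thread is needed because Android throws NetworkOnMainThreadException for network calls made on the main thread. If the caller also needs the result (rather than only logging it), one option is a CompletableFuture on an executor. This is a JDK-only sketch: the Supplier stands in for getObjectNamesForBucket(bucket, s3Client), and on Android any UI update would still have to be posted back to the main thread (not shown).

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

// Sketch: run a blocking listing call off the calling thread and get the result back as a future.
// The Supplier stands in for getObjectNamesForBucket(bucket, s3Client); the fake listing is hypothetical.
class BackgroundListingDemo {

    static CompletableFuture<List<String>> listInBackground(Supplier<List<String>> blockingListing,
                                                             ExecutorService executor) {
        // supplyAsync runs the supplier on the executor's thread, not on the caller's thread
        return CompletableFuture.supplyAsync(blockingListing, executor);
    }

    public static void main(String[] args) {
        ExecutorService executor = Executors.newSingleThreadExecutor();
        try {
            Supplier<List<String>> fakeListing = () -> List.of("a.txt", "b.txt"); // hypothetical result
            listInBackground(fakeListing, executor)
                    .thenAccept(names -> System.out.println("listing " + names))
                    .exceptionally(e -> {
                        System.err.println("Exception found while listing " + e);
                        return null;
                    })
                    .join(); // joining is only for this demo; a UI thread would not block here
        } finally {
            executor.shutdown();
        }
    }
}
```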
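You may not want to fetch up to 1,000 objects from the bucket in a single request. A more robust solution is to fetch at most 10 objects at a time, which you can do with the withMaxKeys method.
The following code creates an S3 client, fetches 10 or fewer objects at a time, filters the keys on the client side (keeping those under Market-subscriptions/), and generates a presigned URL for each matching object: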
import com.amazonaws.HttpMethod;
import com.amazonaws.SdkClientException;
import com.amazonaws.auth.AWSStaticCredentialsProvider;
import com.amazonaws.auth.BasicAWSCredentials;
import com.amazonaws.regions.Regions;
import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.*;
import java.net.URL;
import java.util.Date;
/**
* @author shabab
* @since 21 Sep, 2020
*/
public class AwsMain {
static final String ACCESS_KEY = "";
static final String SECRET = "";
static final Regions BUCKET_REGION = Regions.DEFAULT_REGION;
static final String BUCKET_NAME = "";
public static void main(String[] args) {
BasicAWSCredentials awsCreds = new BasicAWSCredentials(ACCESS_KEY, SECRET);
try {
final AmazonS3 s3Client = AmazonS3ClientBuilder
.standard()
.withRegion(BUCKET_REGION)
.withCredentials(new AWSStaticCredentialsProvider(awsCreds))
.build();
ListObjectsV2Request req = new ListObjectsV2Request().withBucketName(BUCKET_NAME).withMaxKeys(10);
ListObjectsV2Result result;
do {
result = s3Client.listObjectsV2(req);
result.getObjectSummaries()
.stream()
.filter(s3ObjectSummary -> {
return s3ObjectSummary.getKey().contains("Market-subscriptions/")
&& !s3ObjectSummary.getKey().equals("Market-subscriptions/");
})
.forEach(s3ObjectSummary -> {
GeneratePresignedUrlRequest generatePresignedUrlRequest =
new GeneratePresignedUrlRequest(BUCKET_NAME, s3ObjectSummary.getKey())
.withMethod(HttpMethod.GET)
.withExpiration(getExpirationDate());
URL url = s3Client.generatePresignedUrl(generatePresignedUrlRequest);
System.out.println(s3ObjectSummary.getKey() + " Pre-Signed URL: " + url.toString());
});
String token = result.getNextContinuationToken();
req.setContinuationToken(token);
} while (result.isTruncated());
} catch (SdkClientException e) {
e.printStackTrace();
}
}
private static Date getExpirationDate() {
Date expiration = new java.util.Date();
long expTimeMillis = expiration.getTime();
expTimeMillis += 1000 * 60 * 60;
expiration.setTime(expTimeMillis);
return expiration;
}
}
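getExpirationDate() adds 1000 * 60 * 60 ms, i.e. 3,600,000 ms or one hour, to the current time. Assuming Java 8 or later, the same value can be computed with java.time and converted to the java.util.Date that withExpiration(...) takes; expirationIn is a hypothetical helper name.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Date;

// Sketch of the same expiry as getExpirationDate(): now + 1 hour (1000 * 60 * 60 = 3,600,000 ms).
// expirationIn(...) is a hypothetical helper; the v1 SDK's withExpiration(...) takes a java.util.Date.
class ExpirationDemo {

    static Date expirationIn(Duration validity) {
        return Date.from(Instant.now().plus(validity)); // java.time arithmetic, then convert to Date
    }

    public static void main(String[] args) {
        System.out.println("expires at " + expirationIn(Duration.ofHours(1)));
    }
}
```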
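As described in the documentation, you can use the SDK v2 Reactive Streams integration with automatic pagination.
This example uses Project Reactor, an implementation of the Reactive Streams standard, but it also works with other implementations (for example, RxJava).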
ListObjectsV2Request listObjects = ListObjectsV2Request
.builder()
.bucket("<bucketName>")
.maxKeys(100) // Number of items per page. Using pagination to get all objects in the bucket.
.build();
// https://docs.aws.amazon.com/sdk-for-java/latest/developer-guide/pagination.html
// Auto-pagination method that makes multiple service calls to get the next page of results automatically.
// Publish messages by batches to sqs as they come from s3 pagination result.
return Flux.from(s3Client.listObjectsV2Paginator(listObjects))
.flatMap(list -> Flux.fromIterable(list.contents())
.map(s3Object -> transformObject(s3Object))
.collectList()
.flatMap(sqsPublisher::publishBatch))
.doOnError(e -> log.error("Failed to blabla", e))
.then();
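The same "transform each page, then publish it as one batch" shape can be sketched with the JDK's own Flow API (java.util.concurrent.SubmissionPublisher, Java 9+) instead of Project Reactor. This does not talk to S3 or SQS: the in-memory pages and the published list are hypothetical stand-ins for the paginator output and sqsPublisher::publishBatch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.SubmissionPublisher;
import java.util.function.Function;

// Sketch of "transform each page and publish it as one batch" with the JDK's Flow-based
// SubmissionPublisher instead of Project Reactor. The pages, transform and the 'published'
// list (standing in for sqsPublisher::publishBatch) are hypothetical; nothing calls AWS.
class PageBatchingDemo {

    static List<List<String>> batchPerPage(List<List<String>> pages, Function<String, String> transform) {
        List<List<String>> published = Collections.synchronizedList(new ArrayList<>());
        SubmissionPublisher<List<String>> publisher = new SubmissionPublisher<>();
        // consume(...) subscribes a consumer; the future completes when the publisher completes.
        CompletableFuture<Void> done = publisher.consume(page -> {
            List<String> batch = new ArrayList<>();
            for (String key : page) {
                batch.add(transform.apply(key)); // like .map(s3Object -> transformObject(s3Object))
            }
            published.add(batch);                // like .collectList().flatMap(publishBatch)
        });
        for (List<String> page : pages) {
            publisher.submit(page);              // each page is handled as it arrives
        }
        publisher.close();                       // signals onComplete after pending pages
        done.join();
        return published;
    }

    public static void main(String[] args) {
        List<List<String>> pages = List.of(List.of("a", "b"), List.of("c"));
        System.out.println(batchPerPage(pages, String::toUpperCase)); // [[A, B], [C]]
    }
}
```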
Try this:
public void getObjectList(){
System.out.println("Listing objects");
ObjectListing objectListing = s3.listObjects(new ListObjectsRequest()
.withBucketName(bucketName)
.withPrefix("ads"));
for (S3ObjectSummary objectSummary : objectListing.getObjectSummaries()) {
System.out.println(" - " + objectSummary.getKey() + " " +
"(size = " + objectSummary.getSize() + ")");
}
}
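This lists the objects in the bucket that have a specific prefix. Note that it makes a single request, so only the first page (up to 1,000 keys) is returned.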
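The prefix is applied by S3 and is a plain string-prefix match on the key, so "ads" also matches keys such as ads-old/...; use "ads/" to stay inside one "folder". The sketch below (hypothetical helpers on an in-memory list) contrasts that with a contains(...) filter like the one in the presigned-URL answer, which also matches text in the middle of a key.

```java
import java.util.List;
import java.util.stream.Collectors;

// Sketch contrasting S3's prefix semantics with a substring filter. withPrefix(...) and
// containing(...) are hypothetical helpers that run on an in-memory key list.
class PrefixFilterDemo {

    // What withPrefix("ads") asks S3 to do server-side: keep keys that START with the prefix.
    static List<String> withPrefix(List<String> keys, String prefix) {
        return keys.stream().filter(k -> k.startsWith(prefix)).collect(Collectors.toList());
    }

    // A contains(...) check, as in the presigned-URL answer, also matches the text mid-key.
    static List<String> containing(List<String> keys, String fragment) {
        return keys.stream().filter(k -> k.contains(fragment)).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> keys = List.of("ads/1.png", "ads-old/2.png", "img/ads/3.png", "logo.png");
        System.out.println(withPrefix(keys, "ads"));  // [ads/1.png, ads-old/2.png]
        System.out.println(withPrefix(keys, "ads/")); // [ads/1.png]
        System.out.println(containing(keys, "ads"));  // [ads/1.png, ads-old/2.png, img/ads/3.png]
    }
}
```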
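Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original source. For any questions, contact: yoyou2525@163.com.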