I want to read a specific file from an S3 bucket. The bucket contains many objects (directories and subdirectories). I want to traverse all the objects and read only that file.
I am trying the code below:
import scala.collection.JavaConverters._

val s3Client: AmazonS3Client = getS3Client()
try {
  log.info("Listing objects from S3")
  var counter = 0
  // List keys under the client's prefix, two keys per page
  val listObjectsRequest = new ListObjectsRequest()
    .withBucketName(bucketName)
    .withMaxKeys(2)
    .withPrefix("Test/" + "Client_cd" + "/" + "DM1" + "/")
    .withMarker("Test/" + "Client_cd" + "/" + "DM1" + "/")
  var objectListing: ObjectListing = null
  do {
    objectListing = s3Client.listObjects(listObjectsRequest)
    for (objectSummary <- objectListing.getObjectSummaries.asScala) {
      println(objectSummary.getKey + "\t" + StringUtils.fromDate(objectSummary.getLastModified))
    }
    // Continue from where the previous page ended
    listObjectsRequest.setMarker(objectListing.getNextMarker())
  } while (objectListing.isTruncated())
} catch {
  case e: Exception =>
    log.error("Failed listing files. ", e)
    throw e
}
Under this path I have to read only the .gz files from the latest month's folders. File path:
"Mybucket/Test/Client_cd/Dm1/20181010_xxxxx/*.gz"
Here, Client_cd has to be passed as a parameter for the particular client.
How can I filter the objects to get only those particular files?
If you are using EMR, or have your S3 configuration set up correctly, you can also use sc.textFile("s3://bucket/Test/Client_cd/Dm1/20181010_xxxxx/*.gz")
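If the keys are listed with the S3 client instead, one possible way to narrow them down is to filter the key strings after listing. The following is only a minimal sketch, not a definitive implementation: it assumes every dated folder name begins with an eight-digit yyyyMMdd date (as in 20181010_xxxxx), and the names LatestGzKeys and latestMonthGzKeys are hypothetical helpers introduced here for illustration.

```scala
object LatestGzKeys {
  /** Returns the .gz keys under `prefix` whose dated folder falls in the latest month.
    * Assumption: the first folder after `prefix` starts with yyyyMMdd.
    */
  def latestMonthGzKeys(keys: Seq[String], prefix: String): Seq[String] = {
    // Pair each matching key with the yyyyMM part of its folder name
    val dated: Seq[(String, String)] = keys
      .filter(k => k.startsWith(prefix) && k.endsWith(".gz"))
      .flatMap { k =>
        val rest = k.stripPrefix(prefix)
        val slash = rest.indexOf('/')
        // Require a folder of at least 8 characters whose first 8 are digits
        if (slash >= 8 && rest.take(8).forall(_.isDigit)) Some((rest.take(6), k))
        else None
      }
    if (dated.isEmpty) Seq.empty
    else {
      // Fixed-width yyyyMM strings sort chronologically
      val latestMonth = dated.map(_._1).max
      dated.collect { case (month, key) if month == latestMonth => key }
    }
  }
}
```

The keys would be collected from objectSummary.getKey inside the listing loop, and the prefix could be built from the client parameter, for example s"Test/$clientCd/DM1/" where clientCd is a hypothetical variable holding the client code. S3 keys are case-sensitive, so the prefix has to match the stored path exactly.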