How to get a unique list of objects from AWS S3 bucket

The following code connects to an AWS S3 bucket and returns the list of objects in it. I'm trying to build a unique list from the original list by selecting part of each object's name (i.e. batchID = str((s3_file.name).split("/"))[32:-13]). I have declared "batchID" as an array.

When I use set() to return the unique values, it returns the unique digits within each value, for example ['1', '0', '3', '2', '5', '4', '9', '8'], ['1', '0', '3', '2', '5', '4', '7', '9', '8'], etc. So it is de-duplicating the characters inside each value rather than de-duplicating the values across the list. I'm expecting each value to appear once; see the expected output below. I also tried nested "for" loops with "not in" to return the unique values, but that didn't work either. Can anyone please help? Thank you in advance.

def __init__(self, aws_access_key_id, aws_secret_access_key, aws_bucket_to_download, use_ssl):
    self.run_id = []
    self.batchID = []
    self._aws_connection = S3Connection(aws_access_key_id, aws_secret_access_key, is_secure = use_ssl)
    self._runId(aws_bucket_to_download)

def _runId(self,aws_bucket_to_download):
    if not self._bucketExists(aws_bucket_to_download):
        self._printBucketNotFoundMessage(aws_bucket_to_download)
    else:
        bucket = self._aws_connection.get_bucket(aws_bucket_to_download)
        for s3_file in bucket.list(prefix='Download/test_queue1/'):
            # Slice the batch ID out of the string form of the split key name
            batchID = str((s3_file.name).split("/"))[32:-13]
            #a = set(batchID)
            #batchID = list(a)
            print(batchID)
            #newList = list(set(batchID))
            #print newList

Output: 144019080231459 144019080231459 144019800231759 144019800231759

Expected output: 144019080231459 144019800231759

I think you're asking how to remove duplicate batch IDs. Why don't you add each batch ID to a list as you retrieve it, ignoring it if it's already in the list, for example:

batchIDlist = []

for s3_file in bucket.list(prefix='Download/test_queue1/'):
    batchID = str((s3_file.name).split("/"))[32:-13]

    if batchID not in batchIDlist:
        batchIDlist.append(batchID)

This will also keep items in the same order they were first found.
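As a side note, the single digits in the question come from calling set() on one string at a time: a string is iterated character by character, so set(batchID) collects its distinct characters. Below is a minimal sketch of the same "skip it if already seen" idea, with a set used for the membership test so the lookup stays fast on long listings. The helper name unique_in_order is hypothetical, and the sample values are copied from the question's output rather than read from S3.

```python
def unique_in_order(items):
    """Return the distinct items, in the order each first appeared."""
    seen = set()    # fast membership test
    result = []     # preserves first-seen order
    for item in items:
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result


# Sample values copied from the question's output (not fetched from S3).
batch_ids = [
    "144019080231459",
    "144019080231459",
    "144019800231759",
    "144019800231759",
]

# set() over a single string yields its distinct characters, which is
# what produced the lists of single digits in the question.
digits = set(batch_ids[0])
print(sorted(digits))  # ['0', '1', '2', '3', '4', '5', '8', '9']

# De-duplicating across the whole list gives the expected output.
unique_ids = unique_in_order(batch_ids)
print(unique_ids)  # ['144019080231459', '144019800231759']
```

In the loop above, the same effect could presumably be had by collecting every batchID into a list first and passing that list to the helper.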

