
Get folder content from S3 bucket

I am trying to get data from a folder in an S3 bucket. My bucket has two folders, articles and comments, and I only want the data in the comments folder. That data is spread across multiple JSON files. When I pass…

This is an example of a JSON object from one of the many JSON files in the comments folder:

{"7475199770543690800": {"author": "BKD2674", "body": "Saying its Meme, then saying you're buying in lol", "ups": 10, "fullname": "t1_fsqwfto", "created_utc": "2020-06-03T13:54:45", "subreddit": "stocks", "article_id": "gvuau0"}}

I only need the "body" field of each JSON object, since that holds the comment text. My idea was to load the contents of all the JSON files into one large dictionary, then iterate through it and pull out each "body" value. If there is a better way to do this, please let me know.
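Rather than merging everything into one large dictionary first, you could pull the "body" values out of each file as you parse it. A minimal sketch, assuming each file holds a single JSON object shaped like the sample above (comment ID mapped to a record); extract_bodies is a hypothetical helper name, and the sample is trimmed from the one in the question:

```python
import json


def extract_bodies(doc):
    """Return the "body" text of every comment in one parsed JSON document.

    Assumes the shape shown in the question: a dict whose values are
    comment records (dicts) that may contain a "body" key.
    """
    return [
        record["body"]
        for record in doc.values()
        if isinstance(record, dict) and "body" in record
    ]


# Trimmed copy of the sample comment from the question
raw = '{"7475199770543690800": {"author": "BKD2674", "body": "Saying its Meme, then saying you\'re buying in lol", "ups": 10}}'
doc = json.loads(raw)
print(extract_bodies(doc))
# prints: ["Saying its Meme, then saying you're buying in lol"]
```

Appending these results to one list per file keeps memory proportional to the comment text rather than to every field in every record.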

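    import boto3

    s3 = boto3.resource('s3')
    bucket = s3.Bucket('diegos-reddit-bucket')

    # Iterates over every object in the bucket, including those under articles/
    for obj in bucket.objects.all():
        key = obj.key
        body = obj.get()['Body'].read()  # raw bytes, not str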

This is just a sample I wrote to test it. It finds my bucket, but the loop also reads the objects in the articles folder instead of only the comments folder. Also, the body variable is of type bytes.

You can restrict which objects you iterate over with the filter method of bucket.objects, passing a Prefix so that only keys under the comments folder are returned (see the boto3 S3 Bucket documentation for bucket.objects.filter).

It's true that the object's content is a byte string. You can use body.decode('utf-8') to get the text, but json.load can read the streaming body directly, since it accepts bytes as well as str (Python 3.6+). So this should work:
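The two options can be compared offline. In this sketch io.BytesIO stands in for the StreamingBody that obj.get()['Body'] returns (both provide a .read() method that returns bytes), and the payload is made-up illustrative data:

```python
import io
import json

# Illustrative payload; io.BytesIO mimics the .read() of an S3 StreamingBody
payload = b'{"123": {"body": "hello"}}'

# Option 1: decode the bytes to text yourself, then parse the text
text = io.BytesIO(payload).read().decode("utf-8")
doc1 = json.loads(text)

# Option 2: let json.load call .read() and parse the bytes directly
# (json.loads accepts bytes in Python 3.6+)
doc2 = json.load(io.BytesIO(payload))

print(doc1 == doc2)  # prints: True
```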

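import json
import boto3

s3 = boto3.resource('s3')
bucket = s3.Bucket('diegos-reddit-bucket')

# The trailing slash keeps keys such as "comments_old/..." from matching too
for obj in bucket.objects.filter(Prefix='comments/'):
    if obj.key.endswith('/'):
        continue  # skip a zero-byte "folder" placeholder object, if one exists
    # json.load reads the StreamingBody and parses the bytes in one step
    body = json.load(obj.get()['Body'])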

With a regular GET request there's no way to read only the "body" fields of a JSON file; you have to download the whole object and parse it first, then pick out the fields you need.
