Splitting json.dumps into smaller files

I am extracting email attachments from Outlook using the Microsoft Graph API and loading them into an S3 bucket in AWS.

main = 'https://graph.microsoft.com/v1.0/me/mailFolders/inbox/messages?$expand=attachments&$search="hasAttachments:true"&$top=10'
  
headers = {'Authorization': 'Bearer ' + result['access_token']}
response = requests.get(main, headers=headers)
if response.status_code != 200:
    raise Exception(response.json())

response_json = response.json()

emails = response_json['value']
for email in emails:
    if email['hasAttachments']:
        email_id = email['id']
        download_email_attachments(email_id, headers)
        print(email['subject'])
        print(email['hasAttachments'])

    s3 = boto3.client('s3')
    bucket ='demo-bucket'
    fileName = email['subject'] + '.json'
    fileContent = bytes(json.dumps(graph_data, indent=2).encode('UTF-8'))

    s3.put_object(Bucket=bucket, Key=fileName, Body=fileContent)
    print('Upload Complete')

graph_data is an endpoint that calls the API to pull the data, similar to main, but with an additional search criterion.

This is the block of code I'm using to take the file content and upload it to the S3 bucket. However, when I run this, all of the data is written into every file, and a separate file is still created for each attachment name. So I end up with 10 files containing exactly the same data under different names.

The sample below is what I get for one email, but if I pull 10 emails, I get 10 different file names, each containing the data for all 10 emails.

Sample of data from 1 email:

{
    "@odata.context": "https://graph.microsoft.com/v1.0/$metadata#users('48d31887-5fad-4d73-a9f5-3c356e68a038')/mailFolders('inbox')/messages(attachments())",
    "value": [
        {
            "@odata.etag": "W/\"CQAAABYAAAAiIsqMbYjsT5e/T7KzowPTAASWWffQ\"",
            "id": "AAMkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OABGAAAAAAAiQ8W967B7TKBjgx9rVEURBwAiIsqMbYjsT5e-T7KzowPTAAAAAAEMAAAiIsqMbYjsT5e-T7KzowPTAASXFxfBAAA=",
            "createdDateTime": "2022-06-24T09:31:45Z",
            "lastModifiedDateTime": "2022-06-24T09:31:46Z",
            "changeKey": "CQAAABYAAAAiIsqMbYjsT5e/T7KzowPTAASWWffQ",
            "categories": [],
            "receivedDateTime": "2022-06-24T09:31:45Z",
            "sentDateTime": "2022-06-24T09:31:45Z",
            "hasAttachments": true,
            "internetMessageId": "<SJ0PR15MB5245AA459418AC65A5B18C6ACDB49@SJ0PR15MB5245.namprd15.prod.outlook.com>",
            "subject": "Voice Mail (25 seconds)",
            "bodyPreview": "You received a voice mail from Developer desginer at Developer@2z5l84.onmicrosoft.com.Work:   917573933439Email:  Developer@2z5l84.onmicrosoft.com________________________________Thank you for using Transcription! If you don't see a transcr",
            "importance": "normal",
            "parentFolderId": "AAMkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OAAuAAAAAAAiQ8W967B7TKBjgx9rVEURAQAiIsqMbYjsT5e-T7KzowPTAAAAAAEMAAA=",
            "conversationId": "AAQkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OAAQABWnJa-v201IgLwcCubGPfM=",
            "conversationIndex": "AQHYh604Faclr+/bTUiAvBwK5sY98w==",
            "isDeliveryReceiptRequested": false,
            "isReadReceiptRequested": false,
            "isRead": false,
            "isDraft": false,
            "webLink": "https://outlook.office365.com/owa/?ItemID=AAMkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OABGAAAAAAAiQ8W967B7TKBjgx9rVEURBwAiIsqMbYjsT5e%2FT7KzowPTAAAAAAEMAAAiIsqMbYjsT5e%2FT7KzowPTAASXFxfBAAA%3D&exvsurl=1&viewmodel=ReadMessageItem",
            "inferenceClassification": "focused",
            "body": {
                "contentType": "html",
                "content": "<html><head><meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\"><style type=\"text/css\"><!--a:link{color:#0563C1}a:visited{color:#954F72}a:active{color:#954F72}--></style></head><body><style type=\"text/css\"><!--a:link{color:#0563C1}a:visited{color:#954F72}a:active{color:#954F72}--></style><div style=\"font-family:'Segoe UI',Arial,sans-serif; background-color:#ffffff; color:#16233A; font-size:10.5pt\"><div id=\"UM-call-info\" lang=\"en\"><div style=\"font-family:'Segoe UI',Arial,sans-serif; font-size:9pt; color:#595959\">You received a voice mail from Developer desginer at <a href=\"sip:Developer@2z5l84.onmicrosoft.com\" style=\"color:#0070C0\">Developer@2z5l84.onmicrosoft.com</a>.</div><br><table border=\"0\" style=\"width:100%; table-layout:auto\"><tbody><tr><td width=\"15%\" nowrap=\"\" style=\"font-family:'Segoe UI',Arial,sans-serif; color:#595959; font-size:9pt; border-width:0in\">Work:</td><td width=\"85%\" style=\"font-family:'Segoe UI',Arial,sans-serif; color:#000000; border-width:0in; font-size:9pt; vertical-align:top; padding-left:10px; padding-right:10px\"><a href=\"tel:917573933439\" style=\"color:#3366CC\">917573933439</a></td></tr><tr><td width=\"15%\" nowrap=\"\" style=\"font-family:'Segoe UI',Arial,sans-serif; color:#595959; font-size:9pt; border-width:0in\">Email:</td><td width=\"85%\" style=\"font-family:'Segoe UI',Arial,sans-serif; color:#000000; border-width:0in; font-size:9pt; vertical-align:top; padding-left:10px; padding-right:10px\"><a href=\"mailto:Developer@2z5l84.onmicrosoft.com\" style=\"color:#3366CC\">Developer@2z5l84.onmicrosoft.com</a></td></tr></tbody></table><br><br></div><div><hr style=\"width:75%; background-color:#bfcddb; border:0 none; text-align:left; margin-left:0px\"><br></div><div lang=\"en\" dir=\"ltr\" style=\"font-family:'Segoe UI',Arial,sans-serif; font-size:9pt; color:#595959; font-weight:bold\">Thank you for using Transcription! 
If you don't see a transcript above, it's because the audio quality was not clear enough to transcribe.</div><br><div lang=\"en\" dir=\"ltr\" style=\"font-family:'Segoe UI',Arial,sans-serif; font-size:9pt; color:#595959\"><a href=\"https://aka.ms/vmsettings\" style=\"font-size:9pt; color:#0070C0\">Set Up Voice Mail</a></div></div></body></html>"
            },
            "sender": {
                "emailAddress": {
                    "name": "Developer@2z5l84.onmicrosoft.com",
                    "address": "Developer@2z5l84.onmicrosoft.com"
                }
            },
            "from": {
                "emailAddress": {
                    "name": "Developer@2z5l84.onmicrosoft.com",
                    "address": "Developer@2z5l84.onmicrosoft.com"
                }
            },
            "toRecipients": [
                {
                    "emailAddress": {
                        "name": "Megan Bowen",
                        "address": "MeganB@M365x214355.onmicrosoft.com"
                    }
                }
            ],
            "ccRecipients": [],
            "bccRecipients": [],
            "replyTo": [],
            "flag": {
                "flagStatus": "notFlagged"
            },
            "attachments": [
                {
                    "@odata.type": "#microsoft.graph.fileAttachment",
                    "@odata.mediaContentType": "audio/mp3",
                    "id": "AAMkAGVmMDEzMTM4LTZmYWUtNDdkNC1hMDZiLTU1OGY5OTZhYmY4OABGAAAAAAAiQ8W967B7TKBjgx9rVEURBwAiIsqMbYjsT5e-T7KzowPTAAAAAAEMAAAiIsqMbYjsT5e-T7KzowPTAASXFxfBAAABEgAQAFbg-4hu7BZIrggwQ9N5ikk=",
                    "lastModifiedDateTime": "2022-06-24T09:31:45Z",
                    "name": "audio.mp3",
                    "contentType": "audio/mp3",
                    "size": 168986,
                    "isInline": false,
                    "contentId": null,
                    "contentLocation": null,
                    "contentBytes": "//OIxAAAAAAAAAAAAFhpbmcAAAAPAAACwAACkvgAAwcJDQ8TFhk"

How can I split this data so that each attachment has its own file containing only the data related to that attachment?

graph_data is an endpoint

So it is not a JSON value. Then why pass it to json.dumps?
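
To illustrate the symptom, here is a minimal sketch that assumes graph_data holds the whole parsed response rather than a single message (the data below is made up). The file name changes on every iteration, but the object handed to json.dumps does not, so every file receives the same content.

```python
import json

# Made-up stand-in for the full response; the real one is much larger.
graph_data = {'value': [{'subject': 'A'}, {'subject': 'B'}]}

files = {}
for email in graph_data['value']:
    # The key changes per email, but the serialised object never does.
    files[email['subject'] + '.json'] = json.dumps(graph_data, indent=2)

print(sorted(files))              # the distinct file names
print(len(set(files.values())))   # the number of distinct file contents
```

With two emails this gives two names but only one distinct content, which matches the behaviour described in the question.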


so each attachment has its own file with the data related to the attachment

Loop over the attachments.

import json

import boto3

# response_json is the parsed Graph API response from the question's code
emails = response_json['value']

s3 = boto3.client('s3')
bucket = 'demo-bucket'

for email in emails:
    email_id = email['id']
    subject = email['subject']
    if email['hasAttachments']:
        print(subject)
        # $expand=attachments puts each message's attachments inline
        attachments = email['attachments']
        for attachment in attachments:
            name = attachment['name']
            # Serialise only this attachment, not the whole response
            fileContent = json.dumps(attachment, indent=2).encode('UTF-8')
            s3.put_object(Bucket=bucket, Key=name.replace('.', '_') + '.json', Body=fileContent)
        print('Upload Complete')
        download_email_attachments(email_id, headers)
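
One thing to watch with the loop above: in the sample data the attachment is called audio.mp3, so if several emails carry an attachment with the same name, they would all map to the same S3 key and overwrite one another. The following is a rough sketch of one way to keep the keys distinct. The helper name attachment_objects and the index-based key scheme are assumptions made for illustration, not part of the Graph API or boto3, and the sample data is invented.

```python
import json


def attachment_objects(response_json):
    """Yield (key, body) pairs, one per attachment in a messages response."""
    for msg_index, email in enumerate(response_json.get('value', [])):
        for att_index, attachment in enumerate(email.get('attachments', [])):
            # Hypothetical key scheme: message index, attachment index, then
            # the attachment name, so repeated names do not collide.
            safe_name = attachment['name'].replace('.', '_')
            key = f'{msg_index:03d}_{att_index:02d}_{safe_name}.json'
            # Serialise only this attachment.
            body = json.dumps(attachment, indent=2).encode('utf-8')
            yield key, body


# Small made-up response shaped like the sample in the question.
sample = {
    'value': [
        {'subject': 'Voice Mail (25 seconds)', 'hasAttachments': True,
         'attachments': [{'name': 'audio.mp3', 'size': 168986}]},
        {'subject': 'Voice Mail (12 seconds)', 'hasAttachments': True,
         'attachments': [{'name': 'audio.mp3', 'size': 90210}]},
        {'subject': 'No files', 'hasAttachments': False, 'attachments': []},
    ]
}

objects = list(attachment_objects(sample))
for key, body in objects:
    print(key, len(body))
    # With a real client: s3.put_object(Bucket=bucket, Key=key, Body=body)
```

Keeping the key building separate from the upload makes it possible to check the keys and bodies locally before anything is written to the bucket. Index-based keys are only stable within a single run; a key derived from the message or attachment id would be another option if the files need stable names.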
