简体   繁体   中英

Modify JSON message while reading from AWS SNS using AWS Lambda

I have an upstream application sending following JSON message to an SNS topic. We use AWS lambda function to save this JSON object in S3:

{
"processResult": {
    "processName": "XYZ",
    "stageResults": {
        "Read Files": {
            "status": "PROCESSED",
            "error": "",
            "timeTaken": 26064469473
        },
        "Convert Files": {
            "status": "PROCESSED",
            "error": "",
            "timeTaken": 97968896
        }
    },
    "processMetrics": {
        "filesProcessed": 1157,
        "filesWithExceptionCount": 1,
        "timeTaken": "367.460031s",
        "metricsCalcTime": "6.061847s",
        "totalTimeTaken": "373.521878s"
    },
    "succeeded": true
  }
}

If you observe within the "stageResults" STRUCT, I have the attributes with spaces, like "Read Files" and "Convert Files" and when i am trying to read this JSON from S3 using AWS Athena (AWS Glue Crawler), I am getting following error:

    HIVE_METASTORE_ERROR: com.facebook.presto.spi.PrestoException: Error: : expected at the position 51 of 'struct<....
......but ' ' is found. (Service: null; Status Code: 0; Error Code: null; Request ID: null)

But when I manually edit the JSON message by modifying "stageResults" STRUCT like "Read_Files" and "Convert_Files", I was able to read and query the JSON using AWS Athena tables successfully.

Following is the piece of code from AWS Lambda which triggers on SNS topic, reads JSON message and saves it in S3:

import json
import boto3
import random
import string

file_name = ''.join([random.choice(string.ascii_lowercase) for i in range(16)])

def lambda_handler(event, context):
    target_bucket = 'bucket-name'
    target_key = 'input=clientdata/'  + file_name + '.json'
    s3 = boto3.resource('s3')
    for record in event['Records']:
        payload = record["body"]
        data = json.loads(payload)
        print(data)
        print("copying JSON message...")
        s3.Object('target_bucket', 'target_key').put(
            Body=(bytes(json.dumps(data).encode('UTF-8')))
        )

Now, I want to know how to modify this JSON message before saving it to S3 using AWS Lambda. Any help is appreciated.

If all you want is to change every key containing spaces into something suitable for your needs this might do:

import re
wrong_keys = []
for key in data.keys():
    if key.strip().find(" ") != -1:
        wrong_keys.append(key)
for key in wrong_key:
  data[re.sub(r'[\W]+', '_',key)] = data.pop(key)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM