
Python: How to handle an invalid JSON string in json.loads

Here is my use case: I have an AWS Lambda Python function that receives sample data like the following:

{
"time_script_executed": "2022-09-20 11:03:31",
"nv_official_build": "unknown",
"tool_cmd": "c:\abc\1.2.3.4\tool.exe ts /v /tr http://timestamp.entrust.net/TSS/as /td SHA256 \"sd.dll" 2>&1",
}

In my Lambda I do this:

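import datetime
import json

# ClientError is assumed to be the botocore exception class
from botocore.exceptions import ClientError


def lambda_handler(event, context):
    event_body = event["body"]
    # This call fails when tool_cmd contains unescaped characters
    input_dict = json.loads("{}".format(event_body))
    now = datetime.datetime.now()
    input_dict["time_2"] = now.strftime('%Y-%m-%dT%H:%M:%S')
    input_json = json.dumps(input_dict)

    try:
        # I will send the input_json to subsequent steps
        ...
    except ClientError as e:
        print(e)
        exit(1)

    response = {
        "statusCode": 200
    }

    return response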

In a nutshell, here is what I am trying to do:

  1. Convert the incoming data string (which might contain invalid JSON characters in one specific attribute) to JSON
  2. Add new attributes to the JSON
  3. Send JSON downstream

The issue is that tool_cmd can contain unescaped characters, which makes json.loads raise an error. If I call json.dumps before json.loads (to escape the JSON), it escapes all the double quotes in the JSON, but I only want to escape the tool_cmd attribute. Is there a straightforward way to do this in Python without using regular expressions?
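
To make the failure concrete, here is a minimal sketch using a shortened, made-up stand-in for event["body"] (the raw_body value is an assumption, not the real payload). It shows that json.loads rejects the text, and that calling json.dumps first only wraps the whole text in a single JSON string, so the round trip returns the same str rather than a dict.

```python
import json

# Stand-in for event["body"]: a small payload whose tool_cmd value
# contains raw backslashes and an unescaped double quote.
raw_body = r'''{
"nv_official_build": "unknown",
"tool_cmd": "c:\abc\tool.exe /td SHA256 \"sd.dll" 2>&1"
}'''

# Parsing directly fails, because \a is not a valid JSON escape.
try:
    json.loads(raw_body)
    parse_failed = False
except json.JSONDecodeError:
    parse_failed = True

# dumps() turns the whole text into one JSON string literal, so the
# round trip hands back the original text instead of a dict.
round_trip = json.loads(json.dumps(raw_body))

print(parse_failed)                # True
print(type(round_trip).__name__)   # str
print(round_trip == raw_body)      # True
```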

The code that follows is not pretty, but you asked for no regex and I was in a typing mood.

This "thing" below fixes the very specific escape problem you have with your string. I validated the result with jsonformatter.curiousconcept.com and it confirmed that the string is valid.

But I would rather try to work on the code that creates the string instead of using what I did here.
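
A rough sketch of that producer-side idea, assuming the script that emits the payload is written in Python and can build a dict before serializing (the record name is hypothetical, and the values are copied from the sample in the question). json.dumps escapes the backslashes and quotes itself, so nothing would need repairing in the Lambda.

```python
import json

# Hypothetical producer-side record; tool_cmd holds the raw command text
# exactly as it appears in the question's sample.
record = {
    "time_script_executed": "2022-09-20 11:03:31",
    "nv_official_build": "unknown",
    "tool_cmd": r'c:\abc\1.2.3.4\tool.exe ts /v /tr http://timestamp.entrust.net/TSS/as /td SHA256 \"sd.dll" 2>&1',
}

# Serializing the dict produces a valid JSON document.
body = json.dumps(record)

# The consumer can now parse it without any special handling.
parsed = json.loads(body)
print(parsed["tool_cmd"] == record["tool_cmd"])  # True
```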

String validation

json_string = r'''"time_script_executed": "2022-09-20 11:03:31",
"nv_official_build": "unknown",
"tool_cmd": "c:\abc\1.2.3.4\tool.exe ts /v /tr http://timestamp.entrust.net/TSS/as /td SHA256 \"sd.dll" 2>&1",
'''

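# Rebuild only the tool_cmd line (index 2), escaping as we go.
chars = ['"tool_cmd":', ' "']
# Take the text after '":' and drop the leading ' "' and the trailing '",'.
split_string = json_string.split("\n")[2].split('":')[1][2:-2]
for idx, char in enumerate(split_string):
    # Guard the neighbours so the first and last characters are safe.
    next_char = split_string[idx + 1] if idx + 1 < len(split_string) else ''
    prev_char = split_string[idx - 1] if idx > 0 else ''

    # Double every backslash that is not already escaping a quote.
    if char == '\\' and next_char != '"':
        chars.append(char * 2)
        continue

    # Escape any quote that is not already preceded by a backslash.
    if char == '"' and prev_char != "\\":
        chars.append("\\")

    chars.append(char)
chars.append('"')
fixed_line = ''.join(chars)

escaped_json_lines = []

for idx, line in enumerate(json_string.split('\n')):
    if idx == 2:
        # Swap in the repaired line (its trailing comma is dropped).
        escaped_json_lines.append(fixed_line)
        continue

    escaped_json_lines.append(line)

print('\n'.join(escaped_json_lines))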

Output:

"time_script_executed": "2022-09-20 11:03:31",
"nv_official_build": "unknown",
"tool_cmd": "c:\\abc\\1.2.3.4\\tool.exe ts /v /tr http://timestamp.entrust.net/TSS/as /td SHA256 \"sd.dll\" 2>&1"

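For reuse inside the Lambda, the same idea could be wrapped in a helper along these lines. This is only a sketch under assumptions: repair_tool_cmd is a hypothetical name, the attribute is assumed to sit on a single line, and its value is assumed to run from the first quote after the colon to the last quote on that line. Unlike the loop above, it hands the raw text to json.dumps, so an existing \" is kept as a literal backslash followed by a quote instead of being treated as an already escaped quote.

```python
import json


def repair_tool_cmd(body, key="tool_cmd"):
    """Return body with the value of `key` re-escaped as a JSON string."""
    lines = body.split("\n")
    prefix = '"{}":'.format(key)
    for idx, line in enumerate(lines):
        stripped = line.strip()
        if not stripped.startswith(prefix):
            continue
        rest = stripped[len(prefix):]
        start = rest.index('"') + 1   # just past the opening quote
        end = rest.rindex('"')        # the closing quote
        raw_value = rest[start:end]
        # json.dumps escapes every backslash and quote in the raw text.
        new_line = prefix + " " + json.dumps(raw_value)
        # Keep a separating comma only if another attribute follows.
        following = "".join(lines[idx + 1:]).strip()
        if following and not following.startswith("}"):
            new_line += ","
        lines[idx] = new_line
    return "\n".join(lines)


# The sample from the question, including its trailing comma.
body = r'''{
"time_script_executed": "2022-09-20 11:03:31",
"nv_official_build": "unknown",
"tool_cmd": "c:\abc\1.2.3.4\tool.exe ts /v /tr http://timestamp.entrust.net/TSS/as /td SHA256 \"sd.dll" 2>&1",
}'''

parsed = json.loads(repair_tool_cmd(body))
print(parsed["nv_official_build"])  # unknown
print(parsed["tool_cmd"])
```

The helper only repairs the one attribute it is told about. A trailing comma after any other attribute, or a tool_cmd value that spans several lines, would still make json.loads fail.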