How to parse "{Key=Value}" structs into JSON with Python?

Question

If I run an Athena query in AWS, the data I get back has structs with key/value pairs that look like this:

{
    "events": "[{deviceType=Android,logins=400},{deviceType=iPhone,logins=550}]"
}

I can parse this with regular expressions, but special characters in the values make that deserialization very error-prone.

For example, a value like the date in {deviceType=Android, date=2022-01-01} runs into delimiter problems when I use a regex.

Is there an existing deserializer for this format?

EDIT:

This is the regex-based deserializer I have:

import json
import re

def deserialize(s):
    # Surround every run of word characters (\w+) with double quotes
    s1 = re.sub(r'(\w+)', r'"\g<1>"', s)

    # Turn key=value into "key":"value"
    s2 = re.sub('=', ':', s1)

    return json.loads(s2)

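This breaks when a value contains special characters such as "-" or ".". The \w class only matches letters, digits and underscores, so the regex cannot treat 2022-01-01 as a single "word". It quotes 2022, 01 and 01 separately, producing "2022"-"01"-"01", which is not valid JSON.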
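A possible workaround is to quote every run of characters that is not a delimiter, rather than every `\w+` run. That way `2022-01-01` or `1.5` stays a single token. This is only a sketch, and it assumes keys and values never contain the delimiter characters `{ } [ ] , =`. The name `TOKEN` is illustrative, and every value comes back as a string.

```python
import json
import re

# Any run of characters that is not a struct/array delimiter.
# Assumption: keys and values never contain { } [ ] , or = themselves.
TOKEN = re.compile(r'[^{}\[\],=]+')


def deserialize(s):
    def quote(match):
        text = match.group(0).strip()
        # Pure whitespace (e.g. the space after ", ") is left unquoted.
        return json.dumps(text) if text else match.group(0)

    # Quote every token, then turn key=value into "key":"value".
    # Tokens never contain '=', so this replace cannot touch quoted text.
    return json.loads(TOKEN.sub(quote, s).replace('=', ':'))


print(deserialize('[{deviceType=Android, date=2022-01-01}]'))
# [{'deviceType': 'Android', 'date': '2022-01-01'}]
```

Nested structs such as `{a={b=1.5}}` also parse with this approach. A value that legitimately contains a comma or `=` will still be misparsed, because that is an inherent ambiguity of this text format.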

Answer 1

The data inside the quotes is almost JSON, but it lacks quotes around keys and values and uses = instead of :. With a few judiciously chained .replace() method calls, you should be able to convert it from almost-JSON to JSON and then deserialize it with the json module:

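import json

obj = {"events": "[{deviceType=Android, date=2022-01-01}]"}
events = obj['events']
events_json = (
    events.replace(', ', ',')   # drop the space after commas
    .replace('{', '{"')         # quote before the first key
    .replace('}', '"}')         # quote after the last value
    .replace('=', '":"')        # key=value -> "key":"value"
    .replace(',', '","')        # close/open quotes around each comma
    .replace('}","{', '},{')    # undo that between adjacent structs
)
parsed = json.loads(events_json)
print(parsed[0])  # {'deviceType': 'Android', 'date': '2022-01-01'}

print(parsed[0]['deviceType'])  # prints 'Android'
print(parsed[0]['date'])  # prints '2022-01-01'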

*Edit to fix an issue raised by MisterMiyagi.
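The same replace chain can be wrapped in a helper and tested against input it cannot handle. This is a sketch. `events_to_json` is a hypothetical name, and the failing input (a value containing `=`) is made up to show the limitation. Every value comes back as a string, and any `=`, comma or brace inside a value corrupts the generated JSON.

```python
import json


def events_to_json(events):
    """Apply the chained replacements from this answer and parse the result."""
    events_json = (
        events.replace(', ', ',')     # normalise separators
        .replace('{', '{"')           # quote before the first key
        .replace('}', '"}')           # quote after the last value
        .replace('=', '":"')          # key=value -> "key":"value"
        .replace(',', '","')          # close/open quotes around each comma
        .replace('}","{', '},{')      # undo that between adjacent structs
    )
    return json.loads(events_json)


print(events_to_json('[{deviceType=Android,logins=400},{deviceType=iPhone,logins=550}]'))

try:
    events_to_json('[{note=a=b}]')  # '=' inside a value
except json.JSONDecodeError as exc:
    print('cannot parse:', exc)
```

The second call builds `[{"note":"a":"b"}]`, which `json.loads` rejects with a `JSONDecodeError`.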

Answer 2

Instead of parsing this not-quite-JSON, I recommend casting maps and arrays to JSON in your queries:

SELECT CAST(events AS JSON) AS events …

This has the added benefit of making the output unambiguous to parse. For example, without casting to JSON there is no way to know whether "[1, 2, 3]" was an array of integers or of strings, or whether "[hello, world]" was an array of two elements or one element with a comma inside.
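On the Python side, a JSON-cast column can go straight to `json.loads`. The strings below are assumed sample values, not captured Athena output, and the exact JSON a cast produces may depend on the engine version. They still show the ambiguity disappearing, because quoted and unquoted elements now decode to different types.

```python
import json

# Assumed sample values for columns cast to JSON in the query.
as_integers = '[1, 2, 3]'
as_strings = '["1", "2", "3"]'
with_comma = '["hello, world"]'

print(json.loads(as_integers))  # [1, 2, 3]
print(json.loads(as_strings))   # ['1', '2', '3']
print(json.loads(with_comma))   # ['hello, world']  (one element)

# A cast struct decodes in one call as well (assumed shape).
events = json.loads('[{"deviceType": "Android", "logins": 400}]')
print(events[0]['logins'] + 1)  # 401
```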

Answer 3

Given the data as shown, you can isolate the strings between curly brackets with a regular expression, then split each of those strings into its key=value parts. Here's an example:

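import re

d = {'events': "[{deviceType=Android,logins=400},{deviceType=iPhone,logins=550}]"}

# Lazily match everything between each '{' and the next '}'
for t in re.findall(r'(?<=\{).+?(?=\})', d['events']):
    # Each struct body is a comma-separated list of key=value pairs
    for p in t.split(','):
        print(p)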

Output:

deviceType=Android
logins=400
deviceType=iPhone
logins=550
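To get JSON rather than printed pairs, the same split can build one dict per struct and pass the list to `json.dumps`. This is a minimal sketch. It assumes keys never contain `=` and values never contain `,`, and it splits on the first `=` only so that a value may still contain `=`.

```python
import json
import re

d = {'events': "[{deviceType=Android,logins=400},{deviceType=iPhone,logins=550}]"}

records = []
for t in re.findall(r'(?<=\{).+?(?=\})', d['events']):
    record = {}
    for p in t.split(','):
        # Split on the first '=' only, trimming spaces left by ", " separators.
        key, value = p.split('=', 1)
        record[key.strip()] = value.strip()
    records.append(record)

print(json.dumps(records))
```

This prints `[{"deviceType": "Android", "logins": "400"}, {"deviceType": "iPhone", "logins": "550"}]`. As with the other string-based approaches, the numbers stay strings.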

How to parse "{Key=Value}" structs into JSON with Python?

Question

3 answers

solution1
1 2022-04-14 14:29:38

solution2
1 2022-04-17 20:12:26

solution3
0 2022-04-14 14:18:52