Python: Parsing OrderedDict of OrderedDicts into Pandas Dataframe

Question

I have what appears to be an OrderedDict that contains further OrderedDicts under one key of what I am calling its 'main' OrderedDict. I am trying to parse this object into a Pandas DataFrame. (I am using Python 3, Anaconda distribution.)

I have searched and found some examples about comprehension of such data structures, but the structures in the examples do not seem to match mine.

As the example below shows, the OrderedDicts I care about are nested under a key called 'records' within the 'main' OrderedDict. I would like to take this example:

from collections import OrderedDict

od = OrderedDict([('totalSize', 3), ('done', True), ('records', [OrderedDict([('attributes', OrderedDict([('type', 'Cust'), ('url', '/example/url/foo/bar/123')])), ('Id', '4563456kjgfu4uyHHY3'), ('Phone', None), ('FirstName', 'Bill'), ('LastName', 'Bob'), ('Email', 'billbob@foo.com')]), OrderedDict([('attributes', OrderedDict([('type', 'Cust'), ('url', '/example/url/foo/bar/234')])), ('Id', 'KJ23jdkd889DKJD'), ('Phone', '(444) 444-4444'), ('FirstName', 'Amanda'), ('LastName', 'Smith'), ('Email', 'amanda.smith@bar.com')]), OrderedDict([('attributes', OrderedDict([('type', 'Cust'), ('url', '/example/url/foo/bar/654')])), ('Id', '23kkjKJkj2323KJ33'), ('Phone', '(555) 555-5555'), ('FirstName', 'Julie'), ('LastName', 'jackson'), ('Email', 'jjackson@test.com')])])])

...and obtain a DataFrame with the columns 'Id', 'Phone', 'FirstName', 'LastName', and 'Email'.

So far I have been able to extract what I believe to be a list of lists:

li = []
list1 = [(record['Id'], record['Phone'], record['FirstName'], record['LastName'])
         for record in od['records']]
li.append(list1)
li[:]

This list-of-lists strategy, however, leaves no room for column names. Could you please help me take this the final step into a Pandas DataFrame?

Thank you very much in advance.

Answer 1

I'm not familiar with Pandas DataFrames, but building a dictionary of lists seems like a reasonable approach, since the DataFrame constructor accepts a dict that maps column names to lists of values.

import pandas

# Attributes of interest
attrs = ['Id', 'Phone', 'FirstName', 'LastName', 'Email']
records = od['records']

data = {}

for rec in records:
    for k in attrs:
        # setdefault creates the list for this key if it does not exist yet
        data.setdefault(k, []).append(rec[k])

dframe = pandas.DataFrame(data)

print(dframe)
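
As a possible shortcut, the DataFrame constructor can also take the list of records directly, with the columns argument selecting which keys to keep. The sketch below uses a shortened two-record stand-in for the question's od (the 'attributes' entries are trimmed for brevity) and the pd alias, neither of which appears above; treat it as an illustration rather than the definitive way to do this.

```python
from collections import OrderedDict

import pandas as pd

# Shortened stand-in for the question's `od` (two records instead of three).
od = OrderedDict([
    ('totalSize', 2),
    ('done', True),
    ('records', [
        OrderedDict([
            ('attributes', OrderedDict([('type', 'Cust')])),
            ('Id', '4563456kjgfu4uyHHY3'),
            ('Phone', None),
            ('FirstName', 'Bill'),
            ('LastName', 'Bob'),
            ('Email', 'billbob@foo.com'),
        ]),
        OrderedDict([
            ('attributes', OrderedDict([('type', 'Cust')])),
            ('Id', 'KJ23jdkd889DKJD'),
            ('Phone', '(444) 444-4444'),
            ('FirstName', 'Amanda'),
            ('LastName', 'Smith'),
            ('Email', 'amanda.smith@bar.com'),
        ]),
    ]),
])

# Attributes of interest; anything not listed (e.g. 'attributes') is left out.
attrs = ['Id', 'Phone', 'FirstName', 'LastName', 'Email']

# Each record becomes one row; `columns` picks and orders the keys to keep.
dframe = pd.DataFrame(od['records'], columns=attrs)

print(dframe)
```

This avoids the explicit loop, at the cost of relying on how the constructor treats a list of dict-like records.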

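Here is a solution that doesn't require specifying the fields. The 'attributes' key is skipped, since the question doesn't say how to handle it, although it could probably be handled like the others. Note that this assumes every record has the same keys; otherwise the lists end up with different lengths and the DataFrame cannot be built.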

import pandas

records = od['records']
data = {}

for rec in records:
    for k, v in rec.items():
        if k == 'attributes':
            continue
        data.setdefault(k, []).append(v)

dframe = pandas.DataFrame(data)
print(dframe)
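
If the nested 'attributes' values are wanted as well, one way to handle them "like the others" is to flatten each record before building the DataFrame. The sketch below is only one possible approach: flatten_record, its nested_key and sep parameters, and the resulting 'attributes_type' / 'attributes_url' column names are all made up for illustration, and the two sample records are copied from the question.

```python
from collections import OrderedDict

import pandas as pd


def flatten_record(rec, nested_key='attributes', sep='_'):
    """Return a copy of rec with the dict under nested_key lifted to the top level.

    Hypothetical helper: nested keys are prefixed, e.g. 'attributes_type'.
    """
    flat = OrderedDict()
    for k, v in rec.items():
        if k == nested_key:
            # Lift each nested entry up one level under a prefixed name.
            for sub_k, sub_v in v.items():
                flat[nested_key + sep + sub_k] = sub_v
        else:
            flat[k] = v
    return flat


# Two records in the same shape as od['records'] from the question.
records = [
    OrderedDict([
        ('attributes', OrderedDict([('type', 'Cust'),
                                    ('url', '/example/url/foo/bar/123')])),
        ('Id', '4563456kjgfu4uyHHY3'),
        ('Phone', None),
        ('FirstName', 'Bill'),
        ('LastName', 'Bob'),
        ('Email', 'billbob@foo.com'),
    ]),
    OrderedDict([
        ('attributes', OrderedDict([('type', 'Cust'),
                                    ('url', '/example/url/foo/bar/234')])),
        ('Id', 'KJ23jdkd889DKJD'),
        ('Phone', '(444) 444-4444'),
        ('FirstName', 'Amanda'),
        ('LastName', 'Smith'),
        ('Email', 'amanda.smith@bar.com'),
    ]),
]

# One flattened dict per row; column names come from the dict keys.
dframe = pd.DataFrame([flatten_record(rec) for rec in records])
print(dframe)
```

Prefixing the nested keys keeps them from colliding with top-level fields that might share a name such as 'type'.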

solution1: 1 ACCEPTED 2016-01-15 05:14:34