
How to convert columns of numpy arrays to lists when using .to_dict

I would like to take my pandas DataFrame and convert it to a list of dictionaries. I can do this using the pandas to_dict('records') function. However, for any column whose values are numpy arrays, the returned dictionaries still contain numpy arrays. I would like the content of the returned list of dictionaries to be plain Python objects rather than numpy arrays.

I understand I could iterate over the output dictionaries, but I was wondering if there is a more clever way to do this.

Here is some sample code that shows my problem:

import pandas as pd
import numpy as np


data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes')
    ], axis=1)

output = data.to_dict('records')

# this prints <class 'numpy.ndarray'>
print(type(output[0]['codes']))

output, in this case, looks like this:

[{'key': 'a--b', 'code': '123', 'codes': array(['123', '098'], dtype='<U3')},
 {'key': 'c--d', 'code': '456', 'codes': array(['000', '999'], dtype='<U3')},
 {'key': 'e--f', 'code': '789', 'codes': array(['789', '432'], dtype='<U3')}]

I would like for that print statement to print a list. I understand I could simply do the following:

for row in output:
    row['codes'] = row['codes'].tolist()

# this now prints <class 'list'>, which is what I want
print(type(output[0]['codes']))

However, my dataframe is of course much more complicated than this, and I have multiple columns that hold numpy arrays. I know I could expand the snippet above to check which columns hold arrays and convert them using tolist(), but I'm wondering if there is something snappier or more clever. Perhaps something provided by pandas that is optimized?

To be clear, here is the output I need to have:

print(output)
[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']},
 {'key': 'c--d', 'code': '456', 'codes': ['000', '999']},
 {'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]

Let us first use applymap to convert the numpy arrays to Python lists, then use to_dict. (In pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map.)

cols = ['codes']
data.assign(**data[cols].applymap(list)).to_dict('records')

[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']},
 {'key': 'c--d', 'code': '456', 'codes': ['000', '999']},
 {'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]
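
A hedged variation on the applymap approach above, as a sketch rather than a definitive implementation. It makes two assumptions that are not in the original answer. First, the helper name elementwise is hypothetical and only exists to pick DataFrame.map where it is available (pandas 2.1+) and fall back to applymap otherwise. Second, it swaps list for np.ndarray.tolist, because list(arr) keeps the elements as numpy scalars while tolist() converts them to plain Python objects.

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question.
data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes')
    ], axis=1)


def elementwise(frame, func):
    """Hypothetical helper: apply func to every cell of a DataFrame."""
    # DataFrame.map was added in pandas 2.1; older versions only have applymap.
    if hasattr(frame, 'map'):
        return frame.map(func)
    return frame.applymap(func)


cols = ['codes']  # columns known to hold numpy arrays

# tolist() converts the array AND its elements to plain Python objects.
converted = data.assign(**elementwise(data[cols], np.ndarray.tolist))
output = converted.to_dict('records')

print(type(output[0]['codes']))
print(type(output[0]['codes'][0]))
```

Because assign returns a new frame, the original data is left untouched, which matters if the arrays are still needed elsewhere.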

I ended up creating a list of the numpy-typed column names:

np_fields = ['codes']

and then I replaced each field in place in my dataframe:

for col in np_fields:
    data[col] = data[col].map(np.ndarray.tolist)

I then called data.to_dict('records') once that was complete.
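
If maintaining np_fields by hand becomes tedious, the list could be derived from the data instead. The following is only a sketch under stated assumptions: the function name ndarray_columns is hypothetical, column names are assumed to be unique, and a column counts as "numpy-typed" only when every one of its values is an np.ndarray (so an empty frame would report every column).

```python
import numpy as np
import pandas as pd

# Same sample frame as in the question.
data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes')
    ], axis=1)


def ndarray_columns(frame):
    """Hypothetical helper: names of columns whose values are all ndarrays."""
    return [
        name for name in frame.columns
        if frame[name].map(lambda value: isinstance(value, np.ndarray)).all()
    ]


np_fields = ndarray_columns(data)

# Replace each detected column in place, as in the answer above.
for col in np_fields:
    data[col] = data[col].map(np.ndarray.tolist)

output = data.to_dict('records')
print(np_fields)
```

The detection scans every cell once, so on a very wide frame it may be cheaper to keep the explicit list.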

Alternatively, we can first use to_json() and then json.loads to turn the result into a list of dictionaries.

import json
data_dict = json.loads(data.to_json(orient='records'))

output:

[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']}, 
 {'key': 'c--d', 'code': '456', 'codes': ['000', '999']}, 
 {'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]
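
One thing to keep in mind with the JSON round trip is that every value goes through JSON's type system, not just the array columns. The sketch below illustrates this with a float column containing NaN. The score column is an assumption added purely for illustration and is not part of the original data.

```python
import json
import math

import numpy as np
import pandas as pd

# Two rows from the question plus a hypothetical 'score' column with a NaN.
frame = pd.concat([
    pd.Series(['a--b', 'c--d'], name='key'),
    pd.Series([1.5, np.nan], name='score'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999'])], name='codes')
    ], axis=1)

# JSON round trip: arrays become lists, but NaN is written as null.
records = json.loads(frame.to_json(orient='records'))

# Plain to_dict keeps NaN as a float and leaves the arrays untouched.
plain = frame.to_dict('records')

print(records[1]['score'])  # None after the round trip
print(plain[1]['score'])    # still a float NaN
```

If missing values need to stay as NaN, converting the array columns with tolist() before calling to_dict('records') avoids the round trip altogether.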

Or you could first convert the array column to lists and then call to_dict():

data['codes'] = data['codes'].apply(lambda x: x.tolist())
output = data.to_dict('records')
