How to convert columns of numpy arrays to lists when using .to_dict

I would like to convert my pandas DataFrame to a list of dictionaries. I can do this with the pandas to_dict('records') function. However, for columns whose cells hold arrays, the returned dictionaries contain numpy arrays. I would like the contents of the returned list of dictionaries to be base Python objects rather than numpy arrays.

I understand I could iterate over the output dictionaries, but I was wondering whether there is a cleverer way to do this.

Here is some sample code that shows my problem:

import pandas as pd
import numpy as np


data = pd.concat([
    pd.Series(['a--b', 'c--d', 'e--f'], name='key'),
    pd.Series(['123', '456', '789'], name='code'),
    pd.Series([np.array(['123', '098']), np.array(['000', '999']), np.array(['789', '432'])], name='codes')
    ], axis=1)

output = data.to_dict('records')

# this prints <class 'numpy.ndarray'>
print(type(output[0]['codes']))

In this case, output looks like this:

[{'key': 'a--b', 'code': '123', 'codes': array(['123', '098'], dtype='<U3')},
 {'key': 'c--d', 'code': '456', 'codes': array(['000', '999'], dtype='<U3')},
 {'key': 'e--f', 'code': '789', 'codes': array(['789', '432'], dtype='<U3')}]

I would like that print statement to print a list. I understand I could simply do the following:

for row in output:
    row['codes'] = row['codes'].tolist()

# this now prints <class 'list'>, which is what I want
print(type(output[0]['codes']))

However, my DataFrame is of course much more complicated than this, and I have multiple columns that contain numpy arrays. I know I could expand the snippet above to check which columns are array-typed and convert them using tolist(), but I'm wondering whether there is something snappier or more clever. Perhaps something optimized that pandas provides?

To be clear, here is the output I need:

print(output)
[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']},
 {'key': 'c--d', 'code': '456', 'codes': ['000', '999']},
 {'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]

First use applymap to convert the numpy arrays to Python lists, then call to_dict. (In pandas 2.1 and later, DataFrame.applymap is deprecated in favor of DataFrame.map.)

cols = ['codes']
data.assign(**data[cols].applymap(list)).to_dict('records')

[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']},
 {'key': 'c--d', 'code': '456', 'codes': ['000', '999']},
 {'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]
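One detail that may matter when choosing between list and tolist() for the conversion: calling list() on an array keeps each element as a numpy scalar, while tolist() converts the elements to built-in Python types. The following is a minimal sketch of that difference, using one of the arrays from the question; the variable names are only illustrative.

```python
import numpy as np

arr = np.array(['123', '098'])

# list() iterates the array, so each element stays a numpy scalar (numpy.str_)
as_list = list(arr)

# tolist() converts the elements to built-in Python objects (str)
as_tolist = arr.tolist()

print(type(as_list[0]))
print(type(as_tolist[0]))
```

Because numpy.str_ subclasses str, the two results compare equal, but the distinction becomes visible with numeric arrays, where list() yields values such as numpy.int64 that the json module cannot serialize.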

I ended up creating a list of the names of the columns that hold numpy arrays:

np_fields = ['codes']

and then I replaced each of those columns in place in my DataFrame:

for col in np_fields:
    data[col] = data[col].map(np.ndarray.tolist)

Once that was complete, I called data.to_dict('records').
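If maintaining np_fields by hand is inconvenient, the column names could perhaps be derived from the data instead. The sketch below assumes that an "array column" means an object-dtype column in which every cell is a numpy.ndarray; the helper name ndarray_columns is hypothetical and not part of pandas.

```python
import numpy as np
import pandas as pd


def ndarray_columns(df):
    """Return the names of object columns whose cells are all numpy arrays.

    Hypothetical helper for illustration; pandas has no such function.
    """
    return [
        col for col in df.columns
        if df[col].dtype == object
        and df[col].map(lambda v: isinstance(v, np.ndarray)).all()
    ]


data = pd.DataFrame({
    'key': ['a--b', 'c--d'],
    'codes': pd.Series([np.array(['123', '098']), np.array(['000', '999'])]),
})

# Detect the array columns, then convert each one to plain lists
np_fields = ndarray_columns(data)
for col in np_fields:
    data[col] = data[col].map(np.ndarray.tolist)

output = data.to_dict('records')
print(np_fields)
print(output)
```

This still inspects every cell of each object column once, so it is a convenience rather than an optimization.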

Alternatively, we can first call to_json() and then use json.loads to turn the result into a list of dictionaries.

import json
data_dict = json.loads(data.to_json(orient='records'))

Output:

[{'key': 'a--b', 'code': '123', 'codes': ['123', '098']}, 
 {'key': 'c--d', 'code': '456', 'codes': ['000', '999']}, 
 {'key': 'e--f', 'code': '789', 'codes': ['789', '432']}]
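One thing to keep in mind with the JSON round trip is that every value is coerced to what JSON can represent, so the result is not always identical to to_dict('records'). As a small illustration, assuming a hypothetical float column named score that is not in the original data, a missing value comes back as None instead of NaN:

```python
import json

import numpy as np
import pandas as pd

df = pd.DataFrame({
    'key': ['a--b', 'c--d'],
    'score': [1.5, np.nan],  # hypothetical column, used only for this sketch
})

via_to_dict = df.to_dict('records')
via_json = json.loads(df.to_json(orient='records'))

# to_dict keeps the missing value as a float NaN
print(via_to_dict[1]['score'])

# the JSON round trip writes it as null, which json.loads reads as None
print(via_json[1]['score'])
```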

Or you could first convert the arrays in the column to lists and then call to_dict():

data['codes'] = data['codes'].apply(lambda x:x.tolist())
output = data.to_dict('records')
