Pandas DataFrame from Dictionary with Lists
I have an API that returns a single row of data as a Python dictionary. Most of the keys have a single value, but some of the keys have values that are lists (or even lists-of-lists or lists-of-dictionaries).
When I pass the dictionary to pd.DataFrame to convert it to a pandas DataFrame, it throws an "Arrays must be the same length" error. This is because it cannot process the keys which have multiple values (i.e. the keys whose values are lists).
How do I get pandas to treat the lists as 'single values'?
As a hypothetical example:
data = { 'building': 'White House', 'DC?': True,
'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }
I want to turn it into a DataFrame like this:
ix building DC? occupants
0 'White House' True ['Barack', 'Michelle', 'Sasha', 'Malia']
This works if you wrap the dictionary in a list, so that pandas treats it as a list of rows:
In [11]: pd.DataFrame(data)
Out[11]:
DC? building occupants
0 True White House Barack
1 True White House Michelle
2 True White House Sasha
3 True White House Malia
In [12]: pd.DataFrame([data])
Out[12]:
DC? building occupants
0 True White House [Barack, Michelle, Sasha, Malia]
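The same wrapping should also cover the lists-of-dictionaries case mentioned in the question. Here is a minimal sketch of that, where the 'rooms' key and its values are made up for illustration and are not part of the original example:

```python
import pandas as pd

# Hypothetical API response: scalar values plus nested list values.
data = {
    "building": "White House",
    "DC?": True,
    "occupants": ["Barack", "Michelle", "Sasha", "Malia"],
    "rooms": [{"name": "Oval Office"}, {"name": "East Room"}],  # made-up key
}

# Wrapping the dict in a list makes it a single row;
# each list value is stored as one object in its cell.
df = pd.DataFrame([data])

print(df.shape)
print(df.loc[0, "occupants"])
```

Each nested value is kept as an ordinary Python object, so the affected columns have object dtype.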
This turns out to be trivial in the end: wrap the dictionary in a list.
import pandas
data = { 'building': 'White House', 'DC?': True, 'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }
df = pandas.DataFrame([data])
print(df)
Which results in:
DC? building occupants
0 True White House [Barack, Michelle, Sasha, Malia]
Would it be acceptable if, instead of having one entry with a list of occupants, you had individual entries for each occupant? If so, you could just do
n = len(data['occupants'])
for key, val in data.items():
    if key != 'occupants':
        data[key] = n*[val]
EDIT: Actually, I'm getting this behavior in pandas (i.e. just with pd.DataFrame(data)) even without this pre-processing. What version are you using?
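If one row per occupant is what you want, another way to get there (not the pre-processing above) is DataFrame.explode. This is a rough sketch that assumes pandas 0.25 or later, where explode is available:

```python
import pandas as pd

data = {"building": "White House", "DC?": True,
        "occupants": ["Barack", "Michelle", "Sasha", "Malia"]}

# Build the single-row frame first, with the list kept in one cell.
one_row = pd.DataFrame([data])

# explode() gives each list element its own row and repeats the other columns.
per_occupant = one_row.explode("occupants").reset_index(drop=True)

print(per_occupant)
```

This leaves the original dictionary untouched, unlike the loop above, which overwrites its values in place.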
Here is a way to make a DataFrame from a dictionary of lists where the keys become a sorted index and the column names are provided explicitly. It is handy for creating DataFrames from scraped HTML tables.
d = { 'B':[10,11], 'A':[20,21] }
# list() makes the dict views explicit sequences, which some pandas versions require
df = pd.DataFrame(list(d.values()), columns=['C1','C2'], index=list(d.keys())).sort_index()
df
C1 C2
A 20 21
B 10 11
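For what it's worth, the same result can probably be had with DataFrame.from_dict, which accepts a columns argument when orient="index". A small sketch using the same d as above:

```python
import pandas as pd

d = {"B": [10, 11], "A": [20, 21]}

# orient="index" turns each key into a row label;
# columns names the positions within each list.
df = pd.DataFrame.from_dict(d, orient="index", columns=["C1", "C2"]).sort_index()

print(df)
```

This avoids passing the keys and values separately, so the two cannot get out of step.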
I had a closely related problem, but my data structure was a multi-level dictionary with lists in the second-level dictionary:
result = {'hamster': {'confidence': 1, 'ids': ['id1', 'id2']},
'zombie': {'confidence': 1, 'ids': ['id3']}}
When importing this with pd.DataFrame([result]), I end up with columns named hamster and zombie. The (for me) correct import would be to have these as row titles, and confidence and ids as column titles. To achieve this, I used pd.DataFrame.from_dict:
In [42]: pd.DataFrame.from_dict(result, orient="index")
Out[42]:
confidence ids
hamster 1 [id1, id2]
zombie 1 [id3]
This works for me with Python 3.8 + pandas 1.2.3.
If you know the dictionary's keys in advance, why not create an empty DataFrame first and then keep adding rows?
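A variation on that idea, as a sketch only: since DataFrame.append was removed in pandas 2.0 and growing a frame one row at a time tends to be slow, it may be simpler to collect the API responses in a plain list and build the frame once, with the known keys fixed as columns. The second row here is made-up data for illustration.

```python
import pandas as pd

# Keys known in advance; they fix the column order.
columns = ["building", "DC?", "occupants"]

# Hypothetical API responses, one dict per row.
rows = []
rows.append({"building": "White House", "DC?": True,
             "occupants": ["Barack", "Michelle", "Sasha", "Malia"]})
rows.append({"building": "Buckingham Palace", "DC?": False,
             "occupants": ["Elizabeth", "Philip"]})  # made-up second row

# Build the frame once from the accumulated rows; lists stay in single cells.
df = pd.DataFrame(rows, columns=columns)

print(df)
```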