Pandas DataFrame from Dictionary with Lists

I have an API that returns a single row of data as a Python dictionary. Most of the keys have a single value, but some of the keys have values that are lists (or even lists of lists or lists of dictionaries).

When I pass the dictionary to pd.DataFrame to convert it to a pandas DataFrame, it throws an "Arrays must be the same length" error. This is because it cannot process the keys which have multiple values (i.e. the keys whose values are lists).

How do I get pandas to treat the lists as 'single values'?

As a hypothetical example:

data = { 'building': 'White House', 'DC?': True,
         'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }

I want to turn it into a DataFrame like this:

ix   building         DC?      occupants
0    'White House'    True     ['Barack', 'Michelle', 'Sasha', 'Malia']

This works if you pass a list (of rows), i.e. pd.DataFrame([data]) rather than pd.DataFrame(data):

In [11]: pd.DataFrame(data)
Out[11]:
    DC?     building occupants
0  True  White House    Barack
1  True  White House  Michelle
2  True  White House     Sasha
3  True  White House     Malia

In [12]: pd.DataFrame([data])
Out[12]:
    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]
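To make the difference between the two calls concrete, here is a minimal sketch. It assumes a hypothetical record with a second list-valued key (`pets`, not part of the original example) so that the lists have different lengths, which is the case where passing the dictionary directly fails.

```python
import pandas as pd

# Hypothetical record: "pets" is an invented key, added only so that the
# two list-valued entries have different lengths.
record = {
    "building": "White House",
    "occupants": ["Barack", "Michelle", "Sasha", "Malia"],
    "pets": ["Bo", "Sunny"],
}

# Passing the dict directly treats each list as a column of values,
# so lists of unequal length cannot be aligned into rows.
try:
    pd.DataFrame(record)
    direct_failed = False
except ValueError:
    direct_failed = True

# Wrapping the dict in a list makes it a single row; each list is stored
# whole in one cell of an object-dtype column.
df = pd.DataFrame([record])
print(direct_failed)
print(df)
```

With a single list-valued key, as in the question's example, the direct call does not fail but broadcasts the scalar values across one row per list element, as In [11] shows.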

This turns out to be trivial in the end:

import pandas

data = { 'building': 'White House', 'DC?': True, 'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }
df = pandas.DataFrame([data])  # wrap the dict in a list so it becomes one row
print(df)

Which results in:

    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]

Would it be acceptable if, instead of having one entry with a list of occupants, you had individual entries for each occupant? If so, you could just do:

n = len(data['occupants'])
for key, val in data.items():
    if key != 'occupants':
        data[key] = n*[val]

EDIT: Actually, I'm getting this behavior in pandas (i.e. just with pd.DataFrame(data)) even without this pre-processing. What version are you using?
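If one row per occupant is what you want, another way to get there is to build the single-row frame first and then expand the list column. The sketch below uses `DataFrame.explode`, which is a different technique from the loop above and assumes pandas 0.25 or later; the variable names `wide` and `long_df` are only illustrative.

```python
import pandas as pd

data = {
    "building": "White House",
    "DC?": True,
    "occupants": ["Barack", "Michelle", "Sasha", "Malia"],
}

# One row, with the whole list stored in the "occupants" cell.
wide = pd.DataFrame([data])

# explode() emits one row per list element and repeats the other columns.
# The original index label is repeated too, so reset it afterwards.
long_df = wide.explode("occupants").reset_index(drop=True)
print(long_df)
```

This keeps the original dictionary untouched, which the in-place loop does not.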

A solution for making a DataFrame from a dictionary of lists, where the keys become a sorted index and the column names are provided. Good for creating DataFrames from scraped HTML tables.

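import pandas as pd

d = { 'B':[10,11], 'A':[20,21] }
# list(...) turns the dict views into plain lists, which works across pandas versions
df = pd.DataFrame(list(d.values()),columns=['C1','C2'],index=list(d.keys())).sort_index()
df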

    C1  C2
A   20  21
B   10  11
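The same layout can probably be reached more directly with `pd.DataFrame.from_dict`. This is a sketch of an alternative, not the answer's own method; it assumes a pandas version in which `from_dict` accepts `columns` together with `orient="index"` (0.23 or later).

```python
import pandas as pd

d = {"B": [10, 11], "A": [20, 21]}

# orient="index" makes each key a row label and each list a row of values;
# columns names the positions within the lists.
df = pd.DataFrame.from_dict(d, orient="index", columns=["C1", "C2"]).sort_index()
print(df)
```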

I had a closely related problem, but my data structure was a multi-level dictionary with lists in the second-level dictionary:

result = {'hamster': {'confidence': 1, 'ids': ['id1', 'id2']},
          'zombie': {'confidence': 1, 'ids': ['id3']}}

When importing this with pd.DataFrame([result]), I end up with columns named hamster and zombie. The (for me) correct import would be to have these as row labels, and confidence and ids as column names. To achieve this, I used pd.DataFrame.from_dict:

In [42]: pd.DataFrame.from_dict(result, orient="index")
Out[42]:
         confidence         ids
hamster           1  [id1, id2]
zombie            1       [id3]

This works for me with Python 3.8 and pandas 1.2.3.

If you know the dictionary's keys in advance, why not create an empty DataFrame first and then keep adding rows?
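A related pattern, sketched below, is to collect the row dictionaries in a plain Python list and build the DataFrame once at the end, passing the known keys as `columns`. This swaps in a different technique from appending to an empty frame: growing a DataFrame row by row copies data on each step, and `DataFrame.append` was removed in pandas 2.0. The `fetch_rows` function and the second record are hypothetical stand-ins for repeated API calls.

```python
import pandas as pd

# Keys assumed to be known in advance.
columns = ["building", "DC?", "occupants"]


def fetch_rows():
    """Hypothetical stand-in for an API that returns one dict per call."""
    yield {
        "building": "White House",
        "DC?": True,
        "occupants": ["Barack", "Michelle", "Sasha", "Malia"],
    }
    # Invented second record, for illustration only.
    yield {"building": "Example House", "DC?": False, "occupants": ["Alice"]}


# Accumulate the dicts, then construct the frame in one call.
rows = list(fetch_rows())
df = pd.DataFrame(rows, columns=columns)
print(df)
```

Because every dictionary is an element of the outer list, list-valued entries stay in a single cell, just as with pd.DataFrame([data]).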
