Pandas DataFrame from a dictionary with lists

Question

I have an API that returns a single row of data as a Python dictionary. Most of the keys have a single value, but some of the keys have values that are lists (or even lists of lists or lists of dictionaries).

When I pass the dictionary to pd.DataFrame to convert it to a pandas DataFrame, it throws an "Arrays must be the same length" error. This is because it cannot process the keys which have multiple values (i.e. the keys whose values are lists).

How do I get pandas to treat the lists as 'single values'?

As a hypothetical example:

data = { 'building': 'White House', 'DC?': True,
         'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }

I want to turn it into a DataFrame like this:

ix   building         DC?      occupants
0    'White House'    True     ['Barack', 'Michelle', 'Sasha', 'Malia']

Answer 1

This works if you pass a list (of rows) instead of the bare dictionary. Compare the two calls:

In [11]: pd.DataFrame(data)
Out[11]:
    DC?     building occupants
0  True  White House    Barack
1  True  White House  Michelle
2  True  White House     Sasha
3  True  White House     Malia

In [12]: pd.DataFrame([data])
Out[12]:
    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]
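The difference between the two calls can be checked with a short sketch. The extra 'pets' key below is a hypothetical addition, not part of the question; it is there only to give the dictionary two lists of different lengths, which is one case where passing the bare dictionary fails instead of expanding into several rows.

```python
import pandas as pd

# Hypothetical row: "pets" is an invented key, added so that the two
# list-valued keys have different lengths.
data = {
    "building": "White House",
    "DC?": True,
    "occupants": ["Barack", "Michelle", "Sasha", "Malia"],
    "pets": ["Bo", "Sunny"],
}

# Passing the dict directly treats each list as a whole column,
# so lists of different lengths cannot be aligned.
try:
    pd.DataFrame(data)
    raised = False
except ValueError:
    raised = True

# Wrapping the dict in a list makes it a single row;
# each list is stored as one cell value.
df = pd.DataFrame([data])
print(df.shape)
print(df.loc[0, "occupants"])
```

Because each list ends up in a single object-typed cell, the column lengths no longer matter.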

Answer 2

This turns out to be trivial in the end:

import pandas

data = { 'building': 'White House', 'DC?': True, 'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }
df = pandas.DataFrame([data])  # wrap the dict in a list so it becomes one row
print(df)

Which results in:

    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]

Answer 3

Would it be acceptable if, instead of having one entry with a list of occupants, you had individual entries for each occupant? If so, you could just do:

n = len(data['occupants'])
for key, val in data.items():
    if key != 'occupants':
        data[key] = n*[val]

EDIT: Actually, I'm getting this behavior in pandas (i.e. just with pd.DataFrame(data)) even without this pre-processing. What version are you using?
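If one row per occupant is acceptable, a possible alternative to repeating the scalar values by hand is to build the single-row frame first and then expand the list column. This is only a sketch, and it assumes pandas 0.25 or later, where DataFrame.explode is available; the name long_df is made up for the example.

```python
import pandas as pd

data = {
    "building": "White House",
    "DC?": True,
    "occupants": ["Barack", "Michelle", "Sasha", "Malia"],
}

# One row, with the whole list held in the "occupants" cell.
df = pd.DataFrame([data])

# explode() emits one row per list element and repeats the other columns;
# reset_index(drop=True) replaces the repeated index 0 with 0..3.
long_df = df.explode("occupants").reset_index(drop=True)
print(long_df)
```

This keeps the original dictionary unmodified, unlike the loop above, which overwrites its scalar values with lists.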

Answer 4

Solution to make a DataFrame from a dictionary of lists, where the keys become a sorted index and column names are provided. Good for creating DataFrames from scraped HTML tables.

d = { 'B':[10,11], 'A':[20,21] }
# list(...) turns the dict views into plain lists before handing them to pandas
df = pd.DataFrame(list(d.values()), columns=['C1','C2'], index=list(d.keys())).sort_index()
df

    C1  C2
A   20  21
B   10  11
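The same layout can probably also be reached with DataFrame.from_dict, which takes the keys as the index when orient="index" is given. This is a sketch of that alternative rather than the answer's own method, and it assumes a pandas version in which from_dict accepts the columns argument together with orient="index".

```python
import pandas as pd

d = {"B": [10, 11], "A": [20, 21]}

# Keys become the row index, each list becomes one row,
# and the columns are named explicitly.
df = pd.DataFrame.from_dict(d, orient="index", columns=["C1", "C2"]).sort_index()
print(df)
```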

Answer 5

I had a closely related problem, but my data structure was a multi-level dictionary with lists in the second-level dictionary:

result = {'hamster': {'confidence': 1, 'ids': ['id1', 'id2']},
          'zombie': {'confidence': 1, 'ids': ['id3']}}

When importing this with pd.DataFrame([result]), I end up with columns named hamster and zombie. The (for me) correct import would be to have these as row titles, and confidence and ids as column titles. To achieve this, I used pd.DataFrame.from_dict:

In [42]: pd.DataFrame.from_dict(result, orient="index")
Out[42]:
         confidence         ids
hamster           1  [id1, id2]
zombie            1       [id3]

This works for me with Python 3.8 + pandas 1.2.3.
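If a row per id is needed later, the list column from this import can be expanded afterwards. The following is a sketch under assumptions: it relies on DataFrame.explode (pandas 0.25 or later), and the column name "label" is invented here to hold the former row titles.

```python
import pandas as pd

result = {
    "hamster": {"confidence": 1, "ids": ["id1", "id2"]},
    "zombie": {"confidence": 1, "ids": ["id3"]},
}

# Outer keys become the row index; "ids" keeps its lists as cell values.
per_label = pd.DataFrame.from_dict(result, orient="index")

# One row per id; the former index is moved into a hypothetical "label" column.
per_id = per_label.explode("ids").rename_axis("label").reset_index()
print(per_id)
```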

Answer 6

If you know the dictionary's keys in advance, why not create an empty DataFrame first and then keep adding rows?
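One way to sketch this idea is to fix the column names up front, collect the row dictionaries in a plain list, and build the frame once at the end. Growing a DataFrame row by row tends to be slow, and DataFrame.append was removed in pandas 2.0. The second row below is invented sample data, not something from the question.

```python
import pandas as pd

# Keys known in advance; they fix the column order.
columns = ["building", "DC?", "occupants"]

# An empty frame with the expected columns, e.g. for when the API returns nothing.
empty = pd.DataFrame([], columns=columns)

# Collect one dict per API response (the second row is made-up sample data).
rows = [
    {"building": "White House", "DC?": True,
     "occupants": ["Barack", "Michelle", "Sasha", "Malia"]},
    {"building": "Empire State Building", "DC?": False, "occupants": []},
]

# Build the frame in one step; list values stay in single cells.
df = pd.DataFrame(rows, columns=columns)
print(df)
```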

6 answers

Answer 1: score 24, accepted, 2015-11-03 16:50:18
Answer 2: score 5, 2015-11-03 17:41:17
Answer 3: score 1, 2015-11-03 16:50:41
Answer 4: score 0, 2021-05-09 16:26:35
Answer 5: score 0, 2021-06-17 14:07:00
Answer 6: score -1, 2015-11-03 16:47:39
