
Pandas DataFrame from Dictionary with Lists

I have an API that returns a single row of data as a Python dictionary. Most of the keys have a single value, but some of the keys have values that are lists (or even lists-of-lists or lists-of-dictionaries).

When I pass the dictionary to pd.DataFrame to convert it to a pandas DataFrame, it throws an "Arrays must be the same length" error. This is because it cannot process the keys which have multiple values (i.e. the keys whose values are lists).

How do I get pandas to treat the lists as 'single values'?

As a hypothetical example:

data = { 'building': 'White House', 'DC?': True,
         'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }

I want to turn it into a DataFrame like this:

ix   building         DC?      occupants
0    'White House'    True     ['Barack', 'Michelle', 'Sasha', 'Malia']

This works if you pass a list (of rows):

In [11]: pd.DataFrame(data)
Out[11]:
    DC?     building occupants
0  True  White House    Barack
1  True  White House  Michelle
2  True  White House     Sasha
3  True  White House     Malia

In [12]: pd.DataFrame([data])
Out[12]:
    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]
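The error quoted in the question shows up when the dictionary holds several lists of different lengths, because pd.DataFrame(data) reads every value as a column. Here is a minimal sketch of that case; the 'pets' key and its values are hypothetical, added only for illustration:

```python
import pandas as pd

# 'pets' is a hypothetical extra key, not part of the original example.
data = {
    "building": "White House",
    "occupants": ["Barack", "Michelle", "Sasha", "Malia"],
    "pets": ["Bo", "Sunny"],
}

try:
    # Each value is read as a column; lists of length 4 and 2 cannot be aligned.
    pd.DataFrame(data)
    raised = False
except ValueError:
    raised = True

# Wrapping the dict in a list makes it one record, so each list stays in one cell.
row = pd.DataFrame([data])
print(raised, row.shape)
```

With the list wrapper, the lengths of the inner lists no longer matter, since each list is stored as a single object.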

This turns out to be trivial in the end:

import pandas

data = { 'building': 'White House', 'DC?': True, 'occupants': ['Barack', 'Michelle', 'Sasha', 'Malia'] }
# Wrap the dict in a list so pandas reads it as a single row
df = pandas.DataFrame([data])
print(df)

Which results in:

    DC?     building                         occupants
0  True  White House  [Barack, Michelle, Sasha, Malia]

Would it be acceptable if, instead of having one entry with a list of occupants, you had individual entries for each occupant? If so, you could just do:

n = len(data['occupants'])
for key, val in data.items():
    if key != 'occupants':
        data[key] = n*[val]

EDIT: Actually, I'm getting this behavior in pandas (i.e. just with pd.DataFrame(data)) even without this pre-processing. What version are you using?
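If one row per occupant is acceptable, another way to get there is to build the single-row frame first and then expand the list column. This sketch assumes a pandas version that provides DataFrame.explode (0.25 or later); the variable names are illustrative:

```python
import pandas as pd

data = {
    "building": "White House",
    "DC?": True,
    "occupants": ["Barack", "Michelle", "Sasha", "Malia"],
}

# One row, with the whole list stored in the 'occupants' cell.
wide = pd.DataFrame([data])

# explode() emits one row per list element and repeats the other columns.
tall = wide.explode("occupants").reset_index(drop=True)
print(tall)
```

This keeps the original dictionary untouched, unlike the in-place loop above.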

This solution builds a DataFrame from a dictionary of lists so that the keys become a sorted index and the column names are supplied explicitly. It is handy for creating DataFrames from scraped HTML tables.

import pandas as pd

d = { 'B':[10,11], 'A':[20,21] }
# list(...) avoids relying on pandas accepting dict views directly
df = pd.DataFrame(list(d.values()),columns=['C1','C2'],index=list(d.keys())).sort_index()
df

    C1  C2
A   20  21
B   10  11
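The same table can probably be produced more directly with pd.DataFrame.from_dict, which accepts a columns argument when orient="index" (assuming pandas 0.23 or later). A short sketch using the same dictionary:

```python
import pandas as pd

d = {"B": [10, 11], "A": [20, 21]}

# orient="index" turns each key into a row label; columns names the list positions.
df = pd.DataFrame.from_dict(d, orient="index", columns=["C1", "C2"]).sort_index()
print(df)
```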

I had a closely related problem, but my data structure was a multi-level dictionary with lists in the second level dictionary:

result = {'hamster': {'confidence': 1, 'ids': ['id1', 'id2']},
          'zombie': {'confidence': 1, 'ids': ['id3']}}

When importing this with pd.DataFrame([result]), I end up with columns named hamster and zombie. The (for me) correct import would be to have these as row titles, and confidence and ids as column titles. To achieve this, I used pd.DataFrame.from_dict:

In [42]: pd.DataFrame.from_dict(result, orient="index")
Out[42]:
         confidence         ids
hamster           1  [id1, id2]
zombie            1       [id3]

This works for me with python 3.8 + pandas 1.2.3.
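If the outer keys are wanted as an ordinary column rather than as the index, the result of from_dict can be reshaped afterwards. A minimal sketch; the column name 'animal' is a hypothetical label chosen for illustration:

```python
import pandas as pd

result = {
    "hamster": {"confidence": 1, "ids": ["id1", "id2"]},
    "zombie": {"confidence": 1, "ids": ["id3"]},
}

by_key = pd.DataFrame.from_dict(result, orient="index")

# Name the index, then move it into a regular column.
flat = by_key.rename_axis("animal").reset_index()
print(flat)
```

The lists under ids stay intact as single cell values in both frames.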

If you know the dictionary's keys in advance, why not create an empty DataFrame first and then keep adding rows?
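A variant of this idea, when the keys are known up front, is to collect the row dictionaries in a plain list and build the DataFrame once at the end. This avoids growing a frame row by row (DataFrame.append was removed in pandas 2.0). The second row below is made-up sample data standing in for another API response:

```python
import pandas as pd

# Keys known in advance; they fix the column order.
keys = ["building", "DC?", "occupants"]

rows = []
# Each dict stands in for one API response; the second one is hypothetical.
for response in (
    {"building": "White House", "DC?": True,
     "occupants": ["Barack", "Michelle", "Sasha", "Malia"]},
    {"building": "Example House", "DC?": False, "occupants": []},
):
    rows.append(response)

# Build the frame once; each list value stays in a single cell.
df = pd.DataFrame(rows, columns=keys)
print(df)
```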

