
Dataframe from a dict of lists of dicts?

I have a dict of lists of dicts. What is the most efficient way to convert this into a DataFrame in pandas?

data = {
    "0a2": [{"a":1,"b":1}, {"a":1,"b":1,"c":1}, {"a":1,"b":1}],
    "279": [{"a":1,"b":1,"c":1}, {"a":1,"b":1,"d":1}],
    "ae2": [{"a":1,"b":1}, {"a":1,"d":1}, {"a":1,"b":1}, {"a":1,"d":1}],
    # ...
}
import pandas as pd
pd.DataFrame(data, columns=["a","b","c","d"])


What I've tried:

One solution is to denormalize the data like this, by duplicating the "id" keys:

bad_data = [
    {"a":1,"b":1,"id":"0a2"}, {"a":1,"b":1,"c":1,"id":"0a2"}, {"a":1,"b":1,"id":"0a2"},
    {"a":1,"b":1,"c":1,"id":"279"}, {"a":1,"b":1,"d":1,"id":"279"},
    {"a":1,"b":1,"id":"ae2"}, {"a":1,"d":1,"id":"ae2"}, {"a":1,"b":1,"id":"ae2"}, {"a":1,"d":1,"id":"ae2"},
]
pd.DataFrame(bad_data, columns=["a","b","c","d","id"])

But my data is very large, so I'd prefer a solution that uses a hierarchical index instead of duplicating the keys.

IIUC, you can do (recommended):

# Build one DataFrame per key, then stack them with the keys as the outer index level
new_df = pd.concat((pd.DataFrame(d) for d in data.values()), keys=data.keys())

Output:

       a    b    c    d
0a2 0  1  1.0  NaN  NaN
    1  1  1.0  1.0  NaN
    2  1  1.0  NaN  NaN
279 0  1  1.0  1.0  NaN
    1  1  1.0  NaN  1.0
ae2 0  1  1.0  NaN  NaN
    1  1  NaN  NaN  1.0
    2  1  1.0  NaN  NaN
    3  1  NaN  NaN  1.0
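Because the keys= variant returns a MultiIndexed frame, you can select one group directly, or move the keys back into a regular column when a flat frame is needed. A minimal sketch; the level name "id" and the flat name are my own labels, not something the answer fixes:

new_df.loc["279"]    # sub-frame for one key, selected via the outer index level

# Name the outer level, then turn it back into an ordinary column
flat = new_df.rename_axis(["id", None]).reset_index(level=0)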

Or

# Same idea, but record the key in a regular "ID" column instead of the index
pd.concat(pd.DataFrame(v).assign(ID=k) for k, v in data.items())

Output:

   a    b    c   ID    d
0  1  1.0  NaN  0a2  NaN
1  1  1.0  1.0  0a2  NaN
2  1  1.0  NaN  0a2  NaN
0  1  1.0  1.0  279  NaN
1  1  1.0  NaN  279  1.0
0  1  1.0  NaN  ae2  NaN
1  1  NaN  NaN  ae2  1.0
2  1  1.0  NaN  ae2  NaN
3  1  NaN  NaN  ae2  1.0
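If you'd rather have ID as the first column with a clean 0..n-1 row index, a follow-up column selection and index reset does it. A minimal sketch; the flat_df name and the column order are my own choices, not part of the answer:

flat_df = pd.concat(pd.DataFrame(v).assign(ID=k) for k, v in data.items())
# Reorder the columns and rebuild a fresh RangeIndex
flat_df = flat_df[["ID", "a", "b", "c", "d"]].reset_index(drop=True)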
