
Create a pandas DataFrame from a nested dict whose keys are row indices and whose values are dicts with different columns per key

I have a dict of the form:

    pd_dict = {'row_id_1': {'col_1': val1, 'col_2': val2},
               'row_id_2': {'col_1': val3, 'col_3': val4, 'col_4': val5}
               ...
              }

and I would like to turn this into a pandas DataFrame:

            col_1    col_2    col_3    col_4   ...
row_id_1    val1     val2     NaN      NaN
row_id_2    val3     NaN      val4     val5
...

The number of columns per row differs. The same columns may or may not repeat on different rows. I'd like to merge all and fill in NaN values where appropriate.

I tried:

pd.DataFrame.from_dict(pd_dict, orient='index') 

...but that doesn't give the correct output.

I also tried creating one DataFrame per row and then concat-ing them like so:

frames = []
...
for k, cols in pd_dict.items():
    ...
    frames.append(pd.DataFrame.from_dict({k: list(cols.values())}, orient='index', columns=list(cols.keys())))
    ...
df = pd.concat(frames)

That works but it takes a very long time.

It's worth mentioning that my data has around 1000 rows and 1000 columns per row so performance might become an issue. Thanks in advance!

This is because the inner dicts have uneven lengths. Try:

pd.Series(pd_dict).apply(pd.Series)
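
For illustration, here is a minimal runnable sketch of that one-liner. The numeric values are placeholders standing in for val1 through val5, which the question does not specify. Keep in mind that `apply(pd.Series)` builds one Series per row, so it may be slow on large inputs.

```python
import pandas as pd

# Placeholder values; the question's val1..val5 are not specified.
pd_dict = {
    "row_id_1": {"col_1": 1.0, "col_2": 2.0},
    "row_id_2": {"col_1": 3.0, "col_3": 4.0, "col_4": 5.0},
}

# Outer keys become the index of a Series whose values are the inner dicts.
rows = pd.Series(pd_dict)

# Each inner dict is expanded into a Series; keys missing from a row become NaN.
df = rows.apply(pd.Series)
print(df)
```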

You can do the following:

df = pd.DataFrame(pd_dict).T
print(df)
#         col_1 col_2 col_3 col_4
#row_id_1  val1  val2   NaN   NaN
#row_id_2  val3   NaN  val4  val5
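
Since the question mentions roughly 1000 rows and up to 1000 columns, a rough way to try this on data of that shape is sketched below. The generator, the names `n_rows` and `n_cols`, and the random values are all assumptions made up for the example, not the asker's data.

```python
import random

import pandas as pd

rng = random.Random(0)  # fixed seed so the run is repeatable
n_rows, n_cols = 1000, 1000  # sizes taken from the question's rough estimate
columns = [f"col_{j}" for j in range(n_cols)]

# Hypothetical ragged input: every row gets a random subset of the columns.
pd_dict = {
    f"row_id_{i}": {
        c: rng.random() for c in rng.sample(columns, rng.randint(1, n_cols))
    }
    for i in range(n_rows)
}

# One constructor call followed by a transpose; no per-row DataFrames.
df = pd.DataFrame(pd_dict).T

# Every value supplied in the input should be present; everything else is NaN.
n_values = sum(len(row) for row in pd_dict.values())
print(df.shape, int(df.notna().sum().sum()) == n_values)
```

Wrapping the constructor line in a timer of your choice should give a feel for how it compares with the per-row `concat` loop on your own data.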

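Your original attempt would also work if you sorted the columns (pass axis as a keyword; recent pandas versions no longer accept it positionally):

print(pd.DataFrame.from_dict(pd_dict, orient='index').sort_index(axis=1))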
#         col_1 col_2 col_3 col_4
#row_id_1  val1  val2   NaN   NaN
#row_id_2  val3   NaN  val4  val5
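
As a quick sanity check, the sketch below builds the frame both ways and compares them. The integer values are placeholders for val1 through val5, and the variable names `by_index` and `by_transpose` are made up for the example.

```python
import pandas as pd

# Placeholder values; the question's val1..val5 are not specified.
pd_dict = {
    "row_id_1": {"col_1": 1, "col_2": 2},
    "row_id_2": {"col_1": 3, "col_3": 4, "col_4": 5},
}

# Route 1: from_dict with the outer keys as the index, then sort the columns.
by_index = pd.DataFrame.from_dict(pd_dict, orient="index").sort_index(axis=1)

# Route 2: outer keys as columns, then transpose; sort columns the same way.
by_transpose = pd.DataFrame(pd_dict).T.sort_index(axis=1)

# Compare labels and values only; raises AssertionError if they differ.
pd.testing.assert_frame_equal(by_index, by_transpose, check_dtype=False)
print(by_index)
```

`check_dtype=False` is used because the two routes may infer different dtypes, for example integers versus floats in a column that has no missing values.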
