
Pandas dataframe column with lists of lists of varying lengths to different columns

I have the following dataframe:

import pandas as pd
df = pd.DataFrame({'column': [[['a', 0], ['b', 1]], [['b', 2]], [['c', 1], ['b', 2]]]})
df
    column
0   [[a, 0], [b, 1]]
1   [[b, 2]]
2   [[c, 1], [b, 2]]

I don't know in advance which letters occur, and the number of inner lists varies from row to row. My goal is to reshape it like this:

     a    b    c
0    0    1    NaN
1    NaN  2    NaN
2    NaN  2    1

A first step can be taken using:

df['column'].apply(pd.Series)
    0       1
0   [a, 0]  [b, 1]
1   [b, 2]  NaN
2   [c, 1]  [b, 2]

However, this leaves most of the problem unsolved.

Try building one dict per row and passing them to the DataFrame constructor:

df_final = pd.DataFrame(dict(l) for l in df.column)

Out[129]:
     a  b    c
0  0.0  1  NaN
1  NaN  2  NaN
2  NaN  2  1.0
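
To make the idea above easier to follow, here is a minimal self-contained sketch of the same approach, split into two steps. The names records and pairs are only illustrative. It relies on dict() accepting an iterable of two-item sequences, and it assumes a letter appears at most once per row (if it repeats, the last value wins).

```python
import pandas as pd

df = pd.DataFrame({'column': [[['a', 0], ['b', 1]], [['b', 2]], [['c', 1], ['b', 2]]]})

# dict() turns [['a', 0], ['b', 1]] into {'a': 0, 'b': 1}
records = [dict(pairs) for pairs in df['column']]

# A list of dicts is aligned on its keys; keys missing from a row become NaN
df_final = pd.DataFrame(records, index=df.index)
print(df_final)
```

Passing index=df.index keeps the original row labels, which matters if the dataframe does not use the default 0..n-1 index.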

If you are on pandas 0.25+, you can use explode:

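# one [letter, value] pair per row; the original row label repeats in the index
s = df['column'].explode()
# column 0 holds the letter, column 1 the value
(pd.DataFrame(list(s.values), index=s.index)
   .set_index(0, append=True)[1]   # index becomes (row label, letter)
   .unstack()                      # letters become columns
)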

Output:

0    a    b    c
0  0.0  1.0  NaN
1  NaN  2.0  NaN
2  NaN  2.0  1.0
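
The explode pipeline can also be written out step by step, which may help when debugging. This is only a sketch: the column names letter and value and the variable names pairs, wide and wide_int are made up for illustration. It assumes no row is an empty list and no letter repeats within a row (unstack raises an error on duplicate entries). The last line is optional and uses the nullable Int64 dtype so the values stay integers instead of becoming floats.

```python
import pandas as pd

df = pd.DataFrame({'column': [[['a', 0], ['b', 1]], [['b', 2]], [['c', 1], ['b', 2]]]})

# One [letter, value] pair per row; the original row label repeats in the index
s = df['column'].explode()

# Split each pair into two named columns (names are illustrative)
pairs = pd.DataFrame(s.tolist(), index=s.index, columns=['letter', 'value'])

# Move 'letter' into the index, then pivot it out to columns
wide = pairs.set_index('letter', append=True)['value'].unstack()

# Optional: nullable integers keep 0/1/2 as ints; missing cells show as <NA>
wide_int = wide.astype('Int64')
print(wide_int)
```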

