I have the following dataframe:
import pandas as pd
df = pd.DataFrame({'column': [[['a', 0], ['b', 1]], [['b', 2]], [['c', 1], ['b', 2]]]})
df
column
0 [[a, 0], [b, 1]]
1 [[b, 2]]
2 [[c, 1], [b, 2]]
I don't know in advance which letters occur, and the number of inner lists varies from row to row. My goal is to reshape it so that it looks like this:
a b c
0 0 1 NaN
1 NaN 2 NaN
2 NaN 2 1
A first step can be taken using:
df['column'].apply(pd.Series)
0 1
0 [a, 0] [b, 1]
1 [b, 2]
2 [c, 1] [b, 2]
However, this still leaves most of the problem unsolved: the letters are not yet column names.
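Try this. Each row is a list of [key, value] pairs, so dict() turns it into a mapping, and the DataFrame constructor aligns the keys into columns: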
df_final = pd.DataFrame(dict(l) for l in df.column)
Out[129]:
a b c
0 0.0 1 NaN
1 NaN 2 NaN
2 NaN 2 1.0
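For anyone who wants to run this end to end, here is a minimal, self-contained sketch of the same idea. The names `records` and `pairs` are only illustrative, and passing `index=df.index` is an addition of mine (not part of the answer above) to keep the original row labels in case the index is not the default 0..n-1.

```python
import pandas as pd

df = pd.DataFrame({'column': [[['a', 0], ['b', 1]], [['b', 2]], [['c', 1], ['b', 2]]]})

# Each row is a list of [key, value] pairs, so dict() turns it into a mapping.
records = [dict(pairs) for pairs in df['column']]

# A list of dicts becomes one row per dict; keys missing from a row become NaN.
# index=df.index is an assumption added here to preserve the original row labels.
df_final = pd.DataFrame(records, index=df.index)
print(df_final)
```

One thing to keep in mind: if the same letter appears twice within a single row, dict() silently keeps only the last value for it.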
If you are on pandas 0.25+, you can use explode:
s = df['column'].explode()
(pd.DataFrame(list(s.values), index=s.index)
.set_index(0, append=True)[1]
.unstack()
)
Output:
0 a b c
0 0.0 1.0 NaN
1 NaN 2.0 NaN
2 NaN 2.0 1.0
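The chained expression above can be hard to follow, so here is a step-by-step sketch of what I understand it to do. The column names 'key' and 'value' and the variable names `pairs` and `wide` are my own, chosen only for readability; the last line, which clears the leftover columns name, is optional.

```python
import pandas as pd

df = pd.DataFrame({'column': [[['a', 0], ['b', 1]], [['b', 2]], [['c', 1], ['b', 2]]]})

# One [key, value] pair per row; the original row label is repeated in the index.
s = df['column'].explode()

# Split each pair into two columns (the names 'key' and 'value' are illustrative).
pairs = pd.DataFrame(s.tolist(), index=s.index, columns=['key', 'value'])

# Move 'key' into the index next to the row label, then pivot it into columns.
wide = pairs.set_index('key', append=True)['value'].unstack()

# Optional: drop the leftover columns name so the header reads just a, b, c.
wide.columns.name = None
print(wide)
```

Note that unstack() raises a ValueError if the same letter appears twice in one row, because the (row, key) index would then contain duplicates. This sketch also assumes every row holds at least one pair.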