I have a dataframe where one of the columns contains a list. I want to break up these lists so that each element has its own row.
Ex df:
index Name Color
1 Ford ['Red,Blue' , 'Red,Blue']
result df:
index Name Color
1 Ford Red
2 Ford Blue
3 Ford Red
4 Ford Blue
The code that I tried:
s = df['Color'].str.split(',').apply(Series,1).stack()
s.index = s.index.droplevel(-1)
s.name = 'Color'
del df['Color']
df = df.join(s)
Figured it out, answer below:
s = df.apply(lambda x: pd.Series(x['Color']), axis=1).stack().reset_index(level=1, drop=True)
s.name = 'Color'
df = df.drop('Color', axis=1).join(s)
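For reference, newer pandas versions (0.25 and later) provide DataFrame.explode, which avoids the apply/stack round trip. This is a different technique from the answer above, shown only as a minimal sketch. It assumes each Color cell is a list of comma-separated strings, as in the example frame; the names color_lists and result are illustrative.

```python
import pandas as pd

# Sample frame from the question: each 'Color' cell is a list of
# comma-separated strings.
df = pd.DataFrame(
    {'Name': ['Ford'], 'Color': [['Red,Blue', 'Red,Blue']]},
    index=[1],
)

# Flatten each cell into a list of single colors:
# ['Red,Blue', 'Red,Blue'] -> ['Red', 'Blue', 'Red', 'Blue']
color_lists = pd.Series(
    [','.join(parts).split(',') for parts in df['Color']],
    index=df.index,
)

# explode gives every list element its own row and repeats the other columns.
result = df.assign(Color=color_lists).explode('Color').reset_index(drop=True)
print(result)
```

The reset_index(drop=True) call replaces the repeated original index with a fresh 0-based one; drop it if the original index should be kept on every row.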
Using apply on a big data set is really slow. I came up with a solution that avoids apply, as follows: call set_index on the columns index and Name. Next, join and split on Color. Finally, create a new dataframe from the color list, then stack, reset_index, and drop the unwanted columns.
Using df as follows:
In [2370]: df
Out[2370]:
index Name Color
0 1 Ford [Red,Blue, Red,Blue]
1 1 Chevy [Yellow,Blue, Yellow,Blue]
2 1 Tesla [White,Green, Red,Blue]
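The answer does not show how this frame was built. A minimal sketch for reproducing it, assuming each Color cell is a list of two comma-separated strings (which is how the printed output reads):

```python
import pandas as pd

# Assumed construction of the sample frame printed above.
df = pd.DataFrame({
    'index': [1, 1, 1],
    'Name': ['Ford', 'Chevy', 'Tesla'],
    'Color': [
        ['Red,Blue', 'Red,Blue'],
        ['Yellow,Blue', 'Yellow,Blue'],
        ['White,Green', 'Red,Blue'],
    ],
})
print(df)
```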
df.set_index(['index', 'Name'], inplace=True)
color_list = [','.join(st).split(',') for st in df.Color.tolist()]
pd.DataFrame(color_list, index=df.index).stack().reset_index(level=[1, 2]).drop('level_2', axis=1)
Out[2376]:
Name 0
index
1 Ford Red
1 Ford Blue
1 Ford Red
1 Ford Blue
1 Chevy Yellow
1 Chevy Blue
1 Chevy Yellow
1 Chevy Blue
1 Tesla White
1 Tesla Green
1 Tesla Red
1 Tesla Blue
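The output above leaves the color column named 0 and keeps index as the row index. A self-contained sketch of the same steps that ends with a flat frame and a column named Color follows. The sample data, the final layout and the name result are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Assumed sample data matching the frame printed earlier.
df = pd.DataFrame({
    'index': [1, 1, 1],
    'Name': ['Ford', 'Chevy', 'Tesla'],
    'Color': [
        ['Red,Blue', 'Red,Blue'],
        ['Yellow,Blue', 'Yellow,Blue'],
        ['White,Green', 'Red,Blue'],
    ],
})

# Step 1: move the identifying columns into the index.
df = df.set_index(['index', 'Name'])

# Step 2: join each list into one string, then split into single colors.
color_list = [','.join(st).split(',') for st in df['Color']]

# Step 3: one column per color, then stack the columns into rows.
result = (
    pd.DataFrame(color_list, index=df.index)
    .stack()
    .rename('Color')
    .reset_index(level=-1, drop=True)  # drop the positional level added by stack
    .reset_index()                     # turn 'index' and 'Name' back into columns
)
print(result)
```

Recent pandas releases may emit a FutureWarning from stack about its new implementation. With lists of equal length, as here, the result should be the same either way.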