Pandas split column into multiple

I couldn't find a solution to my problem.

In my dataset I have a column of weather event features. I need to convert it into multiple numeric indicator columns, and I'm looking for a quick way to do it.

weather = pd.read_csv("weather.csv", parse_dates=[0])

The Events column looks like this:

id                    Events
0                       Rain
...
1                       Rain
...
8                   Fog-Rain
9                  Rain-Snow

I need to convert it to 4 features:

events = ['Rain','Snow','Fog','Thunderstorm']

Each can take one of two values: 1 or 0.

How can I do it with pandas?

str.get_dummies handles this very cleanly:

import pandas as pd

events_list = ['Rain', 'Rain', 'Fog-Rain', 'Rain-Snow', 'Thunderstorm', 'Fog-Thunderstorm']

weather_df = pd.DataFrame(events_list, columns=['Events'])

print(weather_df)

output:

             Events
0              Rain
1              Rain
2          Fog-Rain
3         Rain-Snow
4      Thunderstorm
5  Fog-Thunderstorm

We use str.get_dummies and join it to the original dataframe:

weather_df = pd.concat([weather_df, weather_df.Events.str.get_dummies(sep='-')], axis=1)
print(weather_df)

output:

             Events  Fog  Rain  Snow  Thunderstorm
0              Rain    0     1     0             0
1              Rain    0     1     0             0
2          Fog-Rain    1     1     0             0
3         Rain-Snow    0     1     1             0
4      Thunderstorm    0     0     0             1
5  Fog-Thunderstorm    1     0     0             1

You can easily drop the original column if you wish.
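As a rough sketch of that last step, the snippet below drops the original column and also reorders the indicators to match the four events named in the question. The `id` values and the `reindex` call are illustrative assumptions, not part of the answer above; `reindex` is only there so that an event absent from the data (here `Thunderstorm`) still gets an all-zero column.

```python
import pandas as pd

# The four indicator columns requested in the question.
events = ['Rain', 'Snow', 'Fog', 'Thunderstorm']

# Small illustrative frame; the id values are made up for this sketch.
weather_df = pd.DataFrame({
    'id': [0, 8, 9],
    'Events': ['Rain', 'Fog-Rain', 'Rain-Snow'],
})

# Split on '-' and build 0/1 indicators, then force the column set and order.
# Events missing from the data are filled with 0.
indicators = (
    weather_df['Events']
    .str.get_dummies(sep='-')
    .reindex(columns=events, fill_value=0)
)

# Drop the original text column and attach the indicators.
result = pd.concat([weather_df.drop(columns='Events'), indicators], axis=1)
print(result)
```

If the original column is still needed, skip the `drop` call and concatenate onto `weather_df` directly.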

Since Events holds compound values such as Fog-Rain, you cannot use plain get_dummies: it will create a column for every distinct combination. Use str.contains() to find matches and create the columns.
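To illustrate the point about combinations, here is a small sketch of what the top-level `pd.get_dummies` does with compound values when no separator is involved. The sample values are assumptions chosen for the demonstration.

```python
import pandas as pd

# Illustrative sample with one plain and two compound values.
sample = pd.Series(['Rain', 'Fog-Rain', 'Rain-Snow'], name='Events')

# pd.get_dummies treats each distinct string as its own category,
# so 'Fog-Rain' and 'Rain-Snow' become separate columns.
combo_dummies = pd.get_dummies(sample)
print(sorted(combo_dummies.columns))
```

Each compound string ends up as its own column, which is not the per-event indicator the question asks for.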

I used 0 for true and -1 for false, but you can swap those for whatever values you need, such as the 1 and 0 the question asks for.

df
Out[48]: 
   id        Events
0   0          Rain
1   1          Rain
2   8      Fog-Rain
3   9     Rain-Snow
4  32  Thunderstorm
5  31           Fog
6  23          Snow

df.Events.str.contains("Rain")
Out[49]: 
0     True
1     True
2     True
3     True
4    False
5    False
6    False
Name: Events, dtype: bool

df.loc[df.Events.str.contains("Rain"), "Rain"] = 0

df
Out[51]: 
   id        Events  Rain
0   0          Rain     0
1   1          Rain     0
2   8      Fog-Rain     0
3   9     Rain-Snow     0
4  32  Thunderstorm   NaN
5  31           Fog   NaN
6  23          Snow   NaN

df.loc[df.Events.str.contains("Snow"), "Snow"] = 0

df
Out[53]: 
   id        Events  Rain  Snow
0   0          Rain     0   NaN
1   1          Rain     0   NaN
2   8      Fog-Rain     0   NaN
3   9     Rain-Snow     0     0
4  32  Thunderstorm   NaN   NaN
5  31           Fog   NaN   NaN
6  23          Snow   NaN     0

df.loc[df.Events.str.contains("Thunderstorm"), "Thunderstorm"] = 0

df
Out[55]: 
   id        Events  Rain  Snow  Thunderstorm
0   0          Rain     0   NaN           NaN
1   1          Rain     0   NaN           NaN
2   8      Fog-Rain     0   NaN           NaN
3   9     Rain-Snow     0     0           NaN
4  32  Thunderstorm   NaN   NaN             0
5  31           Fog   NaN   NaN           NaN
6  23          Snow   NaN     0           NaN

df.loc[df.Events.str.contains("Fog"), "Fog"] = 0

df
Out[57]: 
   id        Events  Rain  Snow  Thunderstorm  Fog
0   0          Rain     0   NaN           NaN  NaN
1   1          Rain     0   NaN           NaN  NaN
2   8      Fog-Rain     0   NaN           NaN    0
3   9     Rain-Snow     0     0           NaN  NaN
4  32  Thunderstorm   NaN   NaN             0  NaN
5  31           Fog   NaN   NaN           NaN    0
6  23          Snow   NaN     0           NaN  NaN

df = df.fillna(-1)

df
Out[59]: 
   id        Events  Rain  Snow  Thunderstorm  Fog
0   0          Rain     0    -1            -1   -1
1   1          Rain     0    -1            -1   -1
2   8      Fog-Rain     0    -1            -1    0
3   9     Rain-Snow     0     0            -1   -1
4  32  Thunderstorm    -1    -1             0   -1
5  31           Fog    -1    -1            -1    0
6  23          Snow    -1     0            -1   -1
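The same str.contains() idea can be written more compactly as a loop that produces the 1/0 encoding from the question directly. This is a minimal sketch rather than the answerer's own code; the `events` list comes from the question, and `regex=False` and `na=False` are choices made here so that names are matched literally and missing values count as 0.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [0, 1, 8, 9, 32, 31, 23],
    'Events': ['Rain', 'Rain', 'Fog-Rain', 'Rain-Snow',
               'Thunderstorm', 'Fog', 'Snow'],
})

events = ['Rain', 'Snow', 'Fog', 'Thunderstorm']

for event in events:
    # Literal substring match; the boolean result is cast to 1/0.
    df[event] = (
        df['Events']
        .str.contains(event, regex=False, na=False)
        .astype(int)
    )

print(df)
```

Because this is substring matching, it would misfire if one event name were contained in another; none of the four names here overlap in that way.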
