Encoding a single column into multiple columns with pandas

I have a column in a pandas DataFrame for genres. Each value is a string with the genres separated by commas.

>>> df['genres_omdb']
0                      Crime, Drama
1        Adventure, Family, Fantasy
2                    Drama, Mystery
3         Horror, Mystery, Thriller
5         Action, Adventure, Sci-Fi
6                    Drama, Romance
8                             Drama
9      Animation, Adventure, Comedy
10     Animation, Adventure, Comedy
11                    Drama, Sci-Fi
12                            Drama
13              Drama, Romance, War
14            Comedy, Drama, Family
16         Comedy, Musical, Romance

Originally I split it into three columns and ran get_dummies on each of them. This produced repetitive columns (e.g. genre1_Adventure and genre2_Adventure).
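The repetition described above can be reproduced with a minimal sketch. The three-row frame is made up for illustration, and the names genre1 to genre3 are taken from the code further down; this is only an assumption about how the split was done.

```python
import pandas as pd

# Hypothetical sample data modelled on the column shown above.
df = pd.DataFrame({"genres_omdb": ["Crime, Drama",
                                   "Adventure, Family, Fantasy",
                                   "Action, Adventure, Sci-Fi"]})

# Split into positional columns; rows with fewer genres get missing values.
parts = df["genres_omdb"].str.split(", ", expand=True)
parts.columns = ["genre1", "genre2", "genre3"]

# One-hot encode each positional column separately.
dummies = pd.get_dummies(parts)

# The same genre gets one column per position it occupies.
adventure_cols = [c for c in dummies.columns if c.endswith("_Adventure")]
print(adventure_cols)
```

Because a genre's position in the string carries no meaning, these per-position columns split what should be a single indicator.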

So then I tried getting every unique genre, creating a column of that genre, and then manually iterating through the rows and changing values to a 1 if the genre is in the list.

all_genres = set()
genre1_keys = df['genre1'].value_counts().keys()
genre2_keys = df['genre2'].value_counts().keys()
genre3_keys = df['genre3'].value_counts().keys()
for genre in genre1_keys:
  all_genres.add(genre.strip())
for genre in genre2_keys:
  all_genres.add(genre.strip())
for genre in genre3_keys:
  all_genres.add(genre.strip())
for genre in all_genres:
  df[genre] = 0
for i, row in df.iterrows():
  genres = row['genres_omdb'].split(',')
  for genre in genres:
    genre = genre.strip()
    # Write to the DataFrame itself; the row from iterrows() is a copy.
    df.at[i, genre] = 1

It's very messy and I know there is a better way to do this. Any help on how to clean up this code would be appreciated.

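I think you just need str.get_dummies. Note that the separator is a comma followed by a space; with sep=',' the column names would keep a leading space.

df['genres_omdb'].str.get_dummies(sep=', ')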
Out[115]: 
    Action  Adventure  Animation  Comedy  Crime  Drama  Family  Fantasy  \
0        0          0          0       0      1      1       0        0   
1        0          1          0       0      0      0       1        1   
2        0          0          0       0      0      1       0        0   
3        0          0          0       0      0      0       0        0   
5        1          1          0       0      0      0       0        0   
6        0          0          0       0      0      1       0        0   
8        0          0          0       0      0      1       0        0   
9        0          1          1       1      0      0       0        0   
10       0          1          1       1      0      0       0        0   
11       0          0          0       0      0      1       0        0   
12       0          0          0       0      0      1       0        0   
13       0          0          0       0      0      1       0        0   
14       0          0          0       1      0      1       1        0   
16       0          0          0       1      0      0       0        0   
    Horror  Musical  Mystery  Romance  Sci-Fi  Thriller  War  
0        0        0        0        0       0         0    0  
1        0        0        0        0       0         0    0  
2        0        0        1        0       0         0    0  
3        1        0        1        0       0         1    0  
5        0        0        0        0       1         0    0  
6        0        0        0        1       0         0    0  
8        0        0        0        0       0         0    0  
9        0        0        0        0       0         0    0  
10       0        0        0        0       0         0    0  
11       0        0        0        0       1         0    0  
12       0        0        0        0       0         0    0  
13       0        0        0        1       0         0    1  
14       0        0        0        0       0         0    0  
16       0        1        0        1       0         0    0  
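To keep the indicator columns alongside the original data, the result can be joined back onto the frame. The following is a minimal sketch on made-up rows modelled on the question's column; the variable names dummies and encoded are only illustrative.

```python
import pandas as pd

# Hypothetical sample data modelled on the column shown in the question.
df = pd.DataFrame(
    {"genres_omdb": ["Crime, Drama", "Adventure, Family, Fantasy", "Drama"]},
    index=[0, 1, 8],
)

# The separator includes the space, so labels come out as "Drama", not " Drama".
dummies = df["genres_omdb"].str.get_dummies(sep=", ")

# Attach the indicator columns to the original frame; join aligns on the index.
encoded = df.join(dummies)
print(encoded)
```

Since join aligns on the index, gaps in the index (such as the missing 4 and 7 in the question's data) do not matter.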

