I have a column like this:
Genre
Action|Crime|Drama|Thriller
Action|Crime|Thriller
Drama|Thriller
Crime|Drama
Horror|Thriller
Crime|Drama|Mystery|Thriller
Documentary
Comedy|Crime
Action|Adventure|Sci-Fi
... and so on.
What I want is output with multiple columns, with one column generated for each genre, e.g.:
action scifi crime adventure . . . . .
0 1 0 1 0
1 0 0 0 0
Use .str.split, stack, and get_dummies:
# .groupby(level=0).sum() replaces .sum(level=0), which was removed in pandas 2.0
df['Genre'].str.split('|', expand=True).stack().str.get_dummies().groupby(level=0).sum()
Output:
   Action  Adventure  Comedy  Crime  Documentary  Drama  Horror  Mystery  Sci-Fi  Thriller
0       1          0       0      1            0      1       0        0       0         1
1       1          0       0      1            0      0       0        0       0         1
2       0          0       0      0            0      1       0        0       0         1
3       0          0       0      1            0      1       0        0       0         0
4       0          0       0      0            0      0       1        0       0         1
5       0          0       0      1            0      1       0        1       0         1
6       0          0       0      0            1      0       0        0       0         0
7       0          0       1      1            0      0       0        0       0         0
8       1          1       0      0            0      0       0        0       1         0
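As a side note, the same table can probably be produced more directly, because Series.str.get_dummies accepts the separator itself. The sketch below rebuilds the question's sample column (the DataFrame construction and the names long_way and short_way are only for illustration) and compares both routes:

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"Genre": [
    "Action|Crime|Drama|Thriller",
    "Action|Crime|Thriller",
    "Drama|Thriller",
    "Crime|Drama",
    "Horror|Thriller",
    "Crime|Drama|Mystery|Thriller",
    "Documentary",
    "Comedy|Crime",
    "Action|Adventure|Sci-Fi",
]})

# Long form: split, stack, one-hot encode each genre,
# then collapse back to one row per original index.
long_way = (
    df["Genre"].str.split("|", expand=True)
    .stack()
    .str.get_dummies()
    .groupby(level=0)
    .sum()
)

# Shortcut: let get_dummies split on the separator directly.
short_way = df["Genre"].str.get_dummies(sep="|")

print(short_way)
```

Both variants give one 0/1 column per genre; the shortcut just skips the intermediate stacked Series.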
First get that one column, then call .values[0] on it to read a single entry.
Second, split the resulting string on | to get a list of genres.
Indexing with that list, df[genres], should then give you the result you want (this assumes df has a column named after each genre).
To conclude (for a single entry):
genres = df['Genre'].values[0].split('|')
df[genres]
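A minimal sketch of that idea, under the assumption that the per-genre columns already exist. Here they are built with str.get_dummies into a separate frame called onehot, which is a name made up for this example; the two-row DataFrame is likewise just a cut-down version of the question's data:

```python
import pandas as pd

# Two rows taken from the question's sample data.
df = pd.DataFrame({"Genre": [
    "Action|Crime|Drama|Thriller",
    "Comedy|Crime",
]})

# Split the first entry into a list of genre names.
genres = df["Genre"].values[0].split("|")

# Assumption: one 0/1 column per genre is available (built here for the demo).
onehot = df["Genre"].str.get_dummies(sep="|")

# Select only the columns for the genres of the first entry.
selected = onehot[genres]
print(selected)
```

Without such columns, df[genres] raises a KeyError, since the original frame only has the Genre column.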