Let's say I have a DataFrame:
------------------------------------
  | col1   | col2   | col3   | col4
------------------------------------
1 | red    | green  | blue   | yellow
2 | orange | purple | green  | NaN
3 | pink   | red    | blue   | green
4 | orange | pink   | purple | grey
5 | grey   | red    | NaN    | NaN
I want to create a new DataFrame in which every distinct value becomes its own column, holding a 1 if that value occurs in the row and a 0 if it doesn't:
  | red | green | blue | yellow | orange | purple | pink | grey
---------------------------------------------------------------
1 | 1   | 1     | 1    | 1      | 0      | 0      | 0    | 0
2 | 0   | 1     | 0    | 0      | 1      | 1      | 0    | 0
3 | 1   | 1     | 1    | 0      | 0      | 0      | 1    | 0
4 | 0   | 0     | 0    | 0      | 1      | 1      | 1    | 1
5 | 1   | 0     | 0    | 0      | 0      | 0      | 0    | 1
How could I go about achieving this?
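Use get_dummies, then take the max over the duplicated column names so the values are always 0 or 1. If you want a count of how often each value occurs in a row, use sum instead of max: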
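import pandas as pd
# max(level=0, axis=1) was removed in pandas 2.0, so group the duplicated column names on the transpose; dtype=int gives 0/1 instead of booleans
df = pd.get_dummies(df, prefix='', prefix_sep='', dtype=int).T.groupby(level=0, sort=False).max().T
print(df)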
   grey  orange  pink  red  green  purple  blue  yellow
1     0       0     0    1      1       0     1       1
2     0       1     0    0      1       1     0       0
3     0       0     1    1      1       0     1       0
4     1       1     1    0      0       1     0       0
5     1       0     0    1      0       0     0       0
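For anyone who wants to reproduce this end to end, here is a minimal sketch that rebuilds the sample DataFrame from the question and applies both the max and the sum variants. The names `dummies`, `indicators`, `counts` and `order` are made up for illustration. The final reordering step is an addition, not part of the answer above: it assumes you want the columns in the order the values first appear when reading the rows left to right, which is what the question's desired output shows.

```python
import numpy as np
import pandas as pd

# Sample data from the question; NaN marks the missing cells.
df = pd.DataFrame(
    {
        "col1": ["red", "orange", "pink", "orange", "grey"],
        "col2": ["green", "purple", "red", "pink", "red"],
        "col3": ["blue", "green", "blue", "purple", np.nan],
        "col4": ["yellow", np.nan, "green", "grey", np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

# One dummy column per (source column, value) pair. The empty prefix and
# separator mean the same colour can appear under several identical names.
dummies = pd.get_dummies(df, prefix="", prefix_sep="", dtype=int)

# Collapse the duplicated names by grouping on the transpose.
# max -> 0/1 indicator, sum -> number of occurrences in the row.
indicators = dummies.T.groupby(level=0, sort=False).max().T
counts = dummies.T.groupby(level=0, sort=False).sum().T

# Optional (assumption): reorder columns by first appearance, row by row.
order = pd.Series(df.to_numpy().ravel()).dropna().unique().tolist()
indicators = indicators[order]
counts = counts[order]

print(indicators)
```

In this sample no colour repeats within a row, so `counts` and `indicators` come out identical. They would differ only if a row contained the same value in more than one column.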