How to create a new DataFrame where each column represents occurrence of an instance in a row of a previous DataFrame

Question

Let's say I have a DataFrame:

---------------------------- 
  | col1   | col2   | col3   | col4
----------------------------
1 | red    | green  | blue   | yellow 
2 | orange | purple | green  | NaN
3 | pink   | red    | blue   | green
4 | orange | pink   | purple | grey
5 | grey   | red    | NaN    | NaN

I want to create a new DataFrame with one column per distinct value, holding a 1 if that value occurs in the row and a 0 if it doesn't:

  | red | green | blue | yellow | orange | purple | pink | grey
---------------------------------------------------------------
1 | 1   | 1     | 1    | 1      | 0      | 0      | 0    | 0 
2 | 0   | 1     | 0    | 0      | 1      | 1      | 0    | 0 
3 | 1   | 1     | 1    | 0      | 0      | 0      | 1    | 0 
4 | 0   | 0     | 0    | 0      | 1      | 1      | 1    | 1 
5 | 1   | 0     | 0    | 0      | 0      | 0      | 0    | 1

How could I go about achieving this?

Answer 1

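Use get_dummies, then collapse the duplicate column labels with max so the values are always 0 or 1. Use sum instead of max if you want to count how many times each value occurs in a row: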

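import pandas as pd

# Older pandas (< 2.0) allowed .max(level=0, axis=1); the level argument has
# since been removed, so group the transposed frame by column label instead.
# dtype=int keeps 0/1 output (newer pandas returns booleans by default).
df = (pd.get_dummies(df, prefix='', prefix_sep='', dtype=int)
        .T.groupby(level=0, sort=False).max().T)
print(df)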
   grey  orange  pink  red  green  purple  blue  yellow
1     0       0     0    1      1       0     1       1
2     0       1     0    0      1       1     0       0
3     0       0     1    1      1       0     1       0
4     1       1     1    0      0       1     0       0
5     1       0     0    1      0       0     0       0
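
For anyone who wants to check this end to end, here is a minimal self-contained sketch that rebuilds the frame from the question and applies the same idea. The names dummies, flags, counts and wanted are made up for illustration. Grouping the transposed frame by label is just one way to merge the duplicate columns in recent pandas versions, and dtype=int is an assumption to force 0/1 rather than booleans.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question (index 1..5, NaN for missing cells).
df = pd.DataFrame(
    {
        "col1": ["red", "orange", "pink", "orange", "grey"],
        "col2": ["green", "purple", "red", "pink", "red"],
        "col3": ["blue", "green", "blue", "purple", np.nan],
        "col4": ["yellow", np.nan, "green", "grey", np.nan],
    },
    index=[1, 2, 3, 4, 5],
)

# One indicator column per (source column, value). The empty prefix leaves
# bare value names, so the same colour can show up as several columns.
dummies = pd.get_dummies(df, prefix="", prefix_sep="", dtype=int)

# Collapse duplicate labels: transpose, group by label, reduce, transpose back.
flags = dummies.T.groupby(level=0).max().T   # 1 if the value occurs in the row
counts = dummies.T.groupby(level=0).sum().T  # occurrences of the value per row

# Optional: reorder the columns to match the layout asked for in the question.
wanted = ["red", "green", "blue", "yellow", "orange", "purple", "pink", "grey"]
flags = flags[wanted]
counts = counts[wanted]
print(flags)
```

With this sample data no colour repeats within a row, so flags and counts come out identical; they only differ when a value appears more than once in the same row.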
