
How to transform a one hot encoded dataframe to a basket sparse matrix

I want to transform a one-hot-encoded dataframe into a basket sparse matrix.

I have this:

df
Ticket Number  Water  Orange  Lemon  Strawb.  Peach  Book  Pen
5001           0      0      0     0       1     1    0
5002           1      1      0     0       1     1    0
5003           1      0      0     0       0     0    0

I want this:

df
Ticket Number 
5001           Peach, Book
5002           Water, Orange, Peach, Book
5003           Water

I have tried some of the ideas here:

Pivoting a One-Hot-Encode Dataframe

But I wasn't able to come up with a solution myself.

Some help would be very much appreciated. Thanks

[Image: what the dataframe actually looks like]

You can use DataFrame.dot after setting "Ticket Number" to be the index:

u = df.set_index('Ticket Number')
u.dot(u.columns+',').str.rstrip(',')

Ticket Number
5001                 Peach,Book
5002    Water,Orange,Peach,Book
5003                      Water
dtype: object

Or, to get a regular DataFrame with an "Items" column instead of a Series:

u.dot(u.columns+',').str[:-1].reset_index(name='Items')

   Ticket Number                    Items
0           5001               Peach,Book
1           5002  Water,Orange,Peach,Book
2           5003                    Water
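To see why the dot product works here, it helps to run it on a reconstruction of the question's dataframe. The sketch below assumes the one-hot columns hold plain integers 0 and 1; the names `labels` and `baskets` are only illustrative. The product relies on Python string repetition (`1 * 'Peach,'` is `'Peach,'`, `0 * 'Peach,'` is `''`) and on `+` concatenating the pieces across each row.

```python
import pandas as pd

# Reconstruction of the dataframe shown in the question.
df = pd.DataFrame({
    "Ticket Number": [5001, 5002, 5003],
    "Water":   [0, 1, 1],
    "Orange":  [0, 1, 0],
    "Lemon":   [0, 0, 0],
    "Strawb.": [0, 0, 0],
    "Peach":   [1, 1, 0],
    "Book":    [1, 1, 0],
    "Pen":     [0, 0, 0],
})

u = df.set_index("Ticket Number")

# Each column name gets a trailing comma: 'Water,', 'Orange,', ...
labels = u.columns + ","

# Row by row: sum(flag * label). A flag of 1 keeps the label,
# a flag of 0 turns it into an empty string.
baskets = u.dot(labels).str.rstrip(",")

print(baskets)
```

Because the labels are combined in column order, the items in each basket appear in the same order as the columns of the original dataframe.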

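A slightly more robust version of the same idea, which keeps only the numeric columns, treats missing values as 0, and clips counts greater than 1 down to 1:

import numpy as np

u = df.set_index('Ticket Number').select_dtypes([np.number])
u = u.fillna(0).astype(int).clip(lower=0, upper=1)
u.dot(u.columns+',').str[:-1]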

Ticket Number
5001                 Peach,Book
5002    Water,Orange,Peach,Book
5003                      Water
dtype: object
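The effect of the extra cleaning steps is easier to see on messy input. The dataframe below is made up for illustration (the `Cashier` column and the values in `Water` are hypothetical, not from the question), and this sketch uses `astype(int)` after `fillna(0)` to get integer flags, since the `downcast` argument of `fillna` is deprecated in recent pandas releases.

```python
import numpy as np
import pandas as pd

# Hypothetical messy input: a missing value, a count of 2,
# and a non-numeric column that should not end up in the basket.
messy = pd.DataFrame({
    "Ticket Number": [5001, 5002, 5003],
    "Water":   [np.nan, 1, 2],
    "Peach":   [1, 1, 0],
    "Cashier": ["ann", "bob", "cy"],
})

# Keep only numeric columns, so 'Cashier' is dropped.
u = messy.set_index("Ticket Number").select_dtypes([np.number])

# NaN -> 0, floats -> ints, and anything above 1 is clipped to 1,
# so every cell is a clean 0/1 flag before the dot product.
u = u.fillna(0).astype(int).clip(lower=0, upper=1)

baskets = u.dot(u.columns + ",").str[:-1]
print(baskets)
```

Without the clipping step, the count of 2 would repeat the label and produce `Water,Water` for ticket 5003.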

A longer way, using melt and groupby:

df.melt('Ticket Number').loc[lambda x: x['value'] == 1].groupby('Ticket Number')['variable'].agg(','.join)

Ticket Number
5001                 Peach,Book
5002    Water,Orange,Peach,Book
5003                      Water
Name: variable, dtype: object
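The melt one-liner is dense, so here is the same chain split into named steps. This is a sketch on a trimmed copy of the question's data, assuming the id column is called "Ticket Number" as in the question; `long_df`, `bought`, and `baskets` are illustrative names.

```python
import pandas as pd

# Trimmed copy of the question's data (all-zero columns left out).
df = pd.DataFrame({
    "Ticket Number": [5001, 5002, 5003],
    "Water":  [0, 1, 1],
    "Orange": [0, 1, 0],
    "Peach":  [1, 1, 0],
    "Book":   [1, 1, 0],
})

# 1. Wide -> long: one row per (ticket, item) pair, with the item name
#    in 'variable' and the 0/1 flag in 'value'.
long_df = df.melt("Ticket Number")

# 2. Keep only the pairs where the item was actually bought.
bought = long_df[long_df["value"] == 1]

# 3. Join the item names of each ticket into one comma-separated string.
baskets = bought.groupby("Ticket Number")["variable"].agg(",".join)

print(baskets)
```

This version is more verbose than the dot product, but the intermediate `bought` frame is handy if the items are needed as separate rows rather than as one string.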

