I want to transform a one-hot-encoded dataframe into basket format, with one comma-separated list of purchased items per ticket.
I have this:
df
Ticket Number  Water  Orange  Lemon  Strawb.  Peach  Book  Pen
5001           0      0       0      0        1      1     0
5002           1      1       0      0        1      1     0
5003           1      0       0      0        0      0     0
I want this:
df
Ticket Number
5001 Peach, Book
5002 Water, Orange, Peach, Book
5003 Water
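To make the example reproducible, the input table above can be built as a DataFrame like this. It is a convenience sketch; the column names, including "Strawb.", are copied from the table.

```python
import pandas as pd

# Sample data copied from the input table (one row per ticket, 0/1 per item)
df = pd.DataFrame({
    'Ticket Number': [5001, 5002, 5003],
    'Water':   [0, 1, 1],
    'Orange':  [0, 1, 0],
    'Lemon':   [0, 0, 0],
    'Strawb.': [0, 0, 0],
    'Peach':   [1, 1, 0],
    'Book':    [1, 1, 0],
    'Pen':     [0, 0, 0],
})
print(df)
```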
I have tried some of the ideas in "Pivoting a One-Hot-Encode Dataframe", but I wasn't able to adapt them to my case.
Some help would be very much appreciated. Thanks!
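You can use DataFrame.dot after setting "Ticket Number" as the index. Each 0/1 cell is multiplied by its column name plus a comma, so only the names of purchased items survive, and the dot product sums (that is, concatenates) them; the trailing comma is then stripped: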
u = df.set_index('Ticket Number')
u.dot(u.columns+',').str.rstrip(',')
Ticket Number
5001 Peach,Book
5002 Water,Orange,Peach,Book
5003 Water
dtype: object
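Why the dot trick works: DataFrame.dot hands the values to NumPy's dot product, and on object data Python's own rules apply, so an int times a string repeats the string (0 * 'Water,' is '', 1 * 'Water,' is 'Water,') and adding strings concatenates them. The sketch below, on a reduced sample frame, reproduces that arithmetic with an explicit loop for comparison; row_to_basket is a hypothetical helper name, not part of pandas.

```python
import pandas as pd

# Reduced sample frame (only some of the item columns, for brevity)
df = pd.DataFrame({
    'Ticket Number': [5001, 5002, 5003],
    'Water':  [0, 1, 1],
    'Orange': [0, 1, 0],
    'Peach':  [1, 1, 0],
    'Book':   [1, 1, 0],
})
u = df.set_index('Ticket Number')

# An int times a string repeats it: 0 copies or 1 copy of "name,"
assert 0 * 'Water,' == '' and 1 * 'Water,' == 'Water,'

def row_to_basket(row):
    # Same arithmetic as the dot product: concatenate flag * "name," per column
    joined = ''.join(int(flag) * (col + ',') for col, flag in row.items())
    return joined.rstrip(',')

loop_result = u.apply(row_to_basket, axis=1)
dot_result = u.dot(u.columns + ',').str.rstrip(',')

print(loop_result.tolist() == dot_result.tolist())  # True
```

A ticket whose cells are all 0 comes out as an empty string with either version.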
Or, slice off the trailing comma with .str[:-1] and get a DataFrame back with a named column:
u.dot(u.columns+',').str[:-1].reset_index(name='Items')
Ticket Number Items
0 5001 Peach,Book
1 5002 Water,Orange,Peach,Book
2 5003 Water
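A slightly more robust version of the same thing: it keeps only numeric columns, turns NaN into 0, converts to int (multiplying a string by a float raises a TypeError) and caps counts at 1:
import numpy as np
u = df.set_index('Ticket Number').select_dtypes([np.number])
u = u.fillna(0).astype(int).clip(lower=0, upper=1)
u.dot(u.columns+',').str[:-1]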
Ticket Number
5001 Peach,Book
5002 Water,Orange,Peach,Book
5003 Water
dtype: object
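To see what the extra steps guard against, here is a sketch on a deliberately messy, made-up frame: a non-numeric Note column (hypothetical), a missing value, and a count of 2. It converts with astype(int) after fillna(0), so every cell is an integer before it multiplies a string.

```python
import numpy as np
import pandas as pd

# Made-up messy input: a text column, a NaN, and a quantity of 2
df = pd.DataFrame({
    'Ticket Number': [5001, 5002, 5003],
    'Water':  [0, 2, 1],
    'Orange': [np.nan, 1, 0],
    'Peach':  [1, 1, 0],
    'Note':   ['a', 'b', 'c'],  # hypothetical non-numeric column
})

u = df.set_index('Ticket Number').select_dtypes([np.number])  # drops 'Note'
u = u.fillna(0).astype(int).clip(lower=0, upper=1)           # NaN -> 0, 2 -> 1
baskets = u.dot(u.columns + ',').str[:-1]
print(baskets.tolist())  # ['Peach', 'Water,Orange,Peach', 'Water']
```

Without clip, the count of 2 would produce "Water,Water,"; without astype(int), the float Orange column would make the multiplication fail.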
A longer way, with melt and groupby:
df.melt('Ticket Number').loc[lambda x: x['value'] == 1].groupby('Ticket Number')['variable'].agg(','.join)
Ticket Number
5001 Peach,Book
5002 Water,Orange,Peach,Book
5003 Water
Name: variable, dtype: object
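One difference between the approaches: the melt version filters out the zero cells, so a ticket with no items at all disappears from the result, while the dot version returns an empty string for it. A minimal sketch, with a made-up ticket 5004 that bought nothing, of restoring such tickets with reindex:

```python
import pandas as pd

# 5004 is a made-up ticket with no items
df = pd.DataFrame({
    'Ticket Number': [5001, 5002, 5003, 5004],
    'Water':  [0, 1, 1, 0],
    'Orange': [0, 1, 0, 0],
    'Peach':  [1, 1, 0, 0],
    'Book':   [1, 1, 0, 0],
})

# melt -> one row per (ticket, item) pair; keep only the items that were bought
long_df = df.melt('Ticket Number')
bought = long_df.loc[long_df['value'] == 1]
baskets = bought.groupby('Ticket Number')['variable'].agg(','.join)

# Tickets without any item vanish in the filter step; reindex brings them back
baskets = baskets.reindex(df['Ticket Number'], fill_value='')
print(baskets.tolist())  # ['Peach,Book', 'Water,Orange,Peach,Book', 'Water', '']
```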