I have a data frame that has entries that look like this:
customer_id products_purchased
1 A,B,D,Q
2 B,K,T
3 A
4 M,H,U,R,T,Z
1 A,U,C
3 P,T
.
.
.
I would like to produce a pivot table with one row per customer_id and one column per product, where each cell holds the number of times that customer purchased the product (0 if the customer never purchased it). For the example above:
customer_id A B C D H K M P Q R T U Z
1 2 1 1 1 0 0 0 0 1 0 0 1 0
2 0 1 0 0 0 1 0 0 0 0 1 0 0
3 1 0 0 0 0 0 0 1 0 0 1 0 0
4 0 0 0 0 1 0 1 0 0 1 1 1 1
There is also a datetime column to indicate when the purchase was made, but it is not important to this particular problem.
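You can do this with str.get_dummies, which turns the comma-separated strings into one indicator column per product, followed by a groupby on customer_id that sums those indicators: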
(df['products_purchased'].str.get_dummies(',')
.groupby(df['customer_id']).sum()
.reset_index()
)
Output:
customer_id A B C D H K M P Q R T U Z
0 1 2 1 1 1 0 0 0 0 1 0 0 1 0
1 2 0 1 0 0 0 1 0 0 0 0 1 0 0
2 3 1 0 0 0 0 0 0 1 0 0 1 0 0
3 4 0 0 0 0 1 0 1 0 0 1 1 1 1
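For anyone who wants to try this end to end, here is a minimal self-contained sketch. It assumes the frame contains only the six rows shown in the question; the variable names `dummies` and `counts` are just illustrative.

```python
import pandas as pd

# Sample data copied from the question (only the rows shown there).
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 1, 3],
    "products_purchased": ["A,B,D,Q", "B,K,T", "A", "M,H,U,R,T,Z", "A,U,C", "P,T"],
})

# One 0/1 indicator column per product, one row per purchase record.
dummies = df["products_purchased"].str.get_dummies(",")

# Sum the indicators per customer to get per-product purchase counts.
counts = dummies.groupby(df["customer_id"]).sum().reset_index()
print(counts)
```

One caveat: str.get_dummies produces 0/1 indicators per row, so if a single row lists the same product twice (e.g. "A,A"), that row contributes only 1 to the count. If that can happen in your data, splitting the strings and counting the exploded values may be a better fit.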