Pivot table from list entries in a pandas data frame

I have a data frame that has entries that look like this:

customer_id    products_purchased
1              A,B,D,Q
2              B,K,T
3              A
4              M,H,U,R,T,Z
1              A,U,C
3              P,T
.
.
.

I would like to produce a pivot table that has the customer_id and then a column for each product and a count (0, if the customer never purchased the product). For the example above:

customer_id    A     B     C     D     H     K     M     P     Q     R     T     U     Z
1              2     1     1     1     0     0     0     0     1     0     0     1     0
2              0     1     0     0     0     1     0     0     0     0     1     0     0
3              1     0     0     0     0     0     0     1     0     0     1     0     0
4              0     0     0     0     1     0     1     0     0     1     1     1     1

There is also a datetime column to indicate when the purchase was made, but it is not important to this particular problem.

Use str.get_dummies to turn each comma-separated string into 0/1 indicator columns, then group by customer_id and sum:

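(df['products_purchased'].str.get_dummies(',')   # one 0/1 column per product
   .groupby(df['customer_id']).sum()             # add up the indicators per customer
   .reset_index()                                # make customer_id a regular column
)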

Output:

   customer_id  A  B  C  D  H  K  M  P  Q  R  T  U  Z
0            1  2  1  1  1  0  0  0  0  1  0  0  1  0
1            2  0  1  0  0  0  1  0  0  0  0  1  0  0
2            3  1  0  0  0  0  0  0  1  0  0  1  0  0
3            4  0  0  0  0  1  0  1  0  0  1  1  1  1
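
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds only the six sample rows shown in the question (the datetime column is left out). The names dummies, exploded, counts and the helper column item are illustrative, not from the original post. The second approach, explode plus pd.crosstab, is an alternative I am adding rather than part of the answer above. One difference worth knowing: str.get_dummies only marks whether a product appears in a row, so a product repeated inside one string would count once for that row, whereas the explode version counts every occurrence. On this sample data the two agree.

```python
import pandas as pd

# Sample rows copied from the question (datetime column omitted).
df = pd.DataFrame({
    "customer_id": [1, 2, 3, 4, 1, 3],
    "products_purchased": ["A,B,D,Q", "B,K,T", "A", "M,H,U,R,T,Z", "A,U,C", "P,T"],
})

# Approach from the answer: 0/1 indicator per product, summed per customer.
dummies = (
    df["products_purchased"].str.get_dummies(",")
      .groupby(df["customer_id"]).sum()
)

# Alternative sketch: one row per (customer, product), then a frequency table.
# "item" is a hypothetical helper column name.
exploded = df.assign(item=df["products_purchased"].str.split(",")).explode("item")
counts = pd.crosstab(exploded["customer_id"], exploded["item"])

# reset_index() turns customer_id back into a regular column, as in the answer.
print(dummies.reset_index())
```

If a product is missing from every purchase it will not get a column at all; reindexing the columns against a known product list with fill_value=0 would be one way to add it.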
