Pivot table based on groupby in Pandas

Question

I have a dataframe like this:

customer_id | date     | category
1           | 2017-2-1 | toys
2           | 2017-2-1 | food
1           | 2017-2-1 | drinks
3           | 2017-2-2 | computer
2           | 2017-2-1 | toys
1           | 2017-3-1 | food

>>> import pandas as pd
>>> dt = dict(customer_id=[1,2,1,3,2,1],
              date='2017-2-1 2017-2-1 2017-2-1 2017-2-2 2017-2-1 2017-3-1'.split(),
              category=["toys", "food", "drinks", "computer", "toys", "food"])
>>> df = pd.DataFrame(dt)

I want to use the values of category as new columns and one-hot encode them. I know I can use df.pivot_table(index=['customer_id'], columns=['category']):

>>> df['Indicator'] = 1 
>>> df.pivot_table(index=['customer_id'], columns=['category'],
                   values='Indicator').fillna(0).astype(int)                                                             
category     computer  drinks  food  toys
customer_id                              
1                   0       1     1     1
2                   0       0     1     1
3                   1       0     0     0
>>>

I also want to group by date so that each row only contains information from a single date. In the desired output below, id 1 has two rows because it appears with two unique dates in the date column.

customer_id | toys | food | drinks | computer 
1           | 1    | 0    | 1      | 0        
1           | 0    | 1    | 0      | 0
2           | 1    | 1    | 0      | 0
3           | 0    | 0    | 0      | 1

Answer 1

You may be looking for crosstab:

>>> pd.crosstab([df.customer_id,df.date], df.category)                                                                                                                
category              computer  drinks  food  toys
customer_id date                                  
1           2017-2-1         0       1     0     1
            2017-3-1         0       0     1     0
2           2017-2-1         0       0     1     1
3           2017-2-2         1       0     0     0
>>>
>>> pd.crosstab([df.customer_id,df.date],
                df.category).reset_index(level=1)                                                                                           
category         date  computer  drinks  food  toys
customer_id                                        
1            2017-2-1         0       1     0     1
1            2017-3-1         0       0     1     0
2            2017-2-1         0       0     1     1
3            2017-2-2         1       0     0     0
>>>
>>> pd.crosstab([df.customer_id, df.date], 
                df.category).reset_index(level=1, drop=True)                                                                                
category     computer  drinks  food  toys
customer_id                              
1                   0       1     0     1
1                   0       0     1     0
2                   0       0     1     1
3                   1       0     0     0
>>>
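
One caveat worth checking: crosstab counts occurrences, so if a customer has the same category more than once on the same date, the cell holds that count rather than 1. The following is a minimal sketch of capping the counts to get a strict 0/1 encoding. The seventh row (a second toys entry for customer 1 on 2017-2-1) is a hypothetical addition for illustration and is not part of the question's data.

```python
import pandas as pd

# Question's data plus one hypothetical duplicate row (the last entry in each list)
df = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1, 1],
    "date": ("2017-2-1 2017-2-1 2017-2-1 2017-2-2 "
             "2017-2-1 2017-3-1 2017-2-1").split(),
    "category": ["toys", "food", "drinks", "computer", "toys", "food", "toys"],
})

# crosstab tallies how often each (customer_id, date) pair has each category
counts = pd.crosstab([df.customer_id, df.date], df.category)

# Cap every count at 1 so the result is a 0/1 indicator table
one_hot = counts.clip(upper=1)
```

With the question's original six rows there are no duplicates, so counts and one_hot would be identical and the clip step is only a safeguard.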

Answer 2

Assuming your frame is called df, you could add an indicator column and then use .pivot_table directly:

df['Indicator'] = 1

pvt = df.pivot_table(index=['date', 'customer_id'],
                     columns='category',
                     values='Indicator')\
        .fillna(0)

This gives a dataframe that looks like:

category              computer  drinks  food  toys
date     customer_id                              
2017-2-1 1                 0.0     1.0   0.0   1.0
         2                 0.0     0.0   1.0   1.0
2017-2-2 3                 1.0     0.0   0.0   0.0
2017-3-1 1                 0.0     0.0   1.0   0.0
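
The result above has float values and date as the outer index level, while the desired output in the question uses integers and one row per customer and date. A possible adaptation, offered as a sketch rather than part of the original answer: put customer_id first in the index, cast to int, and drop the date level. Using aggfunc="max" here is my own choice (the default is the mean, which also yields 1 for this data).

```python
import pandas as pd

# Data from the question
df = pd.DataFrame({
    "customer_id": [1, 2, 1, 3, 2, 1],
    "date": "2017-2-1 2017-2-1 2017-2-1 2017-2-2 2017-2-1 2017-3-1".split(),
    "category": ["toys", "food", "drinks", "computer", "toys", "food"],
})
df["Indicator"] = 1

# One row per (customer_id, date); missing combinations become 0
pvt = (
    df.pivot_table(index=["customer_id", "date"],
                   columns="category",
                   values="Indicator",
                   aggfunc="max")
      .fillna(0)
      .astype(int)
)

# Drop the date level so only customer_id remains as the index
flat = pvt.reset_index(level="date", drop=True)
```

Here flat has the index values 1, 1, 2, 3 and the columns computer, drinks, food, toys, which matches the desired output up to column order.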
