[英]Pivot table based on groupby in Pandas
I have a dataframe like this:我有一个这样的数据框:
customer_id | date | category
1 | 2017-2-1 | toys
2 | 2017-2-1 | food
1 | 2017-2-1 | drinks
3 | 2017-2-2 | computer
2 | 2017-2-1 | toys
1 | 2017-3-1 | food
>>> import pandas as pd
>>> dt = dict(customer_id=[1,2,1,3,2,1],
date='2017-2-1 2017-2-1 2017-2-1 2017-2-2 2017-2-1 2017-3-1'.split(),
category=["toys", "food", "drinks", "computer", "toys", "food"]))
>>> df = pd.DataFrame(dt)
ues my new columns and one hot encoding those columns, I know I can use df.pivot_table(index = ['customer_id'], columns = ['category'])
.使用我的新列和一个热编码这些列,我知道我可以使用df.pivot_table(index = ['customer_id'], columns = ['category'])
。
>>> df['Indicator'] = 1
>>> df.pivot_table(index=['customer_id'], columns=['category'],
values='Indicator').fillna(0).astype(int)
category computer drinks food toys
customer_id
1 0 1 1 1
2 0 0 1 1
3 1 0 0 0
>>>
I also want to group by date
so each row only contains information from the same date, like in the desired output below, id 1 has two rows because two unique dates in the date
column.我还想按date
分组,所以每一行只包含来自同一日期的信息,就像下面所需的输出一样,id 1 有两行,因为date
列中有两个唯一的date
。
customer_id | toys | food | drinks | computer
1 | 1 | 0 | 1 | 0
1 | 0 | 1 | 0 | 0
2 | 1 | 1 | 0 | 0
3 | 0 | 0 | 0 | 1
You may looking for crosstab
您可能正在寻找crosstab
>>> pd.crosstab([df.customer_id,df.date], df.category)
category computer drinks food toys
customer_id date
1 2017-2-1 0 1 0 1
2017-3-1 0 0 1 0
2 2017-2-1 0 0 1 1
3 2017-2-2 1 0 0 0
>>>
>>> pd.crosstab([df.customer_id,df.date],
df.category).reset_index(level=1)
category date computer drinks food toys
customer_id
1 2017-2-1 0 1 0 1
1 2017-3-1 0 0 1 0
2 2017-2-1 0 0 1 1
3 2017-2-2 1 0 0 0
>>>
>>> pd.crosstab([df.customer_id, df.date],
df.category).reset_index(level=1, drop=True)
category computer drinks food toys
customer_id
1 0 1 0 1
1 0 0 1 0
2 0 0 1 1
3 1 0 0 0
>>>
Assuming your frame is called df
, you could add an indicator column and then directly use .pivot_table
:假设您的框架名为df
,您可以添加一个指标列,然后直接使用.pivot_table
:
df['Indicator'] = 1
pvt = df.pivot_table(index=['date', 'customer_id'],
columns='category',
values='Indicator')\
.fillna(0)
This gives a dataframe that looks like:这给出了一个如下所示的数据框:
category computer drinks food toys
date customer_id
2017-2-1 1 0.0 1.0 0.0 1.0
2 0.0 0.0 1.0 1.0
2017-2-2 3 1.0 0.0 0.0 0.0
2017-3-1 1 0.0 0.0 1.0 0.0
声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.