Groupby Diff - Pandas
I would like to find the difference between columns in a MultiIndex. I have three dimensions: Family, Date, and Client. The goal is to have new columns containing the row-by-row differences, with Client, Date and Family in the MultiIndex.
import pandas as pd
import numpy as np

data = {
    'Family': {
        0: 'Hugo',
        1: 'Hugo',
        2: 'Hugo',
        3: 'Hugo'},
    'Date': {
        0: '2021-04-15',
        1: '2021-04-16',
        2: '2021-04-15',
        3: '2021-04-16'},
    'Client': {
        0: 1,
        1: 1,
        2: 2,
        3: 2},
    'Code_Client': {
        0: 605478.0,
        1: 605478.0,
        2: 605478.0,
        3: 605478.0},
    'Price': {
        0: 2.23354416539888,
        1: 2.0872536032616744,
        2: 1.8426286431701764,
        3: 0.3225935619590472}
}
df = pd.DataFrame(data)
pd.pivot_table(df, values='Price', index=['Code_Client'],
               columns=['Family', 'Date', 'Client'])
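The question does not define "difference by rows". One possible reading is the change in Price from one Date to the next for the same Family and Client. That reading is an assumption. Under it, a minimal sketch on the pivoted table does three things: it transposes the table so that the (Family, Date, Client) triples become rows, takes `diff()` within each (Family, Client) group, and transposes back. The names `pivot` and `diffs` are illustrative.

```python
import pandas as pd

# Same data as in the question.
df = pd.DataFrame({
    "Family": ["Hugo", "Hugo", "Hugo", "Hugo"],
    "Date": ["2021-04-15", "2021-04-16", "2021-04-15", "2021-04-16"],
    "Client": [1, 1, 2, 2],
    "Code_Client": [605478.0] * 4,
    "Price": [2.23354416539888, 2.0872536032616744,
              1.8426286431701764, 0.3225935619590472],
})

# One row per Code_Client, one column per (Family, Date, Client) triple.
pivot = pd.pivot_table(df, values="Price", index=["Code_Client"],
                       columns=["Family", "Date", "Client"])

# Assumption: we want the Date-to-Date change for each (Family, Client).
# Transpose so the column triples become rows, diff inside each
# (Family, Client) group, then transpose back to the pivoted layout.
diffs = pivot.T.groupby(level=["Family", "Client"]).diff().T
print(diffs)
```

The first Date of each (Family, Client) pair has no predecessor, so its entry is NaN.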
Do you have any idea?
Thank you,
I assume that you are looking for the difference in Price grouped by Family, Date and Client. Your formulation of the problem was somewhat unclear, and you didn't post an expected output. I changed your dataframe slightly, adding a second family to make the solution more visible.
data = {
    'Family': {
        0: 'Hugo',
        1: 'Hugo',
        2: 'Victor',
        3: 'Victor'},
    'Date': {
        0: '2021-04-15',
        1: '2021-04-16',
        2: '2021-04-15',
        3: '2021-04-16'},
    'Client': {
        0: 1,
        1: 1,
        2: 2,
        3: 2},
    'Code_Client': {
        0: 605478.0,
        1: 605478.0,
        2: 605478.0,
        3: 605478.0},
    'Price': {
        0: 2.23354416539888,
        1: 2.0872536032616744,
        2: 1.8426286431701764,
        3: 0.3225935619590472}
}
df = pd.DataFrame(data)
pd.pivot_table(df, values='Price', index=['Code_Client'],
               columns=['Family', 'Date', 'Client'])
As you can see, I added the Victor family. Your dataframe now looks like this:
   Family        Date  Client  Code_Client     Price
0    Hugo  2021-04-15       1     605478.0  2.233544
1    Hugo  2021-04-16       1     605478.0  2.087254
2  Victor  2021-04-15       2     605478.0  1.842629
3  Victor  2021-04-16       2     605478.0  0.322594
To add a column of differences by group, I suggest the following:
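# Index by the grouping variables, sort the index, and keep only Price
df = df.set_index(['Family', 'Date', 'Client']).sort_index()[['Price']]
# Start with an all-NaN column that will hold the differences
df['diff'] = np.nan
idx = pd.IndexSlice
# Loop over each Family (first index level) and diff Price within it
for ix in df.index.levels[0]:
    df.loc[idx[ix, :], 'diff'] = df.loc[idx[ix, :], 'Price'].diff()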
The first step indexes your variables (the ones you want to group by) and creates an empty (NaN-filled) column of differences. The second step fills that column with the differences between consecutive rows within each Family, which is the first index level.
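The explicit loop over `df.index.levels[0]` can also be written as a `groupby` on the Family index level, which runs `diff()` separately within each family. The sketch below uses the answer's modified data and swaps in `groupby` for the loop. It is an alternative formulation, not the approach shown above.

```python
import pandas as pd

# The answer's modified data: two families, one client each.
df = pd.DataFrame({
    "Family": ["Hugo", "Hugo", "Victor", "Victor"],
    "Date": ["2021-04-15", "2021-04-16", "2021-04-15", "2021-04-16"],
    "Client": [1, 1, 2, 2],
    "Code_Client": [605478.0] * 4,
    "Price": [2.23354416539888, 2.0872536032616744,
              1.8426286431701764, 0.3225935619590472],
})

df = df.set_index(["Family", "Date", "Client"]).sort_index()[["Price"]]

# groupby on the "Family" level replaces the loop over df.index.levels[0];
# diff() restarts in each family, so each family's first row is NaN.
df["diff"] = df.groupby(level="Family")["Price"].diff()
print(df)
```

Appending `.fillna(0)` to the `diff()` call gives the zero-filled variant.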
This returns:
                            Price      diff
Family Date       Client
Hugo   2021-04-15 1      2.233544       NaN
       2021-04-16 1      2.087254 -0.146291
Victor 2021-04-15 2      1.842629       NaN
       2021-04-16 2      0.322594 -1.520035
If you would rather not have the NaN values, do this:
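# Run this on the original (unindexed) df, not on the result above
df = df.set_index(['Family', 'Date', 'Client']).sort_index()[['Price']]
df['diff'] = np.nan
idx = pd.IndexSlice
for ix in df.index.levels[0]:
    # fillna(0) turns the first row of each family from NaN into 0
    df.loc[idx[ix, :], 'diff'] = df.loc[idx[ix, :], 'Price'].diff().fillna(0)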
I added .fillna(0) to the diff() statement. It returns:
                            Price      diff
Family Date       Client
Hugo   2021-04-15 1      2.233544  0.000000
       2021-04-16 1      2.087254 -0.146291
Victor 2021-04-15 2      1.842629  0.000000
       2021-04-16 2      0.322594 -1.520035
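There is one caveat if your real data looks like the question's original example, with a single family (Hugo) and two Clients. After `sort_index()`, the two clients' rows are interleaved by Date, so differencing within Family alone would subtract one client's Price from the other's. The sketch below is not the answer's code. It assumes you want each client's Date-to-Date change, groups on both the Family and Client levels, and keeps the `.fillna(0)` step.

```python
import pandas as pd

# The question's original data: one family (Hugo) with Clients 1 and 2.
df = pd.DataFrame({
    "Family": ["Hugo"] * 4,
    "Date": ["2021-04-15", "2021-04-16", "2021-04-15", "2021-04-16"],
    "Client": [1, 1, 2, 2],
    "Code_Client": [605478.0] * 4,
    "Price": [2.23354416539888, 2.0872536032616744,
              1.8426286431701764, 0.3225935619590472],
})
df = df.set_index(["Family", "Date", "Client"]).sort_index()[["Price"]]

# Grouping by Family alone would diff across Client 1 and Client 2 rows,
# because sort_index() orders them Date first. Grouping by both levels
# keeps each client's Date-to-Date change separate; fillna(0) zeroes
# the first Date of each client.
df["diff"] = df.groupby(level=["Family", "Client"])["Price"].diff().fillna(0)
print(df)
```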
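Disclaimer: the technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.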