Groupby Diff - Pandas

Question

I would like to find the difference between columns in a MultiIndex. I have three dimensions, Family, Date and Client, and the goal is to have new columns containing the row-wise differences, with Client, Date and Family in the MultiIndex.

    import pandas as pd
    import numpy as np

    data = {
        'Family':{
            0: 'Hugo',
            1: 'Hugo', 
            2: 'Hugo', 
            3: 'Hugo'},
        'Date': {
            0: '2021-04-15',
            1: '2021-04-16',
            2: '2021-04-15',
            3: '2021-04-16'},
        'Client': {
            0: 1,
            1: 1,
            2: 2,
            3: 2},
        'Code_Client': {
            0: 605478.0,
            1: 605478.0,
            2: 605478.0,
            3: 605478.0},
        'Price': {
            0: 2.23354416539888,
            1: 2.0872536032616744,
            2: 1.8426286431701764,
            3: 0.3225935619590472}
        }

    df = pd.DataFrame(data)
    pd.pivot_table(df, values='Price', index=['Code_Client'], columns=['Family', 'Date', 'Client'])

Do you have any ideas?

Thank you,

Answer 1

I assume that you are looking for the difference in Price grouped by Family, Date and Client. Your formulation of the problem was somewhat unclear, and you didn't post an expected output. I changed your DataFrame slightly, adding a second family, to make the solution more visible.

    data = {
        'Family':{
            0: 'Hugo',
            1: 'Hugo', 
            2: 'Victor', 
            3: 'Victor'},
        'Date': {
            0: '2021-04-15',
            1: '2021-04-16',
            2: '2021-04-15',
            3: '2021-04-16'},
        'Client': {
            0: 1,
            1: 1,
            2: 2,
            3: 2},
        'Code_Client': {
            0: 605478.0,
            1: 605478.0,
            2: 605478.0,
            3: 605478.0},
        'Price': {
            0: 2.23354416539888,
            1: 2.0872536032616744,
            2: 1.8426286431701764,
            3: 0.3225935619590472}
        }

    df = pd.DataFrame(data)
    pd.pivot_table(df, values='Price', index=['Code_Client'], columns=['Family', 'Date', 'Client'])

As you can see, I added the Victor family. So your DataFrame looks like this:

       Family        Date  Client  Code_Client     Price
    0    Hugo  2021-04-15       1     605478.0  2.233544
    1    Hugo  2021-04-16       1     605478.0  2.087254
    2  Victor  2021-04-15       2     605478.0  1.842629
    3  Victor  2021-04-16       2     605478.0  0.322594

To add a column of differences by group, I suggest the following:

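    # Index by the grouping variables, sort, and keep only the Price column
    df = df.set_index(['Family', 'Date', 'Client']).sort_index()[['Price']]
    df['diff'] = np.nan
    idx = pd.IndexSlice

    # For each Family, diff consecutive rows (ordered by Date, then Client)
    for ix in df.index.levels[0]:
        df.loc[idx[ix, :], 'diff'] = df.loc[idx[ix, :], 'Price'].diff()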

The first step indexes your variables (the ones you want to group by) and creates an empty (NaN-filled) column for the differences. The second step fills it with the differences between consecutive rows within each group (here, each Family).
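The loop can probably be replaced by a single `groupby` on the Family index level followed by `diff`, which computes the same per-Family differences. A minimal sketch, assuming the modified Hugo/Victor data above and that grouping on Family alone is what you want (the variable name `out` is just for illustration):

```python
import pandas as pd

# The answer's modified data (Hugo and Victor), written as lists for brevity
df = pd.DataFrame({
    'Family': ['Hugo', 'Hugo', 'Victor', 'Victor'],
    'Date': ['2021-04-15', '2021-04-16', '2021-04-15', '2021-04-16'],
    'Client': [1, 1, 2, 2],
    'Code_Client': [605478.0] * 4,
    'Price': [2.23354416539888, 2.0872536032616744,
              1.8426286431701764, 0.3225935619590472],
})

# Index by the grouping variables and sort so rows are in Date order
out = df.set_index(['Family', 'Date', 'Client'])[['Price']].sort_index()

# groupby(level=...) replaces the explicit loop over Family;
# diff() subtracts the previous row's Price within each Family group
out['diff'] = out.groupby(level='Family')['Price'].diff()
print(out)
```

The result should match the table below; `groupby(...).diff()` returns a Series aligned to the original index, so it can be assigned directly as a new column.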

This returns:

                                 Price      diff
    Family Date       Client
    Hugo   2021-04-15 1       2.233544       NaN
           2021-04-16 1       2.087254 -0.146291
    Victor 2021-04-15 2       1.842629       NaN
           2021-04-16 2       0.322594 -1.520035

If you don't want the NaN values, start again from the original df and do this:

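    # Same as above, starting from the original (un-indexed) df
    df = df.set_index(['Family', 'Date', 'Client']).sort_index()[['Price']]
    df['diff'] = np.nan
    idx = pd.IndexSlice

    # fillna(0) turns the first row of each Family (no predecessor) into 0
    for ix in df.index.levels[0]:
        df.loc[idx[ix, :], 'diff'] = df.loc[idx[ix, :], 'Price'].diff().fillna(0)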

I added .fillna(0) to the diff() call. It returns:
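With the question's original data, both clients belong to Hugo, so differencing within Family alone would mix the two clients' rows (for example, subtracting client 1's 2021-04-15 price from client 2's). If each client's price should only be compared with its own previous date (an assumption about the intended grouping), grouping on both Family and Client may be closer to what was asked. A minimal sketch with the fillna(0) variant:

```python
import pandas as pd

# The question's original data: one family (Hugo) with two clients
df = pd.DataFrame({
    'Family': ['Hugo'] * 4,
    'Date': ['2021-04-15', '2021-04-16', '2021-04-15', '2021-04-16'],
    'Client': [1, 1, 2, 2],
    'Code_Client': [605478.0] * 4,
    'Price': [2.23354416539888, 2.0872536032616744,
              1.8426286431701764, 0.3225935619590472],
})

out = df.set_index(['Family', 'Date', 'Client'])[['Price']].sort_index()

# Group on Family AND Client so each client's prices are differenced
# separately along Date; fillna(0) replaces each group's first NaN with 0
out['diff'] = (
    out.groupby(level=['Family', 'Client'])['Price']
       .diff()
       .fillna(0)
)
print(out)
```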

                                 Price      diff
    Family Date       Client
    Hugo   2021-04-15 1       2.233544  0.000000
           2021-04-16 1       2.087254 -0.146291
    Victor 2021-04-15 2       1.842629  0.000000
           2021-04-16 2       0.322594 -1.520035
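If, as the pivot_table call in the question suggests, a wide layout is preferred, one option is to put Date in the columns and take differences across them with `DataFrame.diff(axis=1)`. A sketch, assuming one row per (Family, Client) and one column per Date is the layout you want (the names `wide` and `wide_diff` are illustrative):

```python
import pandas as pd

# The question's original data
df = pd.DataFrame({
    'Family': ['Hugo'] * 4,
    'Date': ['2021-04-15', '2021-04-16', '2021-04-15', '2021-04-16'],
    'Client': [1, 1, 2, 2],
    'Code_Client': [605478.0] * 4,
    'Price': [2.23354416539888, 2.0872536032616744,
              1.8426286431701764, 0.3225935619590472],
})

# One row per (Family, Client), one column per Date
wide = df.pivot_table(values='Price', index=['Family', 'Client'], columns='Date')

# Difference between consecutive Date columns, row by row;
# the first Date column has no predecessor and becomes NaN
wide_diff = wide.diff(axis=1)
print(wide_diff)
```

Since each (Family, Client, Date) combination occurs once here, pivot_table's default mean aggregation just returns the original prices; with duplicate rows it would average them.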

Answer 1: accepted, 2021-04-22 09:24:37