Using the diff() function with groupby in pandas

Question

I encounter an error each time I attempt to compute the difference in readings for a meter in my dataset. The dataset is structured like this:

id  paymenttermid   houseid     houseid-meterid     quantity    month   year    cleaned_quantity
Datetime                                
2019-02-01  255     water   215     215M201     23.0    2   2019    23.0
2019-02-01  286     water   193     193M181     24.0    2   2019    24.0
2019-02-01  322     water   172     172M162     22.0    2   2019    22.0
2019-02-01  323     water   176     176M166     61.0    2   2019    61.0
2019-02-01  332     water   158     158M148     15.0    2   2019    15.0

I am trying to generate a new column called consumption that holds the difference in quantity consumed by each house (identified by houseid-meterid) from one month of the year to the next.

The code I am using to implement this is:

water_df["consumption"] = water_df.groupby(["year", "month", "houseid-meterid"])["cleaned_quantity"].diff(-1)

After executing this code, the consumption column is filled with NaN values. How can I implement this logic correctly? The end result looks like this:

id  paymenttermid   houseid     houseid-meterid     quantity    month   year    cleaned_quantity    consumption
Datetime                                    
2019-02-01  255     water   215     215M201     23.0    2   2019    23.0    NaN
2019-02-01  286     water   193     193M181     24.0    2   2019    24.0    NaN
2019-02-01  322     water   172     172M162     22.0    2   2019    22.0    NaN
2019-02-01  323     water   176     176M166     61.0    2   2019    61.0    NaN
2019-02-01  332     water   158     158M148     15.0    2   2019    15.0    NaN

Many thanks in advance.

I have attempted to use

water_df["consumption"] = water_df.groupby(["year", "month", "houseid-meterid"])["cleaned_quantity"].diff(-1)

and

water_df["consumption"] = water_df.groupby(["year", "month", "houseid-meterid"])["cleaned_quantity"].diff(0)

and

water_df["consumption"] = water_df.groupby(["year", "month", "houseid-meterid"])["cleaned_quantity"].diff()

All of these commands result in the same behaviour as described above.

The expected output is:



Datetime    houseid-meterid cleaned_quantity    consumption                             
2019-02-01    215M201         23.0              20
2019-03-02    215M201         43.0              9
2019-04-01    215M201         52.0              12
2019-05-01    215M201         64.0              36
2019-06-01    215M201         100.0             20

What steps should I take?

Answer 1

Sort the values by Datetime (if needed), group by houseid-meterid only, compute the diff of the cleaned_quantity values, then shift the result up one row so that each difference lines up with the earlier of the two readings:

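# For each meter: next reading minus current reading.
# diff() gives current minus previous; shift(-1) moves that up one row.
df['consumption'] = (df.sort_values('Datetime')
                       .groupby('houseid-meterid')['cleaned_quantity']
                       .transform(lambda x: x.diff().shift(-1)))
print(df)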

# Output
    Datetime houseid-meterid  cleaned_quantity  consumption
0 2019-02-01         215M201              23.0         20.0
1 2019-03-02         215M201              43.0          9.0
2 2019-04-01         215M201              52.0         12.0
3 2019-05-01         215M201              64.0         36.0
4 2019-06-01         215M201             100.0          NaN
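
As a rough illustration of why the original attempts produce only NaN, the sketch below uses made-up readings for two meters (the values for 193M181 and the variable name per_month are assumptions, not taken from the question). Grouping by year, month and houseid-meterid puts each monthly reading in a group of its own, so diff() has no neighbouring row to subtract. Grouping by the meter alone keeps all of a meter's readings together. The last step is one possible alternative to the transform above, written with groupby shift instead of diff; it should give the same numbers, with NaN for each meter's final reading.

```python
import pandas as pd

# Hypothetical sample: two meters with three monthly readings each.
df = pd.DataFrame({
    "Datetime": pd.to_datetime([
        "2019-02-01", "2019-02-01",
        "2019-03-02", "2019-03-02",
        "2019-04-01", "2019-04-01",
    ]),
    "houseid-meterid": ["215M201", "193M181"] * 3,
    "cleaned_quantity": [23.0, 24.0, 43.0, 30.0, 52.0, 41.0],
})
df["year"] = df["Datetime"].dt.year
df["month"] = df["Datetime"].dt.month

# Grouping as in the question: every (year, month, meter) group holds a
# single row, so there is nothing to subtract and the result is all NaN.
per_month = (df.groupby(["year", "month", "houseid-meterid"])
               ["cleaned_quantity"].diff(-1))
print(per_month.isna().all())

# Grouping by meter only: next reading minus current reading.
df = df.sort_values(["houseid-meterid", "Datetime"])
next_reading = df.groupby("houseid-meterid")["cleaned_quantity"].shift(-1)
df["consumption"] = next_reading - df["cleaned_quantity"]
print(df[["Datetime", "houseid-meterid", "cleaned_quantity", "consumption"]])
```

Sorting by meter and then by date is only there to make the printed result easier to read; the subtraction aligns on the row index either way.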
