
Using diff() with groupby in pandas

I run into errors each time I attempt to compute the difference between meter readings in my dataset. The dataset structure is this:

id  paymenttermid   houseid     houseid-meterid     quantity    month   year    cleaned_quantity
Datetime                                
2019-02-01  255     water   215     215M201     23.0    2   2019    23.0
2019-02-01  286     water   193     193M181     24.0    2   2019    24.0
2019-02-01  322     water   172     172M162     22.0    2   2019    22.0
2019-02-01  323     water   176     176M166     61.0    2   2019    61.0
2019-02-01  332     water   158     158M148     15.0    2   2019    15.0

I am trying to generate a new column called consumption that holds the difference in quantity consumed for each house (identified by houseid-meterid) from one month of the year to the next.

The code I am using to implement this is:

water_df["consumption"] = water_df.groupby(["year", "month", "houseid-meterid"])["cleaned_quantity"].diff(-1)

After executing this code, the consumption column is filled with NaN values. How can I implement this logic correctly? The result currently looks like this:

id  paymenttermid   houseid     houseid-meterid     quantity    month   year    cleaned_quantity    consumption
Datetime                                    
2019-02-01  255     water   215     215M201     23.0    2   2019    23.0    NaN
2019-02-01  286     water   193     193M181     24.0    2   2019    24.0    NaN
2019-02-01  322     water   172     172M162     22.0    2   2019    22.0    NaN
2019-02-01  323     water   176     176M166     61.0    2   2019    61.0    NaN
2019-02-01  332     water   158     158M148     15.0    2   2019    15.0    NaN

Many thanks in advance.

I have tried using

water_df["consumption"] = water_df.groupby(["year", "month", "houseid-meterid"])["cleaned_quantity"].diff(-1)

and

water_df["consumption"] = water_df.groupby(["year", "month", "houseid-meterid"])["cleaned_quantity"].diff(0)

and

water_df["consumption"] = water_df.groupby(["year", "month", "houseid-meterid"])["cleaned_quantity"].diff()

All of these commands result in the same behaviour as described above.

The expected output should be:



Datetime    houseid-meterid cleaned_quantity    consumption                             
2019-02-01    215M201         23.0              20
2019-03-02    215M201         43.0              9
2019-04-01    215M201         52.0              12
2019-05-01    215M201         64.0              36
2019-06-01    215M201         100.0             20

What steps should I take?

Sort the values by Datetime (if needed), then group by houseid-meterid only, compute the diff of the cleaned_quantity values, and shift the result up one row so that each difference lines up with the earlier reading:

df['consumption'] = (df.sort_values('Datetime')
                       .groupby('houseid-meterid')['cleaned_quantity']
                       .transform(lambda x: x.diff().shift(-1)))
print(df)

# Output
    Datetime houseid-meterid  cleaned_quantity  consumption
0 2019-02-01         215M201              23.0         20.0
1 2019-03-02         215M201              43.0          9.0
2 2019-04-01         215M201              52.0         12.0
3 2019-05-01         215M201              64.0         36.0
4 2019-06-01         215M201             100.0          NaN
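
A likely reason the original attempts return only NaN is that grouping by year, month and houseid-meterid leaves each group with a single row, so diff() has nothing to subtract from. The sketch below illustrates this on hypothetical sample data that mirrors the expected output in the question (the frame and the variable names per_month, by_meter, consumption and alternative are assumptions made for illustration). It also shows that negating diff(-1) appears to give the same values as diff().shift(-1).

```python
import pandas as pd

# Hypothetical sample data modelled on the question's expected output
df = pd.DataFrame({
    "Datetime": pd.to_datetime(
        ["2019-02-01", "2019-03-02", "2019-04-01", "2019-05-01", "2019-06-01"]
    ),
    "houseid-meterid": ["215M201"] * 5,
    "cleaned_quantity": [23.0, 43.0, 52.0, 64.0, 100.0],
})
df["year"] = df["Datetime"].dt.year
df["month"] = df["Datetime"].dt.month

# Original attempt: every (year, month, meter) group holds exactly one row,
# so there is no neighbouring row to subtract and each diff is NaN.
per_month = df.groupby(["year", "month", "houseid-meterid"])["cleaned_quantity"].diff(-1)
print(per_month.isna().all())  # True

# Grouping by meter only keeps all of a meter's readings in one group.
by_meter = df.sort_values("Datetime").groupby("houseid-meterid")["cleaned_quantity"]

# Next reading minus current reading, as in the answer above.
consumption = by_meter.transform(lambda x: x.diff().shift(-1))

# diff(-1) is current minus next, so negating it gives the same values.
alternative = -by_meter.diff(-1)

df["consumption"] = consumption
print(df[["Datetime", "houseid-meterid", "cleaned_quantity", "consumption"]])
```

The last reading of each meter stays NaN because there is no later reading to compare it with.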
