Calculate difference between rows for the same product
I am using Python and Pandas to run some calculations on a dataframe with millions of rows, which looks like this:
city product month sold
Milan Spaghetti 2021-03-31 30300
Milan Spaghetti 2021-06-30 26958
Milan Spaghetti 2021-09-30 26775
Milan Spaghetti 2021-12-31 44185
Milan Spaghetti 2022-03-31 32716
Milan Spaghetti 2022-06-30 25881
Milan Maccheroni 2021-03-31 89584
Milan Maccheroni 2021-06-30 81434
Milan Maccheroni 2021-09-30 83360
Milan Maccheroni 2021-12-31 123945
Milan Maccheroni 2022-03-31 97278
Milan Maccheroni 2022-06-30 82959
Rome Spaghetti 2021-01-31 1524
Rome Spaghetti 2021-04-30 1548
Rome Spaghetti 2021-07-31 1577
Rome Spaghetti 2021-10-31 1438
Rome Spaghetti 2022-01-31 1556
Rome Spaghetti 2022-04-30 1471
Rome Spaghetti 2022-07-31 1453
Rome Maccheroni 2021-01-31 15646
Rome Maccheroni 2021-04-30 15877
Rome Maccheroni 2021-07-31 15289
Rome Maccheroni 2021-10-31 16675
Rome Maccheroni 2022-01-31 17028
Rome Maccheroni 2022-04-30 16490
Rome Maccheroni 2022-07-31 14664
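I am looking for a way to compute the change between consecutive months for the same city and product, so that the resulting dataframe looks like this: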
city product month sold change
Milan Spaghetti 31/03/2021 30300
Milan Spaghetti 30/06/2021 26958 -3342
Milan Spaghetti 30/09/2021 26775 -183
Milan Spaghetti 31/12/2021 44185 17410
Milan Spaghetti 31/03/2022 32716 -11469
Milan Spaghetti 30/06/2022 25881 -6835
Milan Maccheroni 31/03/2021 89584
Milan Maccheroni 30/06/2021 81434 -8150
Milan Maccheroni 30/09/2021 83360 1926
Milan Maccheroni 31/12/2021 123945 40585
Milan Maccheroni 31/03/2022 97278 -26667
Milan Maccheroni 30/06/2022 82959 -14319
Rome Spaghetti 31/01/2021 1524
Rome Spaghetti 30/04/2021 1548 24
Rome Spaghetti 31/07/2021 1577 29
Rome Spaghetti 31/10/2021 1438 -139
Rome Spaghetti 31/01/2022 1556 118
Rome Spaghetti 30/04/2022 1471 -85
Rome Spaghetti 31/07/2022 1453 -18
Rome Maccheroni 31/01/2021 15646
Rome Maccheroni 30/04/2021 15877 231
Rome Maccheroni 31/07/2021 15289 -588
Rome Maccheroni 31/10/2021 16675 1386
Rome Maccheroni 31/01/2022 17028 353
Rome Maccheroni 30/04/2022 16490 -538
Rome Maccheroni 31/07/2022 14664 -1826
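The code should compute the change between two rows only when their city and product fields are the same. Is it possible to do this without iterating over the rows?
Please ignore the change in the month format; it is not relevant to the solution.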
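This is easy to vectorize with the shift function. You just need to make sure you:
1. sort the dataframe by city, product and month, so that the rows of each group are adjacent and in date order, and
2. set the change to NaN on the first row of each city/product group, where the previous row belongs to a different group.
The code looks like this: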
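import numpy as np

category_cols = ["city", "product"]
# Sort so each city/product group is contiguous and in month order
df = df.sort_values(category_cols + ["month"])
# Get difference to the previous row
df["change"] = df.sold - df.sold.shift(1)
# Find rows that start a new city/product group and mark their change as missing
new_cat_mask = (df[category_cols] != df[category_cols].shift(1)).any(axis=1)
df.loc[new_cat_mask, "change"] = np.nan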
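A self-contained sketch of the shift-and-mask approach, run on a small subset of the question's data (only the first three Milan quarters of each product, chosen here for brevity). After sorting, Maccheroni comes before Spaghetti, so without the mask the first Spaghetti row would get a difference computed against the last Maccheroni row (30300 - 83360); the mask turns that value into NaN.

```python
import numpy as np
import pandas as pd

# Subset of the question's data: first three quarters of each Milan product
df = pd.DataFrame({
    "city": ["Milan"] * 6,
    "product": ["Spaghetti"] * 3 + ["Maccheroni"] * 3,
    "month": ["2021-03-31", "2021-06-30", "2021-09-30"] * 2,
    "sold": [30300, 26958, 26775, 89584, 81434, 83360],
})
# ISO date strings already sort correctly; parsing them just makes the intent explicit
df["month"] = pd.to_datetime(df["month"])

category_cols = ["city", "product"]
df = df.sort_values(category_cols + ["month"])

# Unmasked difference: the first row of each group is compared with the previous group
df["change"] = df["sold"] - df["sold"].shift(1)

# True wherever city or product differs from the previous row (including the very first row)
new_cat_mask = (df[category_cols] != df[category_cols].shift(1)).any(axis=1)
df.loc[new_cat_mask, "change"] = np.nan

print(df)
```

The Maccheroni changes come out as NaN, -8150, 1926 and the Spaghetti changes as NaN, -3342, -183, matching the expected output in the question.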
This is what you're looking for:
# Per-group difference; assumes rows are already in month order within each city/product
difference_df = (df
    .assign(difference=lambda d: d.groupby(['city', 'product'])['sold'].diff())
)
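A runnable sketch of the groupby approach, using a subset of the Rome rows that has been deliberately shuffled (the shuffling is an assumption added to show why order matters). It calls GroupBy.diff() directly, which gives the same result as transform(lambda x: x.diff()). Because diff() subtracts each row from the previous row of its group in the current row order, the frame is sorted by month first.

```python
import pandas as pd

# Subset of the question's Rome data, shuffled on purpose
df = pd.DataFrame({
    "city": ["Rome"] * 6,
    "product": ["Spaghetti", "Maccheroni", "Spaghetti",
                "Maccheroni", "Spaghetti", "Maccheroni"],
    "month": pd.to_datetime(["2021-07-31", "2021-01-31", "2021-01-31",
                             "2021-07-31", "2021-04-30", "2021-04-30"]),
    "sold": [1577, 15646, 1524, 15289, 1548, 15877],
})

# diff() is positional within each group, so put each group in month order first
difference_df = (
    df.sort_values(["city", "product", "month"])
      .assign(difference=lambda d: d.groupby(["city", "product"])["sold"].diff())
)
print(difference_df)
```

The resulting differences (NaN, 231, -588 for Maccheroni and NaN, 24, 29 for Spaghetti) match the corresponding rows of the expected output. Without the sort_values call, the shuffled rows would be differenced in the wrong order.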