Calculate difference between rows for the same product
I am using Python and Pandas to do some calculations on a dataframe with millions of rows, which looks like this:
city product month sold
Milan Spaghetti 2021-03-31 30300
Milan Spaghetti 2021-06-30 26958
Milan Spaghetti 2021-09-30 26775
Milan Spaghetti 2021-12-31 44185
Milan Spaghetti 2022-03-31 32716
Milan Spaghetti 2022-06-30 25881
Milan Maccheroni 2021-03-31 89584
Milan Maccheroni 2021-06-30 81434
Milan Maccheroni 2021-09-30 83360
Milan Maccheroni 2021-12-31 123945
Milan Maccheroni 2022-03-31 97278
Milan Maccheroni 2022-06-30 82959
Rome Spaghetti 2021-01-31 1524
Rome Spaghetti 2021-04-30 1548
Rome Spaghetti 2021-07-31 1577
Rome Spaghetti 2021-10-31 1438
Rome Spaghetti 2022-01-31 1556
Rome Spaghetti 2022-04-30 1471
Rome Spaghetti 2022-07-31 1453
Rome Maccheroni 2021-01-31 15646
Rome Maccheroni 2021-04-30 15877
Rome Maccheroni 2021-07-31 15289
Rome Maccheroni 2021-10-31 16675
Rome Maccheroni 2022-01-31 17028
Rome Maccheroni 2022-04-30 16490
Rome Maccheroni 2022-07-31 14664
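I am looking for a way to calculate the change between consecutive months for the same city and product, so that the resulting dataframe looks like this: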
city product month sold change
Milan Spaghetti 31/03/2021 30300
Milan Spaghetti 30/06/2021 26958 -3342
Milan Spaghetti 30/09/2021 26775 -183
Milan Spaghetti 31/12/2021 44185 17410
Milan Spaghetti 31/03/2022 32716 -11469
Milan Spaghetti 30/06/2022 25881 -6835
Milan Maccheroni 31/03/2021 89584
Milan Maccheroni 30/06/2021 81434 -8150
Milan Maccheroni 30/09/2021 83360 1926
Milan Maccheroni 31/12/2021 123945 40585
Milan Maccheroni 31/03/2022 97278 -26667
Milan Maccheroni 30/06/2022 82959 -14319
Rome Spaghetti 31/01/2021 1524
Rome Spaghetti 30/04/2021 1548 24
Rome Spaghetti 31/07/2021 1577 29
Rome Spaghetti 31/10/2021 1438 -139
Rome Spaghetti 31/01/2022 1556 118
Rome Spaghetti 30/04/2022 1471 -85
Rome Spaghetti 31/07/2022 1453 -18
Rome Maccheroni 31/01/2021 15646
Rome Maccheroni 30/04/2021 15877 231
Rome Maccheroni 31/07/2021 15289 -588
Rome Maccheroni 31/10/2021 16675 1386
Rome Maccheroni 31/01/2022 17028 353
Rome Maccheroni 30/04/2022 16490 -538
Rome Maccheroni 31/07/2022 14664 -1826
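The code should calculate the change between two rows only when the city and product fields are the same. Is it possible to do this without iterating over the rows?
Please ignore the change in the month format; it is not relevant to the solution.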
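This is easy to vectorize with the shift function. You just need to make sure that you sort the rows by city, product and month, and that you set the change to NaN on the first row of each city/product group. The code looks like this: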
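import numpy as np

category_cols = ["city", "product"]
# Sort so that rows of the same city/product are adjacent and in month order
df = df.sort_values(category_cols + ["month"])
# Difference between each row and the row before it
df["change"] = df["sold"] - df["sold"].shift(1)
# Rows where city or product differs from the previous row start a new group
new_cat_mask = (df[category_cols] != df[category_cols].shift(1)).any(axis=1)
# The first row of a group has no predecessor, so mark its change as missing
df.loc[new_cat_mask, "change"] = np.nan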
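To try the shift approach end to end, here is a minimal self-contained sketch. It assumes a small sample of the question's rows (the Spaghetti figures for Milan and Rome, deliberately shuffled) and a parsed datetime month column; the name is_group_start is only illustrative.

```python
import numpy as np
import pandas as pd

# A few rows from the question's data, deliberately out of order.
df = pd.DataFrame({
    "city": ["Rome", "Milan", "Milan", "Rome", "Milan", "Rome"],
    "product": ["Spaghetti"] * 6,
    "month": pd.to_datetime([
        "2021-04-30", "2021-03-31", "2021-09-30",
        "2021-01-31", "2021-06-30", "2021-07-31",
    ]),
    "sold": [1548, 30300, 26775, 1524, 26958, 1577],
})

category_cols = ["city", "product"]

# Sorting makes each city/product group contiguous and chronological.
df = df.sort_values(category_cols + ["month"]).reset_index(drop=True)

# Row-to-row difference over the whole frame, ignoring group boundaries.
df["change"] = df["sold"] - df["sold"].shift(1)

# A row starts a new group when city or product differs from the row above.
is_group_start = (df[category_cols] != df[category_cols].shift(1)).any(axis=1)

# Differences computed across a group boundary are meaningless, so blank them.
df.loc[is_group_start, "change"] = np.nan

print(df)
```

The sort is what makes the result correct: shift only looks at the physically previous row, so any row that is out of place within its group would produce a wrong difference.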
This does what you are looking for:
difference_df = df.assign(
    # diff() on the grouped column is equivalent to transform(lambda x: x.diff())
    difference=lambda d: d.groupby(['city', 'product'])['sold'].diff()
)
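The groupby approach can be checked on a small sample as well. This is a sketch only, assuming a few of the question's Maccheroni rows with the two cities interleaved. Sorting by month before grouping is an addition that is not in the answer above; it guards against rows that are not already in chronological order within each group.

```python
import pandas as pd

# Milan and Rome rows interleaved, as they might appear in unsorted data.
df = pd.DataFrame({
    "city": ["Milan", "Rome", "Milan", "Rome", "Milan", "Rome"],
    "product": ["Maccheroni"] * 6,
    "month": pd.to_datetime([
        "2021-03-31", "2021-01-31", "2021-06-30",
        "2021-04-30", "2021-09-30", "2021-07-31",
    ]),
    "sold": [89584, 15646, 81434, 15877, 83360, 15289],
})

# diff() runs inside each city/product group, so the first row of every
# group gets NaN automatically and no boundary mask is needed.
per_group_diff = (
    df.sort_values("month")
      .groupby(["city", "product"])["sold"]
      .diff()
)

# Assignment aligns on the index, so the original row order is preserved.
result = df.assign(difference=per_group_diff)

print(result)
```

Because the groups do not have to be contiguous, the frame itself never needs to be reordered, which can be convenient when the original row order matters.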