Pandas - rolling average for groups across multiple columns; large dataframe

Question

I have the following dataframe:

+-----+-----+-------------+-------------+-------------------------+
| ID1 | ID2 | Box1_weight | Box2_weight | Average Prev Weight ID1 |
+-----+-----+-------------+-------------+-------------------------+
|  19 | 677 |      3      |      2      |            -            |
+-----+-----+-------------+-------------+-------------------------+
| 677 |  19 |      1      |      0      |            2            |
+-----+-----+-------------+-------------+-------------------------+
|  19 | 677 |      3      |      1      |       (0+3)/2=1.5       |
+-----+-----+-------------+-------------+-------------------------+
|  19 | 677 |      7      |      0      |       (3+0+3)/3=2       |
+-----+-----+-------------+-------------+-------------------------+
| 677 |  19 |      1      |      3      |      (0+1+1)/3≈0.67     |
+-----+-----+-------------+-------------+-------------------------+

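For each ID, I want to compute a moving average of the weights of its previous 3 boxes (not including the current row). I want to do this for every ID in ID1.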

I have added the column I want to compute, together with the calculations, to the table above as "Average Prev Weight ID1".
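To make the target column concrete, here is a minimal plain-Python sketch that walks the rows in order and keeps the last 3 weights seen for each ID, whichever of the two columns the ID appeared in. The helper name `prev_weight_avg` and the `deque` bookkeeping are illustrative assumptions, not part of the question. It works row by row, so it is meant as a reference for checking a vectorized solution on a small sample rather than for a large dataframe.

```python
from collections import defaultdict, deque

import pandas as pd

# Sample data copied from the table above
df = pd.DataFrame({
    "ID1": [19, 677, 19, 19, 677],
    "ID2": [677, 19, 677, 677, 19],
    "Box1_weight": [3, 1, 3, 7, 1],
    "Box2_weight": [2, 0, 1, 0, 3],
})


def prev_weight_avg(df, window=3):
    """Hypothetical reference helper: for each row, average the last `window`
    weights recorded for that row's ID1, whether the ID was in ID1 or ID2."""
    history = defaultdict(lambda: deque(maxlen=window))  # ID -> recent weights
    out = []
    rows = df[["ID1", "ID2", "Box1_weight", "Box2_weight"]].itertuples(index=False)
    for id1, id2, w1, w2 in rows:
        past = history[id1]
        # Average only earlier rows: computed before this row's weights are added
        out.append(sum(past) / len(past) if past else float("nan"))
        history[id1].append(w1)  # weight of the ID sitting in ID1
        history[id2].append(w2)  # weight of the ID sitting in ID2
    return pd.Series(out, index=df.index)


print(prev_weight_avg(df).round(3).tolist())  # [nan, 2.0, 1.5, 2.0, 0.667]
```

Computing the average before appending the current row's weights is what excludes the current box, which corresponds to the `shift()` in the pandas code.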

I can get the rolling average for a single column with:

df_copy.groupby('ID1')['Box1_weight'].apply(lambda x: x.shift().rolling(period_length, min_periods=1).mean())

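However, this does not account for the fact that the same ID can also appear in ID2, in which case its weight is recorded in the "Box2_weight" column.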
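For illustration, this is what the single-column approach gives on the sample data. It is written with `transform` instead of `apply` so the result stays in the original row order; otherwise it is the same shift-then-roll logic. Because it only looks at `Box1_weight` for rows where the ID is in `ID1`, ID 677 never sees the weights it had as `ID2`, and ID 19 misses its `Box2_weight` entries.

```python
import pandas as pd

df = pd.DataFrame({
    "ID1": [19, 677, 19, 19, 677],
    "ID2": [677, 19, 677, 677, 19],
    "Box1_weight": [3, 1, 3, 7, 1],
    "Box2_weight": [2, 0, 1, 0, 3],
})
period_length = 3

# Single-column version: only Box1_weight is considered for each ID1 group
single_col = df.groupby("ID1")["Box1_weight"].transform(
    lambda x: x.shift().rolling(period_length, min_periods=1).mean()
)
print(single_col.tolist())
# [nan, nan, 3.0, 3.0, 1.0]  (expected: nan, 2, 1.5, 2, 0.67)
```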

How can I get a rolling average for each ID across both columns?

Any guidance is appreciated.

Answer 1

I'm not sure this is what you want; I had trouble understanding your requirements. But here's an attempt:

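import numpy as np
import pandas as pd

# df is the dataframe from the question
ids = ['ID1', 'ID2']
weights = ['Box1_weight', 'Box2_weight']

# For each row, the column order that sorts the two IDs ascending
ind = np.argsort(df[ids].to_numpy(), axis=1)

# Reorder each row's values with that permutation
make_sort = lambda s, ind: np.take_along_axis(s, ind, axis=1)

f = make_sort(df[ids].to_numpy(), ind)      # sorted IDs
s = make_sort(df[weights].to_numpy(), ind)  # weights, kept paired with their IDs

df2 = pd.DataFrame(np.concatenate([f, s], axis=1), columns=ids + weights)

# Rolling mean per ID, then shift by one *within each group* so a row only sees earlier boxes
res1 = df2.groupby('ID1').Box1_weight.rolling(3, min_periods=1).mean().groupby(level=0).shift()
res2 = df2.groupby('ID2').Box2_weight.rolling(3, min_periods=1).mean().groupby(level=0).shift()

means = pd.concat([res1, res2], axis=1).rename(columns={'Box1_weight': 'w1', 'Box2_weight': 'w2'})
x = df.set_index([df.ID1.values, df.index])

# Look up the average for each row's ID1, then restore the original row order
final = x[ids].merge(means, left_index=True, right_index=True)[['w1', 'w2']].sum(axis=1).sort_index(level=1)

df['final_weight'] = final.tolist()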

   ID1  ID2  Box1_weight  Box2_weight  final_weight
0   19  677            3            2      0.000000
1  677   19            1            0      2.000000
2   19  677            3            1      1.500000
3   19  677            7            0      2.000000
4  677   19            1            3      0.666667
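The core of this answer is the per-row sort with `np.argsort` and `np.take_along_axis`. A small sketch on the first two rows of the sample data shows that the same permutation is applied to the IDs and to the weights, so each weight stays attached to its ID. As far as I can tell, this only lines the data up correctly when every ID always lands on the same side after sorting (here 19 is always the smaller ID); an ID that is sometimes the smaller and sometimes the larger of its pair would get two separate rolling histories.

```python
import numpy as np

ids = np.array([[19, 677],
                [677, 19]])
weights = np.array([[3, 2],
                    [1, 0]])

# Column order that sorts each row's IDs ascending
ind = np.argsort(ids, axis=1)  # [[0, 1], [1, 0]]

# Apply the same per-row permutation to IDs and weights so pairs stay together
sorted_ids = np.take_along_axis(ids, ind, axis=1)
sorted_weights = np.take_along_axis(weights, ind, axis=1)
print(sorted_ids.tolist())      # [[19, 677], [19, 677]]
print(sorted_weights.tolist())  # [[3, 2], [0, 1]]
```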

Answer 2

Here's my attempt.

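Stack the 2 ID columns and the 2 weight columns to create a dataframe with a single ID column and a single weight column. Compute the running average on that, then assign the running averages for ID1 back to the original dataframe.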
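A small sketch on the first two rows of the sample data shows why the final reshape recovers the per-row layout: `stack()` reads the frame row by row, so each row's `ID1` and `ID2` cells (and their weights) end up adjacent, and `reshape(-1, 2)` folds them back into one row per original row.

```python
import pandas as pd

df = pd.DataFrame({
    "ID1": [19, 677],
    "ID2": [677, 19],
    "Box1_weight": [3, 1],
    "Box2_weight": [2, 0],
})

# stack() walks row by row, so each row's two cells become neighbours
ids = df[["ID1", "ID2"]].stack().values
weights = df[["Box1_weight", "Box2_weight"]].stack().values
print(ids.tolist())      # [19, 677, 677, 19]
print(weights.tolist())  # [3, 2, 1, 0]

# reshape(-1, 2) undoes the interleaving: column 0 is the ID1 side, column 1 the ID2 side
print(ids.reshape(-1, 2).tolist())  # [[19, 677], [677, 19]]
```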

I've used your code for computing the rolling average, but I rearrange the data into df2 before doing it.


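import pandas as pd

d = {
    "ID1": [19, 677, 19, 19, 677],
    "ID2": [677, 19, 677, 677, 19],
    "Box1_weight": [3, 1, 3, 7, 1],
    "Box2_weight": [2, 0, 1, 0, 3]
}

df = pd.DataFrame(d)
print(df)

period_length = 3

# Interleave row by row: row 0 ID1, row 0 ID2, row 1 ID1, ...
ids = df[["ID1", "ID2"]].stack().values
weights = df[["Box1_weight", "Box2_weight"]].stack().values

df2 = pd.DataFrame(dict(ids=ids, weights=weights))

# Per-ID mean of the previous boxes. transform keeps df2's row order, which the
# reshape relies on (on pandas 2.x, apply would prepend the group key and
# return the rows grouped by ID instead).
rolling_avg = (
    df2.groupby("ids")["weights"]
    .transform(lambda x: x.shift().rolling(period_length, min_periods=1).mean())
    .values.reshape(-1, 2)
)

# Column 0 holds the ID1 entries, column 1 the ID2 entries
df["rolling_avg"] = rolling_avg[:, 0]

print(df)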

Result


   ID1  ID2  Box1_weight  Box2_weight
0   19  677            3            2
1  677   19            1            0
2   19  677            3            1
3   19  677            7            0
4  677   19            1            3

   ID1  ID2  Box1_weight  Box2_weight  rolling_avg
0   19  677            3            2          NaN
1  677   19            1            0     2.000000
2   19  677            3            1     1.500000
3   19  677            7            0     2.000000
4  677   19            1            3     0.666667
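Since the title mentions a large dataframe, here is a hedged sketch of the same long-format idea that avoids calling a Python `lambda` once per group: the shift uses the built-in grouped `shift()`, and the rolling mean uses `groupby(...).rolling(...)`, which recent pandas versions compute without calling back into Python for each group. I haven't benchmarked it, so treat any speed-up as an expectation, not a measured result. It uses `to_numpy().ravel()` instead of `stack()` (same row-by-row order), and the column names `rolling_avg_ID1`/`rolling_avg_ID2` are my own. Column 1 of the reshaped array is the average for each row's `ID2`, which the answer leaves unused.

```python
import pandas as pd

df = pd.DataFrame({
    "ID1": [19, 677, 19, 19, 677],
    "ID2": [677, 19, 677, 677, 19],
    "Box1_weight": [3, 1, 3, 7, 1],
    "Box2_weight": [2, 0, 1, 0, 3],
})
period_length = 3

# Long format in chronological order: row i's ID1 cell, then its ID2 cell
long = pd.DataFrame({
    "ids": df[["ID1", "ID2"]].to_numpy().ravel(),
    "weights": df[["Box1_weight", "Box2_weight"]].to_numpy().ravel(),
})

# 1) Per-ID shift so each entry only sees earlier boxes (no lambda needed)
prev = long.groupby("ids")["weights"].shift()

# 2) Per-ID rolling mean of the shifted values; drop the group level that
#    rolling adds to the index so assignment aligns with `long` again
long["avg"] = (
    prev.groupby(long["ids"])
    .rolling(period_length, min_periods=1)
    .mean()
    .reset_index(level=0, drop=True)
)

# Even positions are ID1 entries, odd positions are ID2 entries
pairs = long["avg"].to_numpy().reshape(-1, 2)
df["rolling_avg_ID1"] = pairs[:, 0]
df["rolling_avg_ID2"] = pairs[:, 1]
print(df)
```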


2 solutions

Solution 1
1 2019-09-22 13:58:15

Solution 2
1 Accepted 2019-09-22 14:14:38
