How do I apply a function to a pandas groupby object and save the result back into a new column of the parent DataFrame?

Question

I have a pandas DataFrame like this:

In [5]: import pandas as pd                                                     

In [6]: df = pd.DataFrame({'X': [0, 123, 342, 353, 467, 345, 789, 543, 3913], 
   ...:                    'Y': [0, 12, 23, 41, 23, 45, 23, 53, 23], 
   ...:                    'Group': [0, 1, 2, 0, 1, 2, 0, 1, 2]})               

In [7]: df                                                                      
Out[7]: 
      X   Y  Group
0     0   0      0
1   123  12      1
2   342  23      2
3   353  41      0
4   467  23      1
5   345  45      2
6   789  23      0
7   543  53      1
8  3913  23      2

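The three groups represent measurement series. For each series, I want to compute the Euclidean distance from each point to the previous point in the same series, and then accumulate those distances measurement by measurement. (The distance for the first measurement of a series is 0.)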

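I have read all the threads here about assigning the result of a groupby operation back to the parent DataFrame. But I could not find a solution for my case, where I compute a value for every row of the DataFrame based on its group (rather than an aggregate).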

So I would like to know how to combine these steps:

from scipy.spatial.distance import euclidean

# 1. Group data
group = df.groupby('Group')
# 2. Calculate cumulative euclidean distance for each group
group['Distance'] = group.apply(lambda row: euclidean(row['X'], row['Y']).cumsum(), axis=1)
# 3. Assign back to original dataframe

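Step 1 is straightforward. For step 2, I tried many combinations of df.groupby.apply and df.groupby.apply.transform, as well as defining my own function (I am not sure whether this fits in a one-liner). But I could not get it to behave the way I want. I assume groupby().transform() is what I need, but I cannot get it to operate row by row.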

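Also, to assign the result back to my original DataFrame rather than only to the groupby object, I tried df.join, pd.merge, pd.concat and so on, but I am now at a point where I am thoroughly confused about the differences between them.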

My desired output is:

Out[7]: 
      X   Y  Group  Distance  Cumulative Distance
0     0   0      0         0                    0
1   123  12      1         0                    0 
2   342  23      2         0                    0
3   353  41      0    355.37               355.37   
4   467  23      1    344.17               344.17     
5   345  45      2     22.20                22.20    
6   789  23      0    436.37               791.74     
7   543  53      1     81.71               425.88     
8  3913  23      2   3568.07              3590.27

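I only need the cumulative distance (computed separately for each group). I have listed the individual distances as an intermediate step.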

Answer 1

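After sorting by Group, use groupby with shift to get the previous point for each row, then use bfill so that the first point of each group is filled with itself.

After that, create new columns that zip X and Y into points.

from scipy.spatial.distance import euclidean  # needed for the distance step

df.sort_values('Group', inplace=True)
# Shift within each group; bfill then fills each group's first row with its own values
df[['X_shift', 'Y_shift']] = df.groupby('Group')[['X', 'Y']].shift(1).bfill()
df['point_1'] = list(zip(df.X, df.Y))
df['point_2'] = list(zip(df.X_shift, df.Y_shift))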

df

      X   Y  Group  X_shift  Y_shift     point_1        point_2
0     0   0      0      0.0      0.0      (0, 0)     (0.0, 0.0)
3   353  41      0      0.0      0.0   (353, 41)     (0.0, 0.0)
6   789  23      0    353.0     41.0   (789, 23)  (353.0, 41.0)
1   123  12      1    123.0     12.0   (123, 12)  (123.0, 12.0)
4   467  23      1    123.0     12.0   (467, 23)  (123.0, 12.0)
7   543  53      1    467.0     23.0   (543, 53)  (467.0, 23.0)
2   342  23      2    342.0     23.0   (342, 23)  (342.0, 23.0)
5   345  45      2    342.0     23.0   (345, 45)  (342.0, 23.0)
8  3913  23      2    345.0     45.0  (3913, 23)  (345.0, 45.0)
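
The bfill step depends on the frame being sorted by Group and on every group having at least two rows; with a single-row group it would pull values from the following group. As a sketch of one possible alternative (not part of the original answer), the missing previous point can instead be filled from the row's own coordinates, which needs no sorting. The name prev is only illustrative:

```python
import pandas as pd

df = pd.DataFrame({'X': [0, 123, 342, 353, 467, 345, 789, 543, 3913],
                   'Y': [0, 12, 23, 41, 23, 45, 23, 53, 23],
                   'Group': [0, 1, 2, 0, 1, 2, 0, 1, 2]})

# Previous point within each group; the first row of a group gets NaN.
prev = df.groupby('Group')[['X', 'Y']].shift(1)

# Fill those NaNs with the row's own coordinates (aligned on index and columns),
# so the first point of each group is compared with itself.
prev = prev.fillna(df[['X', 'Y']])
prev.columns = ['X_shift', 'Y_shift']

print(df.join(prev))
```

Because shift keeps the original index, the join here lines up row by row without any key column.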

Then use apply to compute the Euclidean distance for each row, and use groupby with cumsum to get the final result.

df['Distance'] = df.apply(lambda row: euclidean(row.point_1, row.point_2), axis=1)

df

      X   Y  Group  X_shift  Y_shift     point_1        point_2     Distance
0     0   0      0      0.0      0.0      (0, 0)     (0.0, 0.0)     0.000000
3   353  41      0      0.0      0.0   (353, 41)     (0.0, 0.0)   355.373043
6   789  23      0    353.0     41.0   (789, 23)  (353.0, 41.0)   436.371401
1   123  12      1    123.0     12.0   (123, 12)  (123.0, 12.0)     0.000000
4   467  23      1    123.0     12.0   (467, 23)  (123.0, 12.0)   344.175827
7   543  53      1    467.0     23.0   (543, 53)  (467.0, 23.0)    81.706793
2   342  23      2    342.0     23.0   (342, 23)  (342.0, 23.0)     0.000000
5   345  45      2    342.0     23.0   (345, 45)  (342.0, 23.0)    22.203603
8  3913  23      2    345.0     45.0  (3913, 23)  (345.0, 45.0)  3568.067824
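
To check a couple of these values by hand: row 3 belongs to group 0 and its previous point in that group is (0, 0), so its distance is sqrt(353^2 + 41^2). A short sketch using the standard library's math.dist (Python 3.8+), which for these 2-D points should give the same Euclidean distance as scipy's euclidean:

```python
import math

# Row 3: point (353, 41); the previous point in group 0 is (0, 0).
d_row3 = math.dist((353, 41), (0.0, 0.0))

# Row 6: point (789, 23); the previous point in group 0 is (353, 41).
d_row6 = math.dist((789, 23), (353.0, 41.0))

print(round(d_row3, 2))  # 355.37
print(round(d_row6, 2))  # 436.37
```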

df['Cumulative Distance'] = df.groupby('Group').Distance.cumsum()
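
This assignment needs no join or merge: groupby(...).cumsum() returns a Series with the same index as the frame it came from, and pandas aligns on that index when the column is assigned. A small sketch with made-up values (the frame toy is purely illustrative), deliberately not sorted by group:

```python
import pandas as pd

# Toy data (illustrative values only), rows interleaved across two groups.
toy = pd.DataFrame({'Group': ['a', 'b', 'a', 'b', 'a', 'b'],
                    'Distance': [0.0, 0.0, 3.0, 4.0, 2.0, 1.0]})

cum = toy.groupby('Group')['Distance'].cumsum()

# The result carries toy's own index, so each value lands on its original row.
assert cum.index.equals(toy.index)
toy['Cumulative Distance'] = cum

print(toy)
```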

# Drop the helper columns that are no longer needed
df.drop(columns=['X_shift', 'Y_shift', 'point_1', 'point_2'], inplace=True)
df.sort_index(inplace=True)
df

      X   Y  Group     Distance  Cumulative Distance
0     0   0      0     0.000000             0.000000
1   123  12      1     0.000000             0.000000
2   342  23      2     0.000000             0.000000
3   353  41      0   355.373043           355.373043
4   467  23      1   344.175827           344.175827
5   345  45      2    22.203603            22.203603
6   789  23      0   436.371401           791.744445
7   543  53      1    81.706793           425.882620
8  3913  23      2  3568.067824          3590.271428
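
For comparison, here is a sketch of a vectorized variant that avoids the helper columns and the row-wise apply. It is an alternative to the steps above, not the answer's own method, and it assumes NumPy is available; delta is an illustrative name. groupby(...).diff() gives the per-group difference to the previous row, and np.hypot turns the X and Y differences into a Euclidean length:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [0, 123, 342, 353, 467, 345, 789, 543, 3913],
                   'Y': [0, 12, 23, 41, 23, 45, 23, 53, 23],
                   'Group': [0, 1, 2, 0, 1, 2, 0, 1, 2]})

# Difference to the previous row within each group; first row of a group is NaN.
delta = df.groupby('Group')[['X', 'Y']].diff()

# Length of each step; the first row of a group is defined as distance 0.
df['Distance'] = np.hypot(delta['X'], delta['Y']).fillna(0)
df['Cumulative Distance'] = df.groupby('Group')['Distance'].cumsum()

print(df)
```

No sorting is needed here, since diff and cumsum both work within each group and keep the original row order.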

Answer 1 (accepted, 2019-04-16 14:05:21)
