How do I create a new column from the average difference between date columns in a dataframe?

Question

I have a dataframe formatted like this (simplified for ease of explanation):

Date_1	Date_2	Date_3
2017-02-14	2017-02-09	2017-02-10
2018-07-16	2019-07-22	2018-07-16
2014-10-10	2017-10-10	2017-10-10

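I want to create a new column showing the average difference between my date columns. Specifically, I want it to compute the differences between Date_1 & Date_2, Date_2 & Date_3, and Date_1 & Date_3. For row 1 this is mean(5, 1, 4) = 3.33.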
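To make the row 1 arithmetic concrete, here is a minimal by-hand sketch in plain Python. The variable names are only illustrative, and it hard-codes the three pairs, which is exactly what I would like to avoid in the real dataframe.

```python
from datetime import date

# Row 1 of the example table
date_1, date_2, date_3 = date(2017, 2, 14), date(2017, 2, 9), date(2017, 2, 10)

# Absolute gap in days for each of the three pairs
diffs = [
    abs((date_1 - date_2).days),  # Date_1 & Date_2 -> 5
    abs((date_2 - date_3).days),  # Date_2 & Date_3 -> 1
    abs((date_1 - date_3).days),  # Date_1 & Date_3 -> 4
]
row_mean = sum(diffs) / len(diffs)
print(diffs, round(row_mean, 2))  # [5, 1, 4] 3.33
```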

The resulting dataframe would look like this:

Date_1	Date_2	Date_3	Average_Difference
2017-02-14	2017-02-09	2017-02-10	3.33
2018-07-16	2019-07-22	2018-07-16	mean(6, 6, 0) = 4
2014-10-10	2017-10-10	2017-10-10	0

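Please let me know if further explanation is needed.

Edit: I should also add that my actual, unsimplified dataframe has more than three date columns, so I am looking for an answer that scales.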

Answer 1

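Interesting question. Since you need the differences between several items in each row, itertools.combinations(iterable, N) will help. It returns every possible length-N combination of the items in iterable. So we can apply it to each row, take the difference within each combination, take its absolute value (some differences may be negative depending on the order), and then compute the mean: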

import itertools as it
import numpy as np
import pandas as pd

date_cols = df.filter(like='Date_').columns
df[date_cols] = df[date_cols].apply(pd.to_datetime)  # Convert the columns to dates
# For each row: every pair of day-of-year values -> pairwise difference -> absolute value -> mean
df['Average_Difference'] = df[date_cols].apply(lambda row: np.mean(abs(np.diff(list(it.combinations([date.dayofyear for date in row], 2)))[:, 0])), axis=1)

Output:

>>> df
      Date_1     Date_2     Date_3  Average_Difference
0 2017-02-14 2017-02-09 2017-02-10            3.333333
1 2018-07-16 2019-07-22 2018-07-16            4.000000
2 2014-10-10 2017-10-10 2017-10-10            0.000000
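One caveat: dayofyear is a date's position within its own year, so the approach above ignores the year entirely. That is why rows 2 and 3 come out as 4 and 0 even though their dates are a year or more apart. If true calendar-day gaps are wanted instead, a minimal sketch could subtract the timestamps directly. The helper name mean_abs_gap_days and the column name Average_Difference_Days are made up for this illustration, and the results for rows 2 and 3 will then differ from the expected values in the question, which appear to ignore the year.

```python
import itertools as it

import numpy as np
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "Date_1": ["2017-02-14", "2018-07-16", "2014-10-10"],
    "Date_2": ["2017-02-09", "2019-07-22", "2017-10-10"],
    "Date_3": ["2017-02-10", "2018-07-16", "2017-10-10"],
})
date_cols = df.filter(like="Date_").columns
df[date_cols] = df[date_cols].apply(pd.to_datetime)


def mean_abs_gap_days(row):
    """Mean absolute gap, in whole days, over every pair of dates in the row."""
    gaps = [abs(a - b).days for a, b in it.combinations(row, 2)]
    return np.mean(gaps)


# Hypothetical column name, chosen to avoid clashing with Average_Difference
df["Average_Difference_Days"] = df[date_cols].apply(mean_abs_gap_days, axis=1)
print(df["Average_Difference_Days"].round(2).tolist())  # [3.33, 247.33, 730.67]
```

Because it.combinations enumerates every pair, this works unchanged for any number of Date_ columns.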
