How to calculate the number of days until the nearest date and since the latest date from a list of dates in Python Pandas?

Question

I have a Pandas DataFrame in Python like below ("col1" is in datetime64 format):

col1
--------
23-11-2020
25-05-2021
...

Moreover, I have a list of special dates like below (the values in the list are of "object" dtype):

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021", ...]

I need to create 2 more columns in my DataFrame:

col2 - number of days until the nearest upcoming date from the special_dates list
col3 - number of days since the most recent past date from the special_dates list

Please take into account that some months have 31 days and others 30, and that February has 28 days, or 29 in leap years.
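Subtracting two datetimes already counts real calendar days, so month lengths and leap years need no special handling. A minimal sketch, assuming the special dates are always written as day.month.year (hence the explicit format string):

```python
import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]
# Explicit day.month.year format, so 01.06.2021 is 1 June, not 6 January.
special = pd.to_datetime(special_dates, format="%d.%m.%Y")

# Timestamp subtraction returns a Timedelta measured on the real calendar.
leap = (pd.Timestamp("2020-03-01") - pd.Timestamp("2020-02-28")).days    # 2020 is a leap year
common = (pd.Timestamp("2021-03-01") - pd.Timestamp("2021-02-28")).days  # 2021 is not
print(leap, common)  # 2 1
print(special.min().date(), special.max().date())  # 2020-11-23 2021-06-01
```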

So, I need something like below:

col1       | col2 | col3
-----------|------|-----
23-11-2020 | 2    | 0
25-05-2021 | 7    | 5
...        | ...  | ...

How can I do that in Python Pandas?

Answer 1

Vectorized merge (pd.merge_asof):

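import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]

df = pd.DataFrame({'col1':["23.11.2020", "25.05.2021", "26.05.2021", "26.05.2022", "26.05.2018"]})
s = pd.Series(pd.to_datetime(special_dates, dayfirst=True)).sort_values()
df['col1'] = pd.to_datetime(df['col1'], dayfirst=True)

# merge_asof needs both keys sorted; reset_index() keeps the original order in 'index'
df = df.sort_values(by='col1').reset_index()

# col2: days until the nearest special date on or after col1
df['col2'] = (pd.merge_asof(df, s.rename('other'), 
                            left_on='col1', right_on='other',
                            direction='forward', allow_exact_matches=True)['other']
                .sub(df['col1']).dt.days
             )

# col3: days since the most recent special date on or before col1
df['col3'] = (pd.merge_asof(df, s.rename('other'),
                            left_on='col1', right_on='other',
                            direction='backward', allow_exact_matches=True)['other']
                .rsub(df['col1']).dt.days
             )

# restore the original row order
df = df.set_index('index').sort_index()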

Output:

            col1   col2   col3
index                         
0     2020-11-23    0.0    0.0
1     2021-05-25    7.0    5.0
2     2021-05-26    6.0    6.0
3     2022-05-26    NaN  359.0
4     2018-05-26  912.0    NaN
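Note that row 0 gets col2 = 0 here, while the question's table expects 2 for 23-11-2020: with allow_exact_matches=True, a special date on the same day counts as the nearest one in both directions. If a same-day special date should only count toward col3, one possible tweak (a sketch assuming that reading of the question) is to pass allow_exact_matches=False to the forward search only:

```python
import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]
df = pd.DataFrame({"col1": pd.to_datetime(["23.11.2020", "25.05.2021"], dayfirst=True)})

# Both merge keys must be sorted; 'index' remembers the original row order.
right = (pd.Series(pd.to_datetime(special_dates, dayfirst=True), name="other")
           .sort_values().to_frame())
left = df.sort_values("col1").reset_index()

# Next special date strictly after col1 (same-day dates are skipped).
nxt = pd.merge_asof(left, right, left_on="col1", right_on="other",
                    direction="forward", allow_exact_matches=False)
# Most recent special date on or before col1.
prv = pd.merge_asof(left, right, left_on="col1", right_on="other",
                    direction="backward", allow_exact_matches=True)

left["col2"] = (nxt["other"] - left["col1"]).dt.days
left["col3"] = (left["col1"] - prv["other"]).dt.days
out = left.set_index("index").sort_index()
print(out)
```

With the two rows from the question this yields col2 = 2, 7 and col3 = 0, 5, matching the expected table.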

Older answer (based on a misunderstanding of the question)

You can use numpy broadcasting:

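import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"] 
df = pd.DataFrame({'col1': ["23-11-2020", "25-05-2021"]})

df['col1'] = pd.to_datetime(df['col1'], dayfirst=True)

a = pd.to_datetime(special_dates, dayfirst=True).to_numpy()
# broadcast: one row per df row, one column per special date (special date - col1)
out = (df
       .join(pd.DataFrame((a-df['col1'].to_numpy()[:,None]),
                          index=df.index,
                          columns=range(1, len(special_dates)+1))
               .add_prefix('date_')
               .clip(lower=pd.Timedelta(0))  # past special dates (negative) -> 0 days
               #.apply(lambda c: c.dt.days) # uncomment for days as int
             )
      )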

Output:

        col1 date_1 date_2   date_3   date_4
0 2020-11-23 2 days 0 days 190 days 178 days
1 2021-05-25 0 days 0 days   7 days   0 days

Output as integers (with the commented-out last line enabled):

        col1  date_1  date_2  date_3  date_4
0 2020-11-23       2       0     190     178
1 2021-05-25       0       0       7       0

Variant with the dates as column headers:

out = (df
       .join(pd.DataFrame((a-df['col1'].to_numpy()[:,None]),
                          index=df.index,
                          columns=special_dates)
               .clip(lower=pd.Timedelta(0))
               .apply(lambda c: c.dt.days)
             )
      )

Output:

        col1  25.11.2020  23.11.2020  01.06.2021  20.05.2021
0 2020-11-23           2           0         190         178
1 2021-05-25           0           0           7           0
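The same broadcasting matrix can also be reduced directly to the two columns the question asks for, instead of one column per special date. A minimal sketch, assuming float day counts are acceptable (NaN where no special date exists in that direction); the `>=`/`<=` comparisons keep same-day matches, so use `>` instead of `>=` if a same-day special date should not count toward col2:

```python
import numpy as np
import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]
df = pd.DataFrame({"col1": ["23-11-2020", "25-05-2021"]})
df["col1"] = pd.to_datetime(df["col1"], dayfirst=True)

a = pd.to_datetime(special_dates, dayfirst=True).to_numpy()
# rows x special dates matrix of (special date - col1), converted to float days
diff = (a - df["col1"].to_numpy()[:, None]) / np.timedelta64(1, "D")

# Keep only the relevant sign, then take the row-wise minimum
# (pandas' min skips NaN and returns NaN for rows with no candidate).
df["col2"] = pd.DataFrame(np.where(diff >= 0, diff, np.nan), index=df.index).min(axis=1)
df["col3"] = pd.DataFrame(np.where(diff <= 0, np.abs(diff), np.nan), index=df.index).min(axis=1)
print(df)
```

This materializes an array of size rows × special dates, which is fine for short lists but grows quickly; the merge_asof approach avoids that.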

Answer 2

Probably not the best/most efficient way, but you can use the days_between function from this post and then compute the differences in days. This gives you:

import pandas as pd
import numpy as np
from datetime import datetime

def days_between(d1, d2):
    d1 = datetime.strptime(d1, "%d-%m-%Y")
    d2 = datetime.strptime(d2, "%d.%m.%Y")
    return (d2 - d1).days

df = pd.DataFrame({'col1':["23-11-2020", "25-05-2021"]})
special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"] 

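for idx, date in enumerate(df['col1']):
    col2=np.inf  # smallest non-negative delta: days until the next special date
    col3=np.inf  # smallest |non-positive delta|: days since the previous special date
    for special_date in special_dates:
        delta = days_between(date, special_date)  # special_date - date, in days
        if delta >= 0 and delta < col2:
            col2 = delta
        if delta <= 0 and delta > -col3:
            col3 = -delta

    df.loc[df.index[idx], 'col2'] = col2
    df.loc[df.index[idx], 'col3'] = col3

# rows with no later (or earlier) special date still hold inf; turn it into NaN
df.replace(np.inf, np.nan, inplace=True)
print(df[['col2','col3']].round(0))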
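The nested loop compares every row with every special date. A different technique, binary search with the standard-library bisect module over the sorted special dates, finds both neighbours in O(log n) per row. A sketch, where the helper until_and_since is a hypothetical name and same-day matches count toward both columns, as in the loop above:

```python
import bisect
from datetime import datetime

import pandas as pd

special_dates = ["25.11.2020", "23.11.2020", "01.06.2021", "20.05.2021"]
df = pd.DataFrame({"col1": ["23-11-2020", "25-05-2021"]})

# Parse and sort the special dates once, up front.
specials = sorted(datetime.strptime(d, "%d.%m.%Y") for d in special_dates)


def until_and_since(date_str):
    """Hypothetical helper: (days until next special date, days since previous one).

    Same-day special dates count in both directions; NaN when none exists.
    """
    d = datetime.strptime(date_str, "%d-%m-%Y")
    i = bisect.bisect_left(specials, d)   # index of first special date >= d
    j = bisect.bisect_right(specials, d)  # index of first special date > d
    until = (specials[i] - d).days if i < len(specials) else float("nan")
    since = (d - specials[j - 1]).days if j > 0 else float("nan")
    return until, since


pairs = [until_and_since(x) for x in df["col1"]]
df["col2"] = [u for u, _ in pairs]
df["col3"] = [s for _, s in pairs]
print(df)
```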

2 answers

Answer 1: 0 votes, accepted, 2022-07-18 13:26:45

Answer 2: 0 votes, 2022-07-18 13:36:07
