
How to compare dates from two dataframes and update the value in the column

I have two dataframes that refer to weather stations:

      import pandas as pd
      df_shift = pd.DataFrame({'Date': ['2010-10-05', '2010-10-20', '2011-03-15',
                                        '2012-03-22', '2015-01-17', '2015-01-23',
                                        '2015-01-30'],
                               'Sensor_id': [1024, 1024, 1024, 1024,
                                             2210, 2210, 1010]})

      df_station = pd.DataFrame({'Sensor_id': [1024, 1024, 1024, 2210, 2210],
                                 'Sensor_type': ['analog', 'analog', 'analog', 'dig', 'dig'],
                                 'Date': ['2010-10-01', '2010-10-22', '2011-03-14',
                                          '2015-01-13', '2015-01-22']})

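I want to create a new column in df_station called "new_column".

I want this column to hold the smallest difference in days between the Date fields of the two dataframes (shift and station), comparing only rows that have the same Sensor_id.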

I wrote the following code:

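      # Start with a very large value so that any real difference is smaller
      df_station['new_column'] = 90000

      for i in range(len(df_station)):
          for j in range(len(df_shift)):
              # Only compare rows that belong to the same sensor
              if df_station['Sensor_id'].iloc[i] == df_shift['Sensor_id'].iloc[j]:
                  # The dates are written as YYYY-MM-DD, so the format uses hyphens
                  var_Difference_Date = abs(
                      pd.to_datetime(df_station['Date'].iloc[i], format='%Y-%m-%d')
                      - pd.to_datetime(df_shift['Date'].iloc[j], format='%Y-%m-%d'))

                  # Keep the smallest difference (in days) found so far;
                  # .loc[row, col] avoids chained assignment
                  if var_Difference_Date.days < df_station['new_column'].iloc[i]:
                      df_station.loc[i, 'new_column'] = var_Difference_Date.days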

The result is as expected:

      Sensor_id  Sensor_type  Date        new_column
      1024       analog       2010-10-01  4
      1024       analog       2010-10-22  2
      1024       analog       2011-03-14  1
      2210       dig          2015-01-13  4
      2210       dig          2015-01-22  1

However, is there a more efficient way to do this without two nested for loops? Thank you.

Use merge_asof, with by and on:

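df_station['Date'] = pd.to_datetime(df_station['Date'])
df_shift['Date'] = pd.to_datetime(df_shift['Date'])
# merge_asof keeps only the left 'Date', so keep a copy of the shift date
df_shift['DIFF'] = df_shift['Date']
# merge_asof requires both frames to be sorted on the 'on' key (the sample data already is)
df = pd.merge_asof(df_station, df_shift[['Date', 'Sensor_id', 'DIFF']],
                   on='Date',
                   by='Sensor_id',
                   direction='nearest')
df['DIFF'] = (df.Date - df.DIFF).dt.days.abs()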
df
Out[377]: 
   Sensor_id Sensor_type       Date  DIFF
0       1024      analog 2010-10-01     4
1       1024      analog 2010-10-22     2
2       1024      analog 2011-03-14     1
3       2210         dig 2015-01-13     4
4       2210         dig 2015-01-22     1
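merge_asof only accepts frames that are sorted on the on key and raises ValueError otherwise. The sample data happens to be sorted already; for data that is not, a minimal sketch that sorts first and then restores the original row order could look like this. The helper name nearest_shift_days and the _row and Shift_date columns are illustrative, not part of the answer above.

```python
import pandas as pd


def nearest_shift_days(df_station, df_shift):
    """Hypothetical helper: days from each station row to the nearest
    shift date of the same Sensor_id, returned in the original row order."""
    station = df_station.copy()
    shift = df_shift[['Sensor_id', 'Date']].copy()
    station['Date'] = pd.to_datetime(station['Date'])
    shift['Date'] = pd.to_datetime(shift['Date'])

    # merge_asof keeps only the left 'Date', so copy the shift date first
    shift['Shift_date'] = shift['Date']

    # Remember the original row positions, then sort both frames on the 'on' key
    station['_row'] = range(len(station))
    station = station.sort_values('Date')
    shift = shift.sort_values('Date')

    out = pd.merge_asof(station, shift, on='Date', by='Sensor_id',
                        direction='nearest')
    out['new_column'] = (out['Date'] - out['Shift_date']).dt.days.abs()

    # Put the rows back in their original order and drop the helper columns
    out = out.sort_values('_row').drop(columns=['_row', 'Shift_date'])
    return out.reset_index(drop=True)


df_shift = pd.DataFrame({'Date': ['2010-10-05', '2010-10-20', '2011-03-15', '2012-03-22',
                                  '2015-01-17', '2015-01-23', '2015-01-30'],
                         'Sensor_id': [1024, 1024, 1024, 1024, 2210, 2210, 1010]})
df_station = pd.DataFrame({'Sensor_id': [1024, 1024, 1024, 2210, 2210],
                           'Sensor_type': ['analog', 'analog', 'analog', 'dig', 'dig'],
                           'Date': ['2010-10-01', '2010-10-22', '2011-03-14',
                                    '2015-01-13', '2015-01-22']})

result = nearest_shift_days(df_station, df_shift)
print(result)
```

Sorting inside the helper costs O(n log n), but it lets the same call work on reversed or shuffled input without the caller having to remember the sorting requirement.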
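Another approach collects every shift date per sensor into a list and takes the minimum difference row by row:

# Converting both date columns to pandas datetime format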
df_shift['Date'] = pd.to_datetime(df_shift['Date'])
df_station['Date'] = pd.to_datetime(df_station['Date'])

# Aggregating for each Sensor_id, all the dates in a list
a = df_shift.groupby(['Sensor_id'])['Date'].apply(list).reset_index(name='dates_list')

# Merging it with the df_station
df_station = df_station.merge(a, on='Sensor_id', how='left')

# Finding the smallest number of days between the station date and any shift date
def get_diff(x):
    d1, dates = x
    return min(abs((d2 - d1).days) for d2 in dates)

df_station['new_column'] = df_station[['Date', 'dates_list']].apply(get_diff, axis=1)
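The row-wise apply in get_diff can also be vectorized. A sketch, using pandas' DataFrame.explode (this swaps in a different technique from the answer): explode the list column into one row per (station row, shift date), compute all differences at once, and take the minimum per original index label.

```python
import pandas as pd

df_shift = pd.DataFrame({'Date': ['2010-10-05', '2010-10-20', '2011-03-15', '2012-03-22',
                                  '2015-01-17', '2015-01-23', '2015-01-30'],
                         'Sensor_id': [1024, 1024, 1024, 1024, 2210, 2210, 1010]})
df_station = pd.DataFrame({'Sensor_id': [1024, 1024, 1024, 2210, 2210],
                           'Sensor_type': ['analog', 'analog', 'analog', 'dig', 'dig'],
                           'Date': ['2010-10-01', '2010-10-22', '2011-03-14',
                                    '2015-01-13', '2015-01-22']})
df_shift['Date'] = pd.to_datetime(df_shift['Date'])
df_station['Date'] = pd.to_datetime(df_station['Date'])

# Same preparation as the answer: one list of shift dates per sensor
a = df_shift.groupby('Sensor_id')['Date'].apply(list).reset_index(name='dates_list')
df_station = df_station.merge(a, on='Sensor_id', how='left')

# explode() gives one row per (station row, shift date) and keeps the original index
exploded = df_station.explode('dates_list')
shift_dates = pd.to_datetime(exploded['dates_list'])
day_diffs = (shift_dates - exploded['Date']).abs().dt.days

# The minimum per original index label is what get_diff computes row by row
df_station['new_column'] = day_diffs.groupby(level=0).min()
print(df_station[['Sensor_id', 'Date', 'new_column']])
```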

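Another option runs merge_asof on the dates alone and then keeps only the rows whose sensor matches:

df_shift['Date_s'] = pd.to_datetime(df_shift['Date'])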
df_station['Date'] = pd.to_datetime(df_station['Date'])

t = pd.merge_asof(df_station, df_shift[['Date_s','Sensor_id']], 
                  left_on='Date', 
                  right_on='Date_s', 
                  direction='nearest')

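# Keep only matches from the same sensor. Caveat: without by='Sensor_id' the nearest
# date may belong to another sensor, in which case that station row is dropped here
t = t[t['Sensor_id_x']==t['Sensor_id_y']].copy()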

t['new column'] = abs((t['Date_s'] - t['Date']).dt.days)

t.drop(columns=['Date_s','Sensor_id_x'], inplace=True)

t.columns = ['Sensor_type','Date','Sensor_id','new column']
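One caveat with this answer: merge_asof without by= searches across all sensors, so if the nearest date overall belongs to a different sensor, the Sensor_id_x == Sensor_id_y filter drops that station row instead of falling back to the sensor's own dates. The small hypothetical frames below show the difference, and that passing by='Sensor_id' avoids it.

```python
import pandas as pd

# Hypothetical data: sensor 1's own shift is 9 days away,
# but sensor 2 has a shift only 1 day away from the station date.
station = pd.DataFrame({'Sensor_id': [1], 'Date': pd.to_datetime(['2020-01-10'])})
shift = pd.DataFrame({'Sensor_id': [1, 2],
                      'Date_s': pd.to_datetime(['2020-01-01', '2020-01-11'])})

# Without by=: the nearest date overall belongs to sensor 2, so the
# Sensor_id_x == Sensor_id_y filter removes the only station row.
t = pd.merge_asof(station, shift, left_on='Date', right_on='Date_s',
                  direction='nearest')
t = t[t['Sensor_id_x'] == t['Sensor_id_y']]
print(len(t))  # 0

# With by='Sensor_id': the search is restricted to sensor 1's own dates.
u = pd.merge_asof(station, shift, left_on='Date', right_on='Date_s',
                  by='Sensor_id', direction='nearest')
u['new column'] = (u['Date_s'] - u['Date']).dt.days.abs()
print(u['new column'].tolist())  # [9]
```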

Output:

    Sensor_type Date        Sensor_id   new column
0   analog      2010-10-01  1024        4
1   analog      2010-10-22  1024        2
2   analog      2011-03-14  1024        1
3   dig         2015-01-13  2210        4
4   dig         2015-01-22  2210        1

Another approach uses a plain merge followed by a groupby. First, build the input dataframes:

import pandas as pd

df_shift = pd.DataFrame({'Date': ['2010-10-05', '2010-10-20', '2011-03-15', '2012-03-22',
                                  '2015-01-17', '2015-01-23', '2015-01-30'],
                         'Sensor_id': [1024, 1024, 1024, 1024, 2210, 2210, 1010]})
df_station = pd.DataFrame({'Sensor_id': [1024, 1024, 1024, 2210, 2210],
                           'Sensor_type': ['analog', 'analog', 'analog', 'dig', 'dig'],
                           'Date': ['2010-10-01', '2010-10-22', '2011-03-14',
                                    '2015-01-13', '2015-01-22']})

# Keep datetime64 dtype so date subtraction yields timedelta64 with a .dt accessor
df_shift["Date"] = pd.to_datetime(df_shift["Date"])
df_station["Date"] = pd.to_datetime(df_station["Date"])

Merge the two dataframes on Sensor_id and compute the absolute date difference:

df_merge = pd.merge(df_station, df_shift, how="left", on="Sensor_id", suffixes=["_station","_shift"])

df_merge['Date_abs_diff'] = (df_merge.Date_shift - df_merge.Date_station).abs()

The merged dataframe is now:

>>> df_merge
   Date_station  Sensor_id Sensor_type  Date_shift Date_abs_diff
0    2010-10-01       1024      analog  2010-10-05        4 days
1    2010-10-01       1024      analog  2010-10-20       19 days
2    2010-10-01       1024      analog  2011-03-15      165 days
3    2010-10-01       1024      analog  2012-03-22      538 days
4    2010-10-22       1024      analog  2010-10-05       17 days
5    2010-10-22       1024      analog  2010-10-20        2 days
6    2010-10-22       1024      analog  2011-03-15      144 days
7    2010-10-22       1024      analog  2012-03-22      517 days
8    2011-03-14       1024      analog  2010-10-05      160 days
9    2011-03-14       1024      analog  2010-10-20      145 days
10   2011-03-14       1024      analog  2011-03-15        1 days
11   2011-03-14       1024      analog  2012-03-22      374 days
12   2015-01-13       2210         dig  2015-01-17        4 days
13   2015-01-13       2210         dig  2015-01-23       10 days
14   2015-01-22       2210         dig  2015-01-17        5 days
15   2015-01-22       2210         dig  2015-01-23        1 days
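The merge on Sensor_id pairs every station row with every shift row of the same sensor, so the intermediate frame has, per sensor, (station rows) × (shift rows) entries; in a left merge, a station sensor with no shift rows still contributes one row per station row. For large inputs this product is what dominates memory. A quick check of the count for the sample data:

```python
import pandas as pd

df_shift = pd.DataFrame({'Date': ['2010-10-05', '2010-10-20', '2011-03-15', '2012-03-22',
                                  '2015-01-17', '2015-01-23', '2015-01-30'],
                         'Sensor_id': [1024, 1024, 1024, 1024, 2210, 2210, 1010]})
df_station = pd.DataFrame({'Sensor_id': [1024, 1024, 1024, 2210, 2210],
                           'Sensor_type': ['analog', 'analog', 'analog', 'dig', 'dig'],
                           'Date': ['2010-10-01', '2010-10-22', '2011-03-14',
                                    '2015-01-13', '2015-01-22']})

# Rows per sensor on each side (sensors absent from df_station are irrelevant)
station_counts = df_station['Sensor_id'].value_counts()
shift_counts = (df_shift['Sensor_id'].value_counts()
                .reindex(station_counts.index, fill_value=0))

# Left merge: n_station * n_shift per sensor, at least one row per station row
expected_rows = int((station_counts * shift_counts.clip(lower=1)).sum())

merged = pd.merge(df_station, df_shift, how='left', on='Sensor_id',
                  suffixes=['_station', '_shift'])
print(expected_rows, len(merged))  # 16 16
```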

Next, group by sensor and station date and take the minimum date difference. Grouping on Sensor_id as well keeps two sensors that share a station date from being mixed together:

df_min = df_merge.groupby(by=["Sensor_id", "Date_station"])["Date_abs_diff"].agg("min").reset_index()

>>> df_min
   Sensor_id Date_station Date_abs_diff
0       1024   2010-10-01        4 days
1       1024   2010-10-22        2 days
2       1024   2011-03-14        1 days
3       2210   2015-01-13        4 days
4       2210   2015-01-22        1 days

Finally, merge it back into df_station on both keys and clean up to get the final result:

df_output = pd.merge(df_station, df_min, how="left", left_on=["Sensor_id", "Date"], right_on=["Sensor_id", "Date_station"])
df_output.drop(columns='Date_station', inplace=True)
df_output.rename(columns={'Date_abs_diff': 'new_column'}, inplace=True)
df_output['new_column'] = df_output['new_column'].dt.days

>>> df_output
   Sensor_id Sensor_type        Date  new_column
0       1024      analog  2010-10-01           4
1       1024      analog  2010-10-22           2
2       1024      analog  2011-03-14           1
3       2210         dig  2015-01-13           4
4       2210         dig  2015-01-22           1
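If you also want to know which shift date was the nearest one, a variation (a sketch, not part of the original answer) keeps the whole row with the smallest difference per group by using idxmin instead of aggregating with min. It assumes every station sensor appears in df_shift, because a group whose differences are all missing has no minimum row to point to.

```python
import pandas as pd

df_shift = pd.DataFrame({'Date': ['2010-10-05', '2010-10-20', '2011-03-15', '2012-03-22',
                                  '2015-01-17', '2015-01-23', '2015-01-30'],
                         'Sensor_id': [1024, 1024, 1024, 1024, 2210, 2210, 1010]})
df_station = pd.DataFrame({'Sensor_id': [1024, 1024, 1024, 2210, 2210],
                           'Sensor_type': ['analog', 'analog', 'analog', 'dig', 'dig'],
                           'Date': ['2010-10-01', '2010-10-22', '2011-03-14',
                                    '2015-01-13', '2015-01-22']})
df_shift['Date'] = pd.to_datetime(df_shift['Date'])
df_station['Date'] = pd.to_datetime(df_station['Date'])

df_merge = pd.merge(df_station, df_shift, how='left', on='Sensor_id',
                    suffixes=['_station', '_shift'])
df_merge['Date_abs_diff'] = (df_merge['Date_shift'] - df_merge['Date_station']).abs()

# idxmin gives, per (sensor, station date), the row label of the smallest
# difference, so .loc keeps the whole row, including the matching shift date
best_idx = df_merge.groupby(['Sensor_id', 'Date_station'])['Date_abs_diff'].idxmin()
best = df_merge.loc[best_idx]
best = best.assign(new_column=best['Date_abs_diff'].dt.days).drop(columns='Date_abs_diff')
print(best)
```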
