
How to get new column in dataframe that is based on multiple conditions between two dataframes?

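I have two dataframes, and I want to add a column to DF1 that holds the value of the "Current Date" column plus the number of days that DF2 lists for the matching Status and Technology. For example, the first value in the "New Date" column below is 18/03/2022 + 1095 days, because the row has Technology = Wind and Status = Construction.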

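DF1

Current Date  Technology  Status                 New Date Required from Code
18/03/2022    Wind        Construction           16/12/2022
15/02/2022    Solar       Construction           15/11/2022
24/01/2022    Battery     Application approved   24/10/2022
23/09/2020    Wind        Application approved   24/03/2023
18/11/2021    Solar       Application submitted  18/11/2023
25/06/2020    Solar       Application approved   25/03/2021
27/02/2020    Wind        Application submitted  25/02/2025
10/03/2022    Battery     Application submitted  09/03/2024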

DF2

Technology           Application submitted  Application approved  Construction
Battery              730                    273.75                273.75
Solar Photovoltaics  730                    273.75                273.75
Wind                 1825                   912.5                 1095
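
To make the example reproducible, the two tables above could be built roughly as follows. This is only a sketch: the English column names are assumed from the output printed in the answer, and the "New Date Required from Code" column is left out because it is the value to be computed.

```python
import pandas as pd

# Assumption: column names match the ones shown in the answer's output.
df1 = pd.DataFrame({
    "Current Date": ["18/03/2022", "15/02/2022", "24/01/2022", "23/09/2020",
                     "18/11/2021", "25/06/2020", "27/02/2020", "10/03/2022"],
    "Technology": ["Wind", "Solar", "Battery", "Wind",
                   "Solar", "Solar", "Wind", "Battery"],
    "Status": ["Construction", "Construction", "Application approved",
               "Application approved", "Application submitted",
               "Application approved", "Application submitted",
               "Application submitted"],
})

# Wide lookup table: one row per technology, one column of day counts per status.
df2 = pd.DataFrame({
    "Technology": ["Battery", "Solar Photovoltaics", "Wind"],
    "Application submitted": [730, 730, 1825],
    "Application approved": [273.75, 273.75, 912.5],
    "Construction": [273.75, 273.75, 1095],
})
```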

Use DataFrame.melt and convert the values to timedeltas with to_timedelta (remove .astype(int) if you need higher precision):

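import pandas as pd
# Reshape df2 from wide to long: one row per (Technology, Status) pair,
# then convert the day counts to timedeltas.
df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
          .assign(**{'New Date':
                    lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print(df2)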
            Technology                 Status  New Date
0              Battery  Application submitted  730 days
1  Solar Photovoltaics  Application submitted  730 days
2                 Wind  Application submitted 1825 days
3              Battery   Application approved  273 days
4  Solar Photovoltaics   Application approved  273 days
5                 Wind   Application approved  912 days
6              Battery           Construction  273 days
7  Solar Photovoltaics           Construction  273 days
8                 Wind           Construction 1095 days
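
The remark about .astype(int) can be illustrated with a small standalone sketch. The variable names `days`, `truncated` and `exact` are made up for this illustration. With the cast, 273.75 days is cut down to 273 whole days; without it, the fractional part is kept as hours (0.75 day is 18 hours).

```python
import pandas as pd

# Day counts taken from DF2; two of them have a fractional part.
days = pd.Series([273.75, 912.5, 1095])

# With the integer cast, the fraction is dropped before conversion.
truncated = pd.to_timedelta(days.astype(int), unit="D")

# Without the cast, the fraction survives as hours.
exact = pd.to_timedelta(days, unit="D")

print(truncated)
print(exact)
```

If the fraction is kept, the resulting "New Date" values carry a time of day as well, so they may need rounding (for example with `.dt.normalize()`) before being shown as plain dates.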

Then use a left join and add the Current Date column:

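# Left join keeps every row of df1; key pairs missing from df2 yield NaT.
df = df1.merge(df2, on=['Technology','Status'], how='left')
# Parse the day-first date strings and add the matching timedelta.
df['New Date'] += pd.to_datetime(df['Current Date'], dayfirst=True)
print(df)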
  Current Date Technology                 Status   New Date
0   18/03/2022       Wind           Construction 2025-03-17
1   15/02/2022      Solar           Construction        NaT
2   24/01/2022    Battery   Application approved 2022-10-24
3   23/09/2020       Wind   Application approved 2023-03-24
4   18/11/2021      Solar  Application submitted        NaT
5   25/06/2020      Solar   Application approved        NaT
6   27/02/2020       Wind  Application submitted 2025-02-25
7   10/03/2022    Battery  Application submitted 2024-03-09
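
The NaT rows come from keys that exist in df1 but not in df2 ("Solar" versus "Solar Photovoltaics"). One way to spot such mismatches, sketched here on a cut-down copy of the data, is the `indicator=True` option of `merge`, which adds a `_merge` column. The names `left`, `lookup` and `unmatched` are made up for this example.

```python
import pandas as pd

# Cut-down stand-ins for df1 and the melted df2.
left = pd.DataFrame({"Technology": ["Wind", "Solar", "Battery"],
                     "Status": ["Construction"] * 3})
lookup = pd.DataFrame({"Technology": ["Wind", "Solar Photovoltaics", "Battery"],
                       "Status": ["Construction"] * 3,
                       "New Date": pd.to_timedelta([1095, 273, 273], unit="D")})

# indicator=True records whether each row found a partner in the lookup table.
merged = left.merge(lookup, on=["Technology", "Status"], how="left", indicator=True)

# Rows marked 'left_only' had no match, so their 'New Date' is NaT.
unmatched = merged.loc[merged["_merge"] == "left_only", "Technology"].unique()
print(unmatched)
```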

To match the Solar Photovoltaics values, you can split the string and select the first word:

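# Run this on the original wide df2, before melt: keep only the first word,
# so 'Solar Photovoltaics' becomes 'Solar' and matches df1.
df2['Technology'] = df2['Technology'].str.split().str[0]

df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
          .assign(**{'New Date':
                  lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print(df2)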
  Technology                 Status  New Date
0    Battery  Application submitted  730 days
1      Solar  Application submitted  730 days
2       Wind  Application submitted 1825 days
3    Battery   Application approved  273 days
4      Solar   Application approved  273 days
5       Wind   Application approved  912 days
6    Battery           Construction  273 days
7      Solar           Construction  273 days
8       Wind           Construction 1095 days


df = df1.merge(df2, on=['Technology','Status'], how='left')
df['New Date'] += pd.to_datetime(df['Current Date'], dayfirst=True)
print (df)
  Current Date Technology                 Status   New Date
0   18/03/2022       Wind           Construction 2025-03-17
1   15/02/2022      Solar           Construction 2022-11-15
2   24/01/2022    Battery   Application approved 2022-10-24
3   23/09/2020       Wind   Application approved 2023-03-24
4   18/11/2021      Solar  Application submitted 2023-11-18
5   25/06/2020      Solar   Application approved 2021-03-25
6   27/02/2020       Wind  Application submitted 2025-02-25
7   10/03/2022    Battery  Application submitted 2024-03-09
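
Taking the first word works here because each technology is identified by its first word. As a possible alternative, not part of the answer above, an explicit mapping can be used when names differ in less regular ways. The dictionary `name_map` is a hypothetical name introduced for this sketch.

```python
import pandas as pd

# Wide lookup table as in DF2 (only one status column shown for brevity).
df2 = pd.DataFrame({"Technology": ["Battery", "Solar Photovoltaics", "Wind"],
                    "Construction": [273.75, 273.75, 1095]})

# Hypothetical mapping from the names used in df2 to the names used in df1.
name_map = {"Solar Photovoltaics": "Solar"}

# Names not listed in the mapping are left unchanged.
df2["Technology"] = df2["Technology"].replace(name_map)
print(df2)
```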

