Assign values to new column based on conditions between two pandas DataFrames
How to get new column in dataframe that is based on multiple conditions between two dataframes?
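I have two dataframes. I want to add a column to DF1 that holds the value of the "Current Date" column plus the number of days from DF2 that corresponds to the row's Status and Technology. For example, the first value in the "New Date" column below is 18/03/2022 + 1095 days, because that row has Technology = Wind and Status = Construction.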
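DF1
Current Date | Technology | Status | New Date Required From Code |
---|---|---|---|
18/03/2022 | Wind | Construction | 16/12/2022 |
15/02/2022 | Solar | Construction | 15/11/2022 |
24/01/2022 | Battery | Application approved | 24/10/2022 |
23/09/2020 | Wind | Application approved | 24/03/2023 |
18/11/2021 | Solar | Application submitted | 18/11/2023 |
25/06/2020 | Solar | Application approved | 25/03/2021 |
27/02/2020 | Wind | Application submitted | 25/02/2025 |
10/03/2022 | Battery | Application submitted | 09/03/2024 |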
DF2
Technology | Application submitted | Application approved | Construction |
---|---|---|---|
Battery | 730 | 273.75 | 273.75 |
Solar Photovoltaics | 730 | 273.75 | 273.75 |
Wind | 1825 | 912.5 | 1095 |
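To make the question reproducible, the two tables can be built as DataFrames roughly as follows. This is a sketch: the values are transcribed from the tables above, the column names follow the ones used in the answer's code, and the expected-result column of DF1 is left out because it is what the code should produce.

```python
import pandas as pd

# DF1: one row per project, dates kept as day-first strings as in the table.
df1 = pd.DataFrame({
    "Current Date": ["18/03/2022", "15/02/2022", "24/01/2022", "23/09/2020",
                     "18/11/2021", "25/06/2020", "27/02/2020", "10/03/2022"],
    "Technology": ["Wind", "Solar", "Battery", "Wind",
                   "Solar", "Solar", "Wind", "Battery"],
    "Status": ["Construction", "Construction", "Application approved",
               "Application approved", "Application submitted",
               "Application approved", "Application submitted",
               "Application submitted"],
})

# DF2: lookup table in wide form, one row per technology and
# one column of day counts per status.
df2 = pd.DataFrame({
    "Technology": ["Battery", "Solar Photovoltaics", "Wind"],
    "Application submitted": [730, 730, 1825],
    "Application approved": [273.75, 273.75, 912.5],
    "Construction": [273.75, 273.75, 1095],
})
```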
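Use DataFrame.melt to reshape DF2 into long form, with one row per Technology and Status pair, and convert the values to timedeltas with to_timedelta (remove .astype(int) if you need higher accuracy):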
import pandas as pd

# Wide -> long, then turn the day counts into timedeltas.
df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
          .assign(**{'New Date':
                     lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print (df2)
Technology Status New Date
0 Battery Application submitted 730 days
1 Solar Photovoltaics Application submitted 730 days
2 Wind Application submitted 1825 days
3 Battery Application approved 273 days
4 Solar Photovoltaics Application approved 273 days
5 Wind Application approved 912 days
6 Battery Construction 273 days
7 Solar Photovoltaics Construction 273 days
8 Wind Construction 1095 days
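On the accuracy remark: .astype(int) drops the fractional part of each day count, which is why 273.75 appears as 273 days in the output above. A small sketch on a standalone Series (the variable names here are illustrative and not part of the answer) compares the two conversions:

```python
import pandas as pd

# Day counts taken from the DF2 table.
days = pd.Series([273.75, 912.5, 1095.0])

# Truncated to whole days, as in the snippet above.
whole = pd.to_timedelta(days.astype(int), unit="D")

# Fractional days kept: 0.75 day becomes 18 hours, 0.5 day becomes 12 hours.
exact = pd.to_timedelta(days, unit="D")

print(pd.DataFrame({"days": days, "whole": whole, "exact": exact}))
```

Whether the extra hours matter depends on whether the final column is meant to be a date or a timestamp.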
Then do a left join on Technology and Status, and add the parsed Current Date column to the timedeltas:
df = df1.merge(df2, on=['Technology','Status'], how='left')
df['New Date'] += pd.to_datetime(df['Current Date'], dayfirst=True)
print (df)
Current Date Technology Status New Date
0 18/03/2022 Wind Construction 2025-03-17
1 15/02/2022 Solar Construction NaT
2 24/01/2022 Battery Application approved 2022-10-24
3 23/09/2020 Wind Application approved 2023-03-24
4 18/11/2021 Solar Application submitted NaT
5 25/06/2020 Solar Application approved NaT
6 27/02/2020 Wind Application submitted 2025-02-25
7 10/03/2022 Battery Application submitted 2024-03-09
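A NaT in the result means the left join found no row in the lookup table for that Technology and Status pair. One way to list such rows, sketched here on a two-row toy example rather than the full data, is merge's indicator=True option, which adds a _merge column recording where each key pair was found. The frame names left and right are illustrative.

```python
import pandas as pd

left = pd.DataFrame({"Technology": ["Wind", "Solar"],
                     "Status": ["Construction", "Construction"]})
right = pd.DataFrame({"Technology": ["Wind", "Solar Photovoltaics"],
                      "Status": ["Construction", "Construction"],
                      "New Date": pd.to_timedelta([1095, 273], unit="D")})

# indicator=True adds a "_merge" column: "both" or "left_only" for a left join.
checked = left.merge(right, on=["Technology", "Status"],
                     how="left", indicator=True)

# Rows marked "left_only" had no partner in the lookup table.
unmatched = checked.loc[checked["_merge"] == "left_only",
                        ["Technology", "Status"]]
print(unmatched)
```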
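The Solar rows are NaT because DF1 uses Solar while DF2 uses Solar Photovoltaics. To make them match, start again from the original wide df2, split the Technology values on whitespace and keep the first word, then melt as before: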
df2['Technology'] = df2['Technology'].str.split().str[0]
df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
          .assign(**{'New Date':
                     lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print (df2)
Technology Status New Date
0 Battery Application submitted 730 days
1 Solar Application submitted 730 days
2 Wind Application submitted 1825 days
3 Battery Application approved 273 days
4 Solar Application approved 273 days
5 Wind Application approved 912 days
6 Battery Construction 273 days
7 Solar Construction 273 days
8 Wind Construction 1095 days
df = df1.merge(df2, on=['Technology','Status'], how='left')
df['New Date'] += pd.to_datetime(df['Current Date'], dayfirst=True)
print (df)
Current Date Technology Status New Date
0 18/03/2022 Wind Construction 2025-03-17
1 15/02/2022 Solar Construction 2022-11-15
2 24/01/2022 Battery Application approved 2022-10-24
3 23/09/2020 Wind Application approved 2023-03-24
4 18/11/2021 Solar Application submitted 2023-11-18
5 25/06/2020 Solar Application approved 2021-03-25
6 27/02/2020 Wind Application submitted 2025-02-25
7 10/03/2022 Battery Application submitted 2024-03-09
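Splitting on whitespace only helps when the first word of the DF2 name happens to equal the DF1 name. As an alternative sketch, not part of the answer above, the same steps can be wrapped in one helper that takes an explicit alias mapping. The function name add_new_date, the aliases argument and the intermediate Days column are all hypothetical. validate="many_to_one" is a standard merge option that raises an error if a Technology and Status pair occurs more than once in the lookup.

```python
import pandas as pd


def add_new_date(df1, df2, aliases=None):
    """Return a copy of df1 with a 'New Date' column looked up from wide df2."""
    lookup = df2.copy()
    if aliases:
        # Rename lookup technologies to the spelling used in df1.
        lookup["Technology"] = lookup["Technology"].replace(aliases)
    # Wide -> long: one row per (Technology, Status) pair.
    long_df = lookup.melt("Technology", var_name="Status", value_name="Days")
    out = df1.merge(long_df, on=["Technology", "Status"], how="left",
                    validate="many_to_one")
    start = pd.to_datetime(out["Current Date"], dayfirst=True)
    # Floor to whole days (same as truncation for positive counts);
    # unmatched rows have NaN days and therefore end up as NaT.
    out["New Date"] = start + pd.to_timedelta(out["Days"] // 1, unit="D")
    return out.drop(columns="Days")


# Usage on two rows of the sample data.
projects = pd.DataFrame({
    "Current Date": ["18/03/2022", "15/02/2022"],
    "Technology": ["Wind", "Solar"],
    "Status": ["Construction", "Construction"],
})
durations = pd.DataFrame({
    "Technology": ["Battery", "Solar Photovoltaics", "Wind"],
    "Application submitted": [730, 730, 1825],
    "Application approved": [273.75, 273.75, 912.5],
    "Construction": [273.75, 273.75, 1095],
})
result = add_new_date(projects, durations, {"Solar Photovoltaics": "Solar"})
print(result)
```

Unlike the step-by-step version, this leaves the original lookup frame untouched, so it can be reused with a different alias mapping.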