How to get a new column in a dataframe that is based on multiple conditions between two dataframes?

I have two dataframes, and I want to add a column to DF1 that holds the "Current Date" value plus the number of days that DF2 gives for the relevant status and technology. For example, the first value in the "New Date" column below is 18/03/2022 + 1095 days, because the row has technology = Wind and status = Construction.

DF 1

Current Date  Technology  Status                 New Date DESIRED FROM CODE
18/03/2022    Wind        Construction           16/12/2022
15/02/2022    Solar       Construction           15/11/2022
24/01/2022    Battery     Application approved   24/10/2022
23/09/2020    Wind        Application approved   24/03/2023
18/11/2021    Solar       Application submitted  18/11/2023
25/06/2020    Solar       Application approved   25/03/2021
27/02/2020    Wind        Application submitted  25/02/2025
10/03/2022    Battery     Application submitted  09/03/2024

DF 2

Technology           Application submitted  Application approved  Construction
Battery              730                    273.75                273.75
Solar Photovoltaics  730                    273.75                273.75
Wind                 1825                   912.5                 1095

Use DataFrame.melt to reshape df2 into long format, then convert the day counts to timedeltas with to_timedelta (.astype(int) truncates fractional days such as 273.75 to 273; remove it if you need better accuracy):

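import pandas as pd

# Reshape to one row per (Technology, Status) pair, with the offset as a timedelta
df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
          .assign(**{'New Date': 
                    lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print (df2)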
            Technology                 Status  New Date
0              Battery  Application submitted  730 days
1  Solar Photovoltaics  Application submitted  730 days
2                 Wind  Application submitted 1825 days
3              Battery   Application approved  273 days
4  Solar Photovoltaics   Application approved  273 days
5                 Wind   Application approved  912 days
6              Battery           Construction  273 days
7  Solar Photovoltaics           Construction  273 days
8                 Wind           Construction 1095 days
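
To make the effect of .astype(int) concrete, here is a minimal self-contained sketch. It rebuilds df2 from the question's table; the names long_df, Days, Whole and Exact are illustrative only and do not appear in the original answer.

```python
import pandas as pd

# df2 rebuilt from the question's table (wide format: one column per status)
df2 = pd.DataFrame({
    'Technology': ['Battery', 'Solar Photovoltaics', 'Wind'],
    'Application submitted': [730, 730, 1825],
    'Application approved': [273.75, 273.75, 912.5],
    'Construction': [273.75, 273.75, 1095],
})

# Long format: one row per (Technology, Status) pair
long_df = df2.melt('Technology', var_name='Status', value_name='Days')

# Truncated to whole days, as in the answer above
long_df['Whole'] = pd.to_timedelta(long_df['Days'].astype(int), unit='d')
# Fractional days kept: 273.75 days becomes 273 days 18:00:00
long_df['Exact'] = pd.to_timedelta(long_df['Days'], unit='d')

print(long_df[['Technology', 'Status', 'Whole', 'Exact']])
```

If only calendar dates matter, the truncated version is usually enough; the exact version adds a time-of-day component to the result.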

Then use a left join on Technology and Status, and add the parsed Current Date column to the timedeltas:

df = df1.merge(df2, on=['Technology','Status'], how='left')
df['New Date'] += pd.to_datetime(df['Current Date'], dayfirst=True)
print (df)
  Current Date Technology                 Status   New Date
0   18/03/2022       Wind           Construction 2025-03-17
1   15/02/2022      Solar           Construction        NaT
2   24/01/2022    Battery   Application approved 2022-10-24
3   23/09/2020       Wind   Application approved 2023-03-24
4   18/11/2021      Solar  Application submitted        NaT
5   25/06/2020      Solar   Application approved        NaT
6   27/02/2020       Wind  Application submitted 2025-02-25
7   10/03/2022    Battery  Application submitted 2024-03-09
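
The NaT rows come from key pairs in df1 that have no counterpart in the reshaped df2. One way to list them, sketched here on a two-row subset of the data, is the indicator parameter of merge. The names df2_long, checked and unmatched are hypothetical, chosen for this illustration.

```python
import pandas as pd

# Two rows of df1 from the question
df1 = pd.DataFrame({
    'Current Date': ['18/03/2022', '15/02/2022'],
    'Technology': ['Wind', 'Solar'],
    'Status': ['Construction', 'Construction'],
})
# Matching slice of the reshaped df2, before the Technology names are shortened
df2_long = pd.DataFrame({
    'Technology': ['Wind', 'Solar Photovoltaics'],
    'Status': ['Construction', 'Construction'],
    'New Date': pd.to_timedelta([1095, 273], unit='d'),
})

# indicator=True adds a '_merge' column saying where each row was found
checked = df1.merge(df2_long, on=['Technology', 'Status'],
                    how='left', indicator=True)

# 'left_only' marks df1 rows with no match in df2_long
unmatched = checked.loc[checked['_merge'] == 'left_only',
                        ['Technology', 'Status']]
print(unmatched)
```

Here the only unmatched row is Solar / Construction, because df2 spells the technology as Solar Photovoltaics.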

To match Solar Photovoltaics in df2 with Solar in df1, split the Technology strings on whitespace and keep the first word:

df2['Technology'] = df2['Technology'].str.split().str[0]

df2 = (df2.melt('Technology', var_name='Status', value_name='New Date')
          .assign(**{'New Date':  
                  lambda x: pd.to_timedelta(x['New Date'].astype(int), unit='d')}))
print (df2)
  Technology                 Status  New Date
0    Battery  Application submitted  730 days
1      Solar  Application submitted  730 days
2       Wind  Application submitted 1825 days
3    Battery   Application approved  273 days
4      Solar   Application approved  273 days
5       Wind   Application approved  912 days
6    Battery           Construction  273 days
7      Solar           Construction  273 days
8       Wind           Construction 1095 days


df = df1.merge(df2, on=['Technology','Status'], how='left')
df['New Date'] += pd.to_datetime(df['Current Date'], dayfirst=True)
print (df)
  Current Date Technology                 Status   New Date
0   18/03/2022       Wind           Construction 2025-03-17
1   15/02/2022      Solar           Construction 2022-11-15
2   24/01/2022    Battery   Application approved 2022-10-24
3   23/09/2020       Wind   Application approved 2023-03-24
4   18/11/2021      Solar  Application submitted 2023-11-18
5   25/06/2020      Solar   Application approved 2021-03-25
6   27/02/2020       Wind  Application submitted 2025-02-25
7   10/03/2022    Battery  Application submitted 2024-03-09
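
For reference, the whole approach can be run end to end as below. This is a sketch on the first three rows of df1, not the answer's exact code: it keeps the offsets in a separate, hypothetically named Offset column and, as an added assumption about the desired output, formats the result back to dd/mm/yyyy strings with dt.strftime.

```python
import pandas as pd

# First three rows of df1 from the question
df1 = pd.DataFrame({
    'Current Date': ['18/03/2022', '15/02/2022', '24/01/2022'],
    'Technology': ['Wind', 'Solar', 'Battery'],
    'Status': ['Construction', 'Construction', 'Application approved'],
})
df2 = pd.DataFrame({
    'Technology': ['Battery', 'Solar Photovoltaics', 'Wind'],
    'Application submitted': [730, 730, 1825],
    'Application approved': [273.75, 273.75, 912.5],
    'Construction': [273.75, 273.75, 1095],
})

# Keep only the first word so 'Solar Photovoltaics' matches 'Solar'
df2['Technology'] = df2['Technology'].str.split().str[0]

# Long format with whole-day timedeltas
offsets = df2.melt('Technology', var_name='Status', value_name='Offset')
offsets['Offset'] = pd.to_timedelta(offsets['Offset'].astype(int), unit='d')

# Left join keeps every df1 row, then add the offset to the parsed date
out = df1.merge(offsets, on=['Technology', 'Status'], how='left')
new_date = pd.to_datetime(out['Current Date'], dayfirst=True) + out['Offset']

# Format like the input column (dd/mm/yyyy) and drop the helper column
out['New Date'] = new_date.dt.strftime('%d/%m/%Y')
out = out.drop(columns='Offset')
print(out)
```

Keeping the offset in its own column until the end avoids a column that is a timedelta at one point and a datetime at the next, which can make intermediate results easier to inspect.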
