Two Pandas dataframes: Based on date, add value to dataframe

My first Pandas dataframe looks like this (as an example). Date values are datetime, Temp in F values are float, and Signal values are strings.

         Date  Temp in F   Signal    
1  1990-10-01    2.23337   freeze     
2  1991-07-31    2.99860  defrost     
3  1991-10-01    3.12221   freeze     
4  1992-07-31     3.2328  defrost     
5  1992-10-01    4.21327   freeze     
6  1993-07-29    2.23222  defrost     
7  1993-10-01    1.53200   freeze     
8  1994-07-29    2.15030  defrost     
9  1994-10-03    1.22299   freeze     
..        ...        ...      ...   
57 2018-10-01    8.95045   freeze  
58 2019-07-31    9.32463  defrist  
59 2019-10-01    9.34722   freeze  
60 2020-07-31   10.53200  defrost
61 2020-10-01   10.34000   freeze   

My second Pandas dataframe looks like this (as an example). Date values are datetime and Moisture values are float.

      Date     Moisture
1994-05-27      4.21232
1995-05-19      3.30000
1996-05-24      3.43227
1997-05-23      3.63333
1998-05-15      4.60000
1999-05-28      2.43240
2000-05-26      1.34237
2001-05-21      1.23430
2002-05-29      2.34343
2003-05-02      1.83433
2004-04-29      2.34341
2005-06-28      3.15373
2006-05-05      1.78565
2007-05-04      0.34533
2008-08-02      0.42267
2009-05-07      0.40000
2010-08-07      0.30000
2011-05-06      2.30000
2012-05-04      3.12300
2013-05-06      4.10200
2014-05-02      2.42000
2015-05-08      2.53300
2016-06-09      1.20000
2017-05-11      1.45000
2018-05-10      1.30000
2019-05-15      1.67230
2020-05-29      2.34000

Now I want to add the Moisture values to the first dataframe, based on their dates, in the rows where Signal is freeze. For example:

  • In the second dataframe, the value 4.21232 was recorded on 1994-05-27. It should be added to the row of dataframe 1 dated 1994-10-03, because that date comes after the moisture date (1994-05-27) and the signal is freeze.
  • In the second dataframe, the value 1.67230 was recorded on 2019-05-15. It should be added to the row of dataframe 1 dated 2019-10-01, because that date comes after the moisture date (2019-05-15) and the signal is freeze.
  • If no value is found for the period in between, the Moisture cell should remain empty.

This should be done for every value in the second dataframe's Moisture column, so that the modified first dataframe looks as follows:

         Date  Temp in F   Signal    Moisture
1  1990-10-01    2.23337   freeze     
2  1991-07-31    2.99860  defrost     
3  1991-10-01    3.12221   freeze     
4  1992-07-31     3.2328  defrost     
5  1992-10-01    4.21327   freeze     
6  1993-07-29    2.23222  defrost     
7  1993-10-01    1.53200   freeze     
8  1994-07-29    2.15030  defrost     
9  1994-10-03    1.22299   freeze     4.21232  
..        ...        ...      ...   
57 2018-10-01    8.95045   freeze     1.30000
58 2019-07-31    9.32463  defrist  
59 2019-10-01    9.34722   freeze     1.67230
60 2020-07-31   10.53200  defrost
61 2020-10-01   10.34000   freeze     2.34000

Does anyone have an idea how to solve this?

Use merge_asof() on just the subset of rows needed for the join, then add the remaining rows back in.
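The idea can be sketched on a handful of rows first. This is only an illustration under assumptions: the five rows are a subset of the question's data, and the write-back by index (instead of concatenating) is a variation chosen here so that the original row labels and order stay untouched.

```python
import pandas as pd

# Small illustrative frames (a subset of the rows shown in the question).
df1 = pd.DataFrame({
    "Date": pd.to_datetime(["1993-10-01", "1994-07-29", "1994-10-03",
                            "2019-07-31", "2019-10-01"]),
    "Signal": ["freeze", "defrost", "freeze", "defrost", "freeze"],
})
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["1994-05-27", "2019-05-15"]),
    "Moisture": [4.21232, 1.67230],
})

mask = df1["Signal"] == "freeze"

# Keep the original row labels in an "index" column so they survive the merge.
left = df1.loc[mask, ["Date"]].reset_index().sort_values("Date")

# For each freeze row, take the most recent reading on or before its date.
matched = pd.merge_asof(left, df2.sort_values("Date"), on="Date")

# Write the matches back by row label; non-freeze rows stay NaN.
df1["Moisture"] = matched.set_index("index")["Moisture"]
print(df1)
```

Both inputs must be sorted by the key column, which is why sort_values("Date") appears on each side.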

import io

import pandas as pd

df1 = pd.read_csv(io.StringIO("""         Date  Temp_in_F   Signal    
1  1990-10-01    2.23337   freeze     
2  1991-07-31    2.99860  defrost     
3  1991-10-01    3.12221   freeze     
4  1992-07-31     3.2328  defrost     
5  1992-10-01    4.21327   freeze     
6  1993-07-29    2.23222  defrost     
7  1993-10-01    1.53200   freeze     
8  1994-07-29    2.15030  defrost     
9  1994-10-03    1.22299   freeze     
..        ...        ...      ...   
57 2018-10-01    8.95045   freeze  
58 2019-07-31    9.32463  defrist  
59 2019-10-01    9.34722   freeze  
60 2020-07-31   10.53200  defrost
61 2020-10-01   10.34000   freeze"""), sep=r"\s+")

# The ".." placeholder row cannot be parsed as a date; coerce it to NaT and drop it
df1["Date"] = pd.to_datetime(df1["Date"], errors="coerce")
df1 = df1.dropna().reset_index(drop=True)

df2 = pd.read_csv(io.StringIO("""      Date     Moisture
1994-05-27      4.21232
1995-05-19      3.30000
1996-05-24      3.43227
1997-05-23      3.63333
1998-05-15      4.60000
1999-05-28      2.43240
2000-05-26      1.34237
2001-05-21      1.23430
2002-05-29      2.34343
2003-05-02      1.83433
2004-04-29      2.34341
2005-06-28      3.15373
2006-05-05      1.78565
2007-05-04      0.34533
2008-08-02      0.42267
2009-05-07      0.40000
2010-08-07      0.30000
2011-05-06      2.30000
2012-05-04      3.12300
2013-05-06      4.10200
2014-05-02      2.42000
2015-05-08      2.53300
2016-06-09      1.20000
2017-05-11      1.45000
2018-05-10      1.30000
2019-05-15      1.67230
2020-05-29      2.34000"""), sep=r"\s+")
df2["Date"] = pd.to_datetime(df2["Date"])

# Rows that should receive a Moisture value
mask = df1["Signal"] == "freeze"

# Run merge_asof() on the freeze rows only, then put the other rows back in
result = pd.concat([
    pd.merge_asof(df1[mask].sort_values("Date"), df2.sort_values("Date"), on="Date"),
    df1[~mask],  # non-freeze rows get NaN for Moisture
]).sort_values("Date").reset_index(drop=True)
print(result)


Output

      Date Temp_in_F   Signal  Moisture
1990-10-01   2.23337   freeze       NaN
1991-07-31   2.99860  defrost       NaN
1991-10-01   3.12221   freeze       NaN
1992-07-31    3.2328  defrost       NaN
1992-10-01   4.21327   freeze       NaN
1993-07-29   2.23222  defrost       NaN
1993-10-01   1.53200   freeze       NaN
1994-07-29   2.15030  defrost       NaN
1994-10-03   1.22299   freeze   4.21232
2018-10-01   8.95045   freeze   1.30000
2019-07-31   9.32463  defrist       NaN
2019-10-01   9.34722   freeze   1.67230
2020-07-31  10.53200  defrost       NaN
2020-10-01  10.34000   freeze   2.34000
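One caveat: with its default settings, merge_asof() takes the most recent earlier reading however old it is, so a freeze row with no reading since the previous freeze would still receive the older value. The sample data has one reading per year, so the output above is unaffected. If the "remain empty" rule has to be enforced, one possible approach is to match in the other direction, assigning each reading to the next freeze date. The sketch below assumes that reading of the rule; the data and the helper names (FreezeDate, per_freeze) are made up for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({
    "Date": pd.to_datetime(["1994-10-03", "1995-07-31", "1995-10-02", "1996-10-01"]),
    "Signal": ["freeze", "defrost", "freeze", "freeze"],
})
# Hypothetical readings: none falls between 1994-10-03 and 1995-10-02.
df2 = pd.DataFrame({
    "Date": pd.to_datetime(["1994-05-27", "1996-05-24"]),
    "Moisture": [4.21232, 3.43227],
})

is_freeze = df1["Signal"] == "freeze"
freeze = df1.loc[is_freeze, ["Date"]].sort_values("Date")
freeze["FreezeDate"] = freeze["Date"]  # copy of the key that survives the merge

# For each reading, find the first freeze date on or after it.
assigned = pd.merge_asof(df2.sort_values("Date"), freeze,
                         on="Date", direction="forward")

# If several readings map to the same freeze date, keep the latest one.
per_freeze = (assigned.dropna(subset=["FreezeDate"])
                      .groupby("FreezeDate")["Moisture"].last())

# Look up each freeze date; rows without an assigned reading stay NaN.
df1["Moisture"] = df1["Date"].map(per_freeze).where(is_freeze)
print(df1)
```

Here the 1995-10-02 row stays empty, whereas the default backward match would have carried 4.21232 forward to it.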
