
Two Pandas dataframes: Based on date, add value to dataframe

My first Pandas dataframe looks like this (as an example). Date values are datetime, Temp in F values are float, and Signal values are strings.

         Date  Temp in F   Signal    
1  1990-10-01    2.23337   freeze     
2  1991-07-31    2.99860  defrost     
3  1991-10-01    3.12221   freeze     
4  1992-07-31     3.2328  defrost     
5  1992-10-01    4.21327   freeze     
6  1993-07-29    2.23222  defrost     
7  1993-10-01    1.53200   freeze     
8  1994-07-29    2.15030  defrost     
9  1994-10-03    1.22299   freeze     
..        ...        ...      ...   
57 2018-10-01    8.95045   freeze  
58 2019-07-31    9.32463  defrist  
59 2019-10-01    9.34722   freeze  
60 2020-07-31   10.53200  defrost
61 2020-10-01   10.34000   freeze   

My second Pandas dataframe looks like this (as an example). Date values are datetime and Moisture values are float.

      Date     Moisture
1994-05-27      4.21232
1995-05-19      3.30000
1996-05-24      3.43227
1997-05-23      3.63333
1998-05-15      4.60000
1999-05-28      2.43240
2000-05-26      1.34237
2001-05-21      1.23430
2002-05-29      2.34343
2003-05-02      1.83433
2004-04-29      2.34341
2005-06-28      3.15373
2006-05-05      1.78565
2007-05-04      0.34533
2008-08-02      0.42267
2009-05-07      0.40000
2010-08-07      0.30000
2011-05-06      2.30000
2012-05-04      3.12300
2013-05-06      4.10200
2014-05-02      2.42000
2015-05-08      2.53300
2016-06-09      1.20000
2017-05-11      1.45000
2018-05-10      1.30000
2019-05-15      1.67230
2020-05-29      2.34000

Now I want to add the Moisture values to the first dataframe based on their date, in rows where the Signal is freeze. For example:

  • In the second dataframe, the value 4.21232 was recorded on 1994-05-27. This value should be added to the row of the first dataframe dated 1994-10-03, because that date is after the moisture date (1994-05-27) and the signal is freeze.
  • In the second dataframe, the value 1.67230 was recorded on 2019-05-15. This value should be added to the row of the first dataframe dated 2019-10-01, because that date is after the moisture date (2019-05-15) and the signal is freeze.
  • If no Moisture value is found for a date in between, the cell should remain empty.

This should be done for all values in the second dataframe's "Moisture" column, so that the modified first dataframe looks as follows:

         Date  Temp in F   Signal    Moisture
1  1990-10-01    2.23337   freeze     
2  1991-07-31    2.99860  defrost     
3  1991-10-01    3.12221   freeze     
4  1992-07-31     3.2328  defrost     
5  1992-10-01    4.21327   freeze     
6  1993-07-29    2.23222  defrost     
7  1993-10-01    1.53200   freeze     
8  1994-07-29    2.15030  defrost     
9  1994-10-03    1.22299   freeze     4.21232  
..        ...        ...      ...   
57 2018-10-01    8.95045   freeze     1.30000
58 2019-07-31    9.32463  defrist  
59 2019-10-01    9.34722   freeze     1.67230
60 2020-07-31   10.53200  defrost
61 2020-10-01   10.34000   freeze     2.34000

Does anyone have an idea how to solve this?

Use merge_asof() on the subset of rows that should take part in the join (the freeze rows), then add the remaining rows back in.
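To see what merge_asof() does on its own, here is a minimal sketch on toy frames. The names left, right and demo and the three-row data are made up for illustration. With the default direction="backward", each left row receives the last right row whose Date is not later than its own.

```python
import pandas as pd

# Toy data for illustration only (a small excerpt-like sample, not the full question data).
left = pd.DataFrame({
    "Date": pd.to_datetime(["1993-10-01", "1994-10-03", "1995-10-02"]),
    "Signal": ["freeze", "freeze", "freeze"],
})
right = pd.DataFrame({
    "Date": pd.to_datetime(["1994-05-27", "1995-05-19"]),
    "Moisture": [4.21232, 3.30000],
})

# Both frames must be sorted by the key column.
# Default direction="backward": for each left row, take the last right row
# whose Date is <= the left Date; rows without such a match get NaN.
demo = pd.merge_asof(left, right, on="Date")
print(demo)
```

The 1993 row has no earlier moisture reading, so its Moisture stays NaN. The full example on the question's data follows the same pattern.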

import io

import pandas as pd

df1 = pd.read_csv(io.StringIO("""         Date  Temp_in_F   Signal    
1  1990-10-01    2.23337   freeze     
2  1991-07-31    2.99860  defrost     
3  1991-10-01    3.12221   freeze     
4  1992-07-31     3.2328  defrost     
5  1992-10-01    4.21327   freeze     
6  1993-07-29    2.23222  defrost     
7  1993-10-01    1.53200   freeze     
8  1994-07-29    2.15030  defrost     
9  1994-10-03    1.22299   freeze     
..        ...        ...      ...   
57 2018-10-01    8.95045   freeze  
58 2019-07-31    9.32463  defrist  
59 2019-10-01    9.34722   freeze  
60 2020-07-31   10.53200  defrost
61 2020-10-01   10.34000   freeze"""), sep=r"\s+")

# the ".." placeholder row cannot be parsed as a date: coerce it to NaT, then drop it
df1["Date"] = pd.to_datetime(df1["Date"], errors="coerce")
df1 = df1.dropna().reset_index(drop=True)

df2 = pd.read_csv(io.StringIO("""      Date     Moisture
1994-05-27      4.21232
1995-05-19      3.30000
1996-05-24      3.43227
1997-05-23      3.63333
1998-05-15      4.60000
1999-05-28      2.43240
2000-05-26      1.34237
2001-05-21      1.23430
2002-05-29      2.34343
2003-05-02      1.83433
2004-04-29      2.34341
2005-06-28      3.15373
2006-05-05      1.78565
2007-05-04      0.34533
2008-08-02      0.42267
2009-05-07      0.40000
2010-08-07      0.30000
2011-05-06      2.30000
2012-05-04      3.12300
2013-05-06      4.10200
2014-05-02      2.42000
2015-05-08      2.53300
2016-06-09      1.20000
2017-05-11      1.45000
2018-05-10      1.30000
2019-05-15      1.67230
2020-05-29      2.34000"""), sep=r"\s+")
df2["Date"] = pd.to_datetime(df2["Date"])

# rows that should receive a Moisture value
mask = df1["Signal"] == "freeze"

# run merge_asof() on the freeze rows only, then put all other rows back in
result = pd.concat([
    pd.merge_asof(df1[mask].sort_values("Date"), df2.sort_values("Date"), on="Date"),
    df1[~mask],  # non-freeze rows get no Moisture value (NaN)
]).sort_values("Date").reset_index(drop=True)
print(result)


output

      Date Temp_in_F   Signal  Moisture
1990-10-01   2.23337   freeze       NaN
1991-07-31   2.99860  defrost       NaN
1991-10-01   3.12221   freeze       NaN
1992-07-31    3.2328  defrost       NaN
1992-10-01   4.21327   freeze       NaN
1993-07-29   2.23222  defrost       NaN
1993-10-01   1.53200   freeze       NaN
1994-07-29   2.15030  defrost       NaN
1994-10-03   1.22299   freeze   4.21232
2018-10-01   8.95045   freeze   1.30000
2019-07-31   9.32463  defrist       NaN
2019-10-01   9.34722   freeze   1.67230
2020-07-31  10.53200  defrost       NaN
2020-10-01  10.34000   freeze   2.34000
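One caveat: a plain backward merge_asof() never leaves a freeze row empty once any earlier Moisture value exists, so a year without a measurement would silently reuse the previous year's value. The sketch below is one possible way to honor the "remain empty" rule, assuming "in between" means "since the previous freeze row". The toy data and the names freeze, moisture, MoistureDate and stale are hypothetical, not from the original answer.

```python
import pandas as pd

# Toy data for illustration: no moisture reading was taken in 1995.
freeze = pd.DataFrame({
    "Date": pd.to_datetime(["1994-10-03", "1995-10-02", "1996-10-01"]),
    "Signal": ["freeze", "freeze", "freeze"],
})
moisture = pd.DataFrame({
    "Date": pd.to_datetime(["1994-05-27", "1996-05-24"]),
    "Moisture": [4.21232, 3.43227],
})

# Keep a copy of the measurement date, because merging with on="Date"
# leaves only the left frame's Date in the result.
merged = pd.merge_asof(
    freeze.sort_values("Date"),
    moisture.assign(MoistureDate=moisture["Date"]).sort_values("Date"),
    on="Date",
)

# A match is stale if it was measured on or before the previous freeze row.
# Comparisons against NaT (first row) evaluate to False, so that row is kept.
stale = merged["MoistureDate"] <= merged["Date"].shift()
merged.loc[stale, "Moisture"] = float("nan")
merged = merged.drop(columns="MoistureDate")
print(merged)
```

Here the 1995 freeze row ends up with NaN instead of the 1994 value. If a fixed time window is acceptable instead, the tolerance parameter of merge_asof() (for example a pd.Timedelta) is a shorter alternative, though the right window length depends on how the dates are spaced.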

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 