
How can I create a column in a dataframe using conditional logic on multiple columns in another dataframe (Python, pandas)?

I am trying to take the result value associated with the latest date in my dataframe, and create a new dataframe containing 'location', 'latest_date', and 'latest_result'. I have tried the following code:

import pandas as pd
import numpy as np

df = pd.read_excel('SL_report_table.xlsx')
df = df.dropna(subset=['RESULT'])
df.head()

   LOCATION TYPE                 DATE         EVENT  RESULT  D_RESULT  FLAG UNITS
20    AS-01  NaN  2020-11-07 13:35:00  44142.565972   100.0       1.0   NaN  ug/L
21    AS-01  NaN  2020-06-16 00:00:00  43998.000000   250.0       1.0   NaN  ug/L
22    AS-01  NaN  2019-10-08 13:30:00  43746.562500   260.0       1.0   NaN  ug/L
23    AS-01  NaN  2019-05-14 21:40:00  43599.902778   230.0       1.0   NaN  ug/L
24    AS-01  NaN  2018-10-03 15:00:00  43376.625000   100.0       0.0   NaN  ug/L

grouped_maxdate = df.groupby('LOCATION').DATE.max()
grouped_maxdate = grouped_maxdate.to_frame()

for row in df:
    if row['LOCATION'] == grouped_maxdate['LOCATION'] and row['DATE'] == grouped_maxdate['LOCTION']:
        grouped_maxdate['LAST_RESULT'] = df['RESULT']

Any thoughts?

Sort values by DATE and keep the last row for each LOCATION group:

>>> df.sort_values('DATE').groupby('LOCATION').last()

          TYPE                 DATE         EVENT  RESULT  D_RESULT  FLAG UNITS
LOCATION
AS-01      NaN  2020-11-07 13:35:00  44142.565972   100.0       1.0   NaN  ug/L
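To try this without the Excel file, here is a minimal sketch that rebuilds a small frame in memory and applies the same sort-then-last step. The dates and results are taken from the question's sample, but the 'AS-02' label is made up here (the question only shows AS-01) just to show that each LOCATION gets its own latest row. It also assumes DATE is a real datetime column; if it was read in as text, converting it with pd.to_datetime first is safer.

```python
import pandas as pd

# Dates and results come from the question's sample; the "AS-02" label is
# hypothetical, added only so there are two groups to compare.
df = pd.DataFrame({
    "LOCATION": ["AS-01", "AS-01", "AS-01", "AS-02", "AS-02"],
    "DATE": pd.to_datetime([
        "2020-11-07 13:35:00",
        "2020-06-16 00:00:00",
        "2019-10-08 13:30:00",
        "2019-05-14 21:40:00",
        "2018-10-03 15:00:00",
    ]),
    "RESULT": [100.0, 250.0, 260.0, 230.0, 100.0],
})

# Sort oldest to newest, then keep the last row of each LOCATION group.
latest = df.sort_values("DATE").groupby("LOCATION").last()
print(latest)
```

One caveat: last() returns the last non-null value in each column, so on a wider frame with gaps the returned row can combine values from different dates. That should not affect RESULT here, since rows with a missing RESULT were already dropped.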

Full code:

>>> out = df[['LOCATION', 'DATE', 'RESULT']].sort_values('DATE').groupby('LOCATION', as_index=False).last()
>>> out.columns = ['location', 'latest_date', 'latest_result']
>>> out
  location          latest_date  latest_result
0    AS-01  2020-11-07 13:35:00          100.0
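As for why the loop in the question does not work: iterating over a DataFrame with `for row in df` yields the column labels (plain strings), not rows, so `row['LOCATION']` fails. If you would rather keep the grouped_maxdate idea, a merge on both key columns does the row matching the loop was attempting. The following is only a sketch, using the sample rows from the question in place of the Excel file:

```python
import pandas as pd

# Sample rows from the question, standing in for SL_report_table.xlsx.
df = pd.DataFrame(
    {
        "LOCATION": ["AS-01"] * 5,
        "DATE": pd.to_datetime([
            "2020-11-07 13:35:00",
            "2020-06-16 00:00:00",
            "2019-10-08 13:30:00",
            "2019-05-14 21:40:00",
            "2018-10-03 15:00:00",
        ]),
        "RESULT": [100.0, 250.0, 260.0, 230.0, 100.0],
    },
    index=[20, 21, 22, 23, 24],
)

# One row per LOCATION holding its maximum DATE.
grouped_maxdate = df.groupby("LOCATION", as_index=False)["DATE"].max()

# Join back on both keys to pick up the RESULT recorded at that DATE.
out = grouped_maxdate.merge(
    df[["LOCATION", "DATE", "RESULT"]],
    on=["LOCATION", "DATE"],
    how="left",
)
out.columns = ["location", "latest_date", "latest_result"]
print(out)
```

Note that if a location has two rows sharing the same latest DATE, the merge keeps both, whereas the sort-and-last version keeps only one.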
