
pandas DataFrame function to return the rows with the most recent date where a column contains the input value throws an error

I am trying to filter a DataFrame down to the rows with the latest date for a given input value, provided that value exists in a DataFrame column.

I have written the following code:

def ChkFindDate(df, id):
    if df.loc[df['id'] == id & (df['Chk']==1)]:
        recent_date = df['DateCol'].max()
        df1 = df[df['DateCol'] == recent_date]
        return(df1['id', "Chk", 'DateCol', "Col2"])
    else:
        return(None)

If the column 'id' contains the input id, the function should find the most recent date and return the rows with that date, restricted to four columns of the DataFrame. Otherwise it should return None.

When I call ChkFindDate(DataFrame, 12345), it throws this error:

/apps/dsa/venv/lib/python3.6/site-packages/pandas/core/generic.py in __nonzero__(self)
   1571         raise ValueError("The truth value of a {0} is ambiguous. "
   1572                          "Use a.empty, a.bool(), a.item(), a.any() or a.all()."
-> 1573                          .format(self.__class__.__name__))
   1574 
   1575     __bool__ = __nonzero__

ValueError: The truth value of a DataFrame is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
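
The traceback points at the `if` line: a DataFrame (or Series) with more than one element has no single truth value, so pandas refuses to convert it to a bool. In addition, `&` binds more tightly than `==`, so `df['id'] == id & (df['Chk']==1)` is parsed as `df['id'] == (id & (df['Chk']==1))`. Here is a minimal sketch of both points; the sample data is made up, and only the column names come from the question.

```python
import pandas as pd

# Hypothetical sample data; column names follow the question.
df = pd.DataFrame({
    "id": [12345, 12345, 99999],
    "Chk": [1, 0, 1],
    "DateCol": pd.to_datetime(["2020-01-01", "2020-03-01", "2020-02-01"]),
    "Col2": ["a", "b", "c"],
})

# Parenthesise each comparison so `&` combines two boolean Series.
mask = (df["id"] == 12345) & (df["Chk"] == 1)

# Reduce the boolean Series to one bool before using it in `if`.
has_match = bool(mask.any())

# Using the multi-element Series directly in `if` raises ValueError.
try:
    if mask:
        pass
    raised = False
except ValueError:
    raised = True
```

Separately, selecting several columns needs a list, as in `df1[['id', 'Chk', 'DateCol', 'Col2']]`. Written as `df1['id', "Chk", 'DateCol', "Col2"]`, the four names are looked up as a single tuple key.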

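One way to fix this is to reduce the condition to a single boolean with .any() and to select the rows and columns in one step with .loc: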

def ChkFindDate(df, id):

    if (df['id'].eq(id) & df['Chk'].eq(1)).any():
        recent_date = df['DateCol'].max()
        return df.loc[df['DateCol'] == recent_date, ['id', "Chk", 'DateCol', "Col2"]]
    return
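
Note that the function above takes the maximum of DateCol over the whole DataFrame, so the returned rows can belong to other ids. If the intent is the latest date among the rows that match the input id, a variant along these lines might be closer. This is only a sketch: the function name `chk_find_date_for_id` and the sample data are hypothetical, and it assumes DateCol holds comparable values such as datetimes.

```python
import pandas as pd

def chk_find_date_for_id(df, id_value):
    """Return the matching rows that carry the latest DateCol, or None."""
    cols = ["id", "Chk", "DateCol", "Col2"]
    # Keep only the rows for this id where Chk is 1.
    matches = df.loc[df["id"].eq(id_value) & df["Chk"].eq(1), cols]
    if matches.empty:
        return None
    # Take the most recent date within the matching rows only.
    return matches[matches["DateCol"] == matches["DateCol"].max()]

# Hypothetical sample data for illustration.
sample = pd.DataFrame({
    "id": [12345, 12345, 12345, 99999],
    "Chk": [1, 1, 0, 1],
    "DateCol": pd.to_datetime(
        ["2020-01-01", "2020-03-01", "2020-04-01", "2020-05-01"]
    ),
    "Col2": ["a", "b", "c", "d"],
})

result = chk_find_date_for_id(sample, 12345)   # rows for id 12345 with Chk == 1
missing = chk_find_date_for_id(sample, 11111)  # id not present
```

Filtering first and checking `.empty` avoids computing the mask twice, and the id that is not present still yields None as in the original function.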

