
Collapse dataframe based on minimum integer value, but also filling null string values of other columns

ID  Check 1  Check 2  Value
1   Y                 1
1   Y        Y        2
2            Y        5
2                     8

I have a dataframe similar to the above table. I need to find a way to collapse the table so that I have only one row for each ID, and the Value column contains the minimum value for that ID.

I'm able to do this using:

df = df.fillna('')
df.groupby(['ID'],as_index=False).min()

However, I also need to preserve any 'Y' values in my two Check columns, so that a Y always takes priority over a null value in another row with the same ID. Based on the table above, the result I'm looking for would be:

ID  Check 1  Check 2  Value
1   Y        Y        1
2            Y        5

What I'm seeing happen so far is that I'll be left with blanks in my Check 1 column, instead of a Y value. Any ideas on how to do this?
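
The blanks are consistent with how min compares strings: the empty string sorts before 'Y', so once the nulls are replaced with '', any group with a blank in a column collapses to a blank in that column. A minimal sketch that reproduces this, assuming the sample table above (which column holds the Y in the first row is a guess):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question; blank cells are NaN.
# Which Check column holds the "Y" in the first row is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": ["Y", "Y", np.nan, np.nan],
    "Check 2": [np.nan, "Y", "Y", np.nan],
    "Value": [1, 2, 5, 8],
})

# The approach from the question: blank out the nulls, then take the minimum.
blanked = df.fillna("")
collapsed = blanked.groupby(["ID"], as_index=False).min()

# "" compares lower than "Y", so min() prefers the blank over the "Y".
print("" < "Y")
print(collapsed)
```

With this sample data the Value column comes out right, but a Check column keeps its Y only in groups where every row has a Y.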

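Group the dataframe by ID, then aggregate with min for the Value column and first for the other columns. first returns the first non-null value in each group, so this only works while the blanks are still null (skip the fillna('') step before grouping).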

>>> df.groupby('ID').agg({'Check 1': 'first', 'Check 2':'first', 'Value': 'min'})
 
   Check 1 Check 2  Value
ID                       
1        Y       Y      1
2     None       Y      5
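
For anyone who wants to run this end to end, here is a self-contained sketch. The DataFrame is rebuilt from the table in the question, so the exact cell placement is an assumption, and the final fillna is only there to turn the remaining nulls back into blanks.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the table in the question; blank cells are NaN.
# Which Check column holds the "Y" in the first row is an assumption.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": ["Y", "Y", np.nan, np.nan],
    "Check 2": [np.nan, "Y", "Y", np.nan],
    "Value": [1, 2, 5, 8],
})

# "first" picks the first non-null value per group, so a "Y" beats a null;
# "min" collapses Value to the smallest value for each ID.
result = df.groupby("ID", as_index=False).agg(
    {"Check 1": "first", "Check 2": "first", "Value": "min"}
)

# Optional: display blanks instead of missing values in the Check columns.
check_cols = ["Check 1", "Check 2"]
result[check_cols] = result[check_cols].fillna("")
print(result)
```

Passing as_index=False keeps ID as a regular column, which matches the layout asked for in the question.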

If you sort by the "Value" column with sort_values before the groupby, first() returns the first non-null value of every column in each group, so Value ends up as the minimum and a Y takes priority over a null:

>>> df.sort_values("Value").groupby("ID").first().fillna('')
   Check 1 Check 2  Value
ID                       
1        Y       Y      1
2                Y      5
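
A runnable sketch of the same idea, assuming sample data like the table in the question. The rows are deliberately shuffled here to show that the sort is what makes first() return the minimum Value.

```python
import numpy as np
import pandas as pd

# Same rows as the question's table, in shuffled order (cell placement assumed).
df = pd.DataFrame({
    "ID": [2, 1, 2, 1],
    "Check 1": [np.nan, "Y", np.nan, "Y"],
    "Check 2": [np.nan, "Y", "Y", np.nan],
    "Value": [8, 2, 5, 1],
})

collapsed = (
    df.sort_values("Value")   # smallest Value comes first within each ID
      .groupby("ID")
      .first()                # first non-null value per column, per group
      .fillna("")             # show remaining nulls as blanks
      .reset_index()          # turn ID back into a regular column
)
print(collapsed)
```

Because first() skips nulls column by column, this relies on Value itself having no missing entries. If it does, the explicit min aggregate is the safer choice.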


 