Collapse dataframe based on minimum integer value, but also filling null string values of other columns

Question

ID	Check 1	Check 2	Value
1		Y	1
1	Y	Y	2
2		Y	5
2			8

I have a dataframe similar to the above table. I need to find a way to collapse the table so that I have only one row for each ID, and the Value column contains the minimum value for that ID.

I'm able to do this using:

df = df.fillna('')
df.groupby(['ID'], as_index=False).min()

However I also need to preserve any of the 'Y' values in my two Check columns, so that a Y will always take priority and fill in any null values on a row with the same ID. So based on the table above, the result I'm looking for would be like this:

ID	Check 1	Check 2	Value
1	Y	Y	1
2		Y	5

So far, I end up with blanks in my Check 1 column instead of a Y value. Any ideas on how to do this?

Answer 1

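Group the dataframe by ID, then pass min as the aggregate for the Value column and first as the aggregate for the rest of the columns. first returns the first non-null entry in each group, so run it on the original dataframe, before calling fillna(''); otherwise the empty strings count as valid values and can win over a Y.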

>>> df.groupby('ID').agg({'Check 1': 'first', 'Check 2':'first', 'Value': 'min'})
 
   Check 1 Check 2  Value
ID                       
1        Y       Y      1
2     None       Y      5
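For reference, here is a self-contained sketch of the same idea. It assumes the blank Check cells are stored as None/NaN (not as empty strings); the variable name result, the as_index=False argument, and the trailing fillna('') are additions for illustration, not part of the answer above.

```python
import pandas as pd

# Sample data from the question; None marks the blank Check cells.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": [None, "Y", None, None],
    "Check 2": ["Y", "Y", "Y", None],
    "Value": [1, 2, 5, 8],
})

# 'first' takes the first non-null entry per group, 'min' the smallest Value.
# fillna('') runs last, so blanks cannot shadow a 'Y' during aggregation.
result = (
    df.groupby("ID", as_index=False)
      .agg({"Check 1": "first", "Check 2": "first", "Value": "min"})
      .fillna("")
)
print(result)
```

Keeping the nulls until after the aggregation is the key design choice here: once they are replaced by '', a plain min would prefer '' over 'Y' because the empty string sorts first.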

Answer 2

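If you call sort_values on the "Value" column before the groupby, you can simply keep the first non-null value of every column. After sorting, the first Value in each group is its minimum, and first skips nulls in the Check columns: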

>>> df.sort_values("Value").groupby("ID").first().fillna('')
   Check 1 Check 2  Value
ID                       
1        Y       Y      1
2                Y      5
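A runnable sketch of this approach follows. It assumes the blanks are None/NaN, and the rows are deliberately listed out of order to show that the sort is what makes first pick the minimum Value; the name collapsed and the as_index=False argument are illustrative additions.

```python
import pandas as pd

# Same rows as in the question, but shuffled within each ID.
df = pd.DataFrame({
    "ID": [1, 1, 2, 2],
    "Check 1": ["Y", None, None, None],
    "Check 2": ["Y", "Y", None, "Y"],
    "Value": [2, 1, 8, 5],
})

# Sort so the smallest Value comes first in each group, then take the
# first non-null entry of every column and blank out what is still null.
collapsed = (
    df.sort_values("Value")
      .groupby("ID", as_index=False)
      .first()
      .fillna("")
)
print(collapsed)
```

Note that first works column by column, so the Y in Check 1 for ID 1 comes from the row with Value 2 even though the reported Value is 1.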

2 answers

solution1
0 2021-11-29 18:47:15

solution2
0 2021-11-29 19:01:37