Pandas - How to drop rows based on a unique column value where another column value is a minimum and handling nulls?

Question

I have a pandas dataframe with something like the following:

index	order_id	cost
123a	123	5
123b	123	None
123c	123	3
124a	124	None
124b	124	None

For each unique value of order_id, I'd like to drop any row that isn't the lowest cost. For any order_id that only contains nulls for the cost, any row for an order_id can be retained.

I've been struggling with this for a while now.

ol3 = ol3.loc[ol3.groupby('order_id').cost.idxmin()]

This code doesn't play nicely with the order_ids that have only nulls. So I tried to figure out how to drop the nulls I don't want with

ol4 = ol3.loc[ol3['cost'].isna()].drop_duplicates(subset=['order_id', 'cost'], keep='first')

This gives me the list of null order_ids I want to retain, but I'm not sure where to go from here. I'm pretty sure I'm looking at this the wrong way. Any help would be appreciated!

Answer 1

You can use transform to get the minimum cost per order_id, aligned to the original rows. We additionally need an isna check for the special order_ids that have only NaNs:

order_mins = df.groupby('order_id').cost.transform('min')
df[(df.cost == order_mins) | (order_mins.isna())]
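
As a minimal, self-contained sketch of this approach, the sample frame from the question can be rebuilt by hand. The construction of df below is an assumption for illustration, since the asker's real data is not shown. Note that this keeps every row tied for the minimum, and every row of an all-NaN group.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's sample data.
df = pd.DataFrame({
    "index": ["123a", "123b", "123c", "124a", "124b"],
    "order_id": [123, 123, 123, 124, 124],
    "cost": [5, np.nan, 3, np.nan, np.nan],
})

# Per-row copy of each group's minimum cost; NaN when the whole group is NaN.
order_mins = df.groupby("order_id")["cost"].transform("min")

# Keep rows at the group minimum, plus all rows of all-NaN groups.
result = df[(df["cost"] == order_mins) | order_mins.isna()]
print(result)
```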

Answer 2

cond_1 = df.cost.eq(df.cost.groupby(df.order_id).transform("min")) 
cond_2 = df.cost.isna().groupby(df.order_id).transform("all")
new    = df[cond_1 | cond_2]

condition 1: check whether a cost is equal to its group's minimum
condition 2: check whether every cost in the group is missing
if either of these is true, keep the corresponding row

In [246]: cond_1
Out[246]:
0    False
1    False
2     True            <--- cost equals to minimum of group
3    False
4    False
Name: cost, dtype: bool

In [247]: cond_2
Out[247]:
0    False
1    False
2    False
3     True           <--- the ID of these has all NaNs  
4     True           <--- in the cost part (id 124)
Name: cost, dtype: bool

In [248]: new
Out[248]:
  index  order_id  cost
2  123c       123   3.0
3  124a       124   NaN
4  124b       124   NaN

I ran df.cost = pd.to_numeric(df.cost, errors="coerce") before the above.
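
The coercion step matters if cost was read in as text. That is an assumption here, since the question does not show the column's dtype. A small sketch of what pd.to_numeric with errors="coerce" does to such a column, using a made-up raw Series:

```python
import pandas as pd

# Hypothetical raw column where missing values arrived as the string "None".
raw_cost = pd.Series(["5", "None", "3", "None", "None"])

# Values that cannot be parsed as numbers become NaN instead of raising.
cost = pd.to_numeric(raw_cost, errors="coerce")
print(cost)
```

After this, the column is numeric, so the comparison with the group minimum and the isna check behave as in the conditions above.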

Answer 3

You can (temporarily) fill the NA/None with np.inf before getting the idxmin:

ol3.loc[ol3['cost'].fillna(np.inf).groupby(ol3['order_id']).idxmin()]

This yields exactly one row per order_id.

output:

  index  order_id  cost
2  123c       123   3.0
3  124a       124   NaN
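
For completeness, here is a runnable sketch of the fill-then-idxmin approach on a hand-built copy of the question's data. The frame construction is an assumption for illustration. The filled values are only used to pick row labels, so the NaN costs remain NaN in the result.

```python
import numpy as np
import pandas as pd

# Hypothetical reconstruction of the question's sample data.
ol3 = pd.DataFrame({
    "index": ["123a", "123b", "123c", "124a", "124b"],
    "order_id": [123, 123, 123, 124, 124],
    "cost": [5, np.nan, 3, np.nan, np.nan],
})

# Treat missing costs as +inf so any real cost wins within its group.
# In an all-NaN group every value is inf, so idxmin picks the first row.
keep = ol3["cost"].fillna(np.inf).groupby(ol3["order_id"]).idxmin()

# Select the chosen labels from the original, unfilled frame.
result = ol3.loc[keep]
print(result)
```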

solution1
1 2022-09-21 15:40:45

solution2
0 2022-09-21 15:43:35

solution3
0 2022-09-21 16:03:02
