I have a pandas dataframe with something like the following:
index | order_id | cost |
---|---|---|
123a | 123 | 5 |
123b | 123 | None |
123c | 123 | 3 |
124a | 124 | None |
124b | 124 | None |
For each unique value of order_id, I'd like to drop any row that isn't the lowest cost. For any order_id that only contains nulls for the cost, any row for an order_id can be retained.
I've been struggling with this for a while now.
ol3 = ol3.loc[ol3.groupby('Order_ID').cost.idxmin()]
This code doesn't play nicely with the order_ids that have only nulls. So I tried to figure out how to drop the nulls I don't want with
ol4 = ol3.loc[ol3['cost'].isna()].drop_duplicates(subset=['Order_ID', 'cost'], keep='first')
This gives me the list of all-null order_ids I want to retain, but I'm not sure where to go from here. I'm pretty sure I'm looking at this the wrong way. Any help would be appreciated!
You can use transform to find the rows with the min cost per order_id. We additionally need an isna check for the special order_ids that have only NaNs:
order_mins = df.groupby('order_id').cost.transform('min')
df[(df.cost == order_mins) | (order_mins.isna())]
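For anyone who wants to try this end to end, here is a minimal, self-contained sketch. The frame is rebuilt from the table in the question, and the None values are assumed to be stored as NaN in a float column. The names df and result are only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (assumption: None is stored as NaN).
df = pd.DataFrame({
    "index": ["123a", "123b", "123c", "124a", "124b"],
    "order_id": [123, 123, 123, 124, 124],
    "cost": [5, np.nan, 3, np.nan, np.nan],
})

# Per-row minimum cost of the row's own order_id; NaN when the whole group is NaN.
order_mins = df.groupby("order_id")["cost"].transform("min")

# Keep rows that hit the group minimum, plus every row of an all-NaN group.
result = df[(df["cost"] == order_mins) | order_mins.isna()]
print(result)
```

Note that this keeps every row of an all-NaN group and also keeps ties, so an order_id can still end up with more than one row.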
cond_1 = df.cost.eq(df.cost.groupby(df.order_id).transform("min"))
cond_2 = df.cost.isna().groupby(df.order_id).transform("all")
new = df[cond_1 | cond_2]
In [246]: cond_1
Out[246]:
0 False
1 False
2 True <--- cost equals to minimum of group
3 False
4 False
Name: cost, dtype: bool
In [247]: cond_2
Out[247]:
0 False
1 False
2 False
3 True <--- the ID of these has all NaNs
4 True <--- in the cost part (id 124)
Name: cost, dtype: bool
In [248]: new
Out[248]:
index order_id cost
2 123c 123 3.0
3 124a 124 NaN
4 124b 124 NaN
I ran df.cost = pd.to_numeric(df.cost, errors="coerce") prior to the above, so that the cost column is numeric and the missing values are NaN.
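A small sketch of why the coercion step matters, under the assumption that the cost column arrived as strings (for example "5" and "None"); the question does not say how the values are actually stored, so treat the input data here as hypothetical.

```python
import pandas as pd

# Hypothetical input: cost read in as strings, with the text "None" for missing values.
df = pd.DataFrame({
    "index": ["123a", "123b", "123c", "124a", "124b"],
    "order_id": [123, 123, 123, 124, 124],
    "cost": ["5", "None", "3", "None", "None"],
})

# Anything that cannot be parsed as a number becomes NaN.
df["cost"] = pd.to_numeric(df["cost"], errors="coerce")

# cond_1: the row's cost equals the minimum of its order_id group.
cond_1 = df["cost"].eq(df["cost"].groupby(df["order_id"]).transform("min"))
# cond_2: every cost in the row's order_id group is NaN.
cond_2 = df["cost"].isna().groupby(df["order_id"]).transform("all")

new = df[cond_1 | cond_2]
print(new)
```

Without the coercion, the comparison against the group minimum would be running on strings rather than numbers.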
You can (temporarily) fill the NA/None with np.inf before getting the idxmin:
ol3.loc[ol3['cost'].fillna(np.inf).groupby(ol3['order_id']).idxmin()]
You will have exactly one row per order_id. Output:
index order_id cost
2 123c 123 3.0
3 124a 124 NaN
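The same idea as a runnable sketch, again assuming the sample frame from the question with the missing costs stored as NaN. The frame is called ol3 to match the question; picked and result are illustrative names.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question's table (assumption: None is stored as NaN).
ol3 = pd.DataFrame({
    "index": ["123a", "123b", "123c", "124a", "124b"],
    "order_id": [123, 123, 123, 124, 124],
    "cost": [5, np.nan, 3, np.nan, np.nan],
})

# Replace NaN with +inf only for the comparison, so every group has a defined minimum.
# In an all-NaN group all values tie at inf and idxmin returns the first row's label.
picked = ol3["cost"].fillna(np.inf).groupby(ol3["order_id"]).idxmin()

# Select from the original frame, so the retained row still shows NaN rather than inf.
result = ol3.loc[picked]
print(result)
```

Because the fill is applied to a temporary Series, ol3 itself is left unchanged.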