Pandas - How to drop rows per unique column value, keeping the row where another column is at its minimum, while handling nulls?

Question

I have a pandas dataframe with the following contents:

index	order_id	cost
123a	123	5
123b	123	None
123c	123	3
124a	124	None
124b	124	None

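For each unique value of order_id, I want to drop every row that does not have the lowest cost. For any order_id whose costs are all null, any one of its rows may be kept.

I have been struggling with this for a while.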

ol3 = ol3.loc[ol3.groupby('Order_ID').cost.idxmin()]

This code does not work for order_ids that have only null costs. So I tried to figure out how to drop the nulls I don't want:

ol4 = ol3.loc[ol3['cost'].isna()].drop_duplicates(subset=['Order_ID', 'cost'], keep='first')

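This gives me the list of null order_ids that I want to keep. I'm not sure where to go from here. I'm pretty sure I'm looking at this the wrong way. Any help would be appreciated!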

Answer 1

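You can use transform to broadcast each order_id's minimum cost onto every row of that group. We also need an isna check for the special order_ids that contain only NaN: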

order_mins = df.groupby('order_id').cost.transform('min')
df[(df.cost == order_mins) | (order_mins.isna())]
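For reference, here is a self-contained sketch of the same approach. The sample frame is rebuilt from the table in the question, with np.nan standing in for the missing costs (an assumption about how the data was loaded); the name `result` is only for illustration.

```python
import numpy as np
import pandas as pd

# Sample data mirroring the question; np.nan stands in for the missing costs.
df = pd.DataFrame(
    {
        "index": ["123a", "123b", "123c", "124a", "124b"],
        "order_id": [123, 123, 123, 124, 124],
        "cost": [5, np.nan, 3, np.nan, np.nan],
    }
)

# Minimum cost of each row's order_id group, aligned to the original rows.
# A group with no valid cost gets NaN here.
order_mins = df.groupby("order_id")["cost"].transform("min")

# Keep rows at their group minimum, plus every row of an all-NaN group.
result = df[(df["cost"] == order_mins) | order_mins.isna()]
print(result)
```

Note that this keeps every row of an all-NaN group, and every row tied for the minimum. If exactly one row per order_id is wanted, something like `result.drop_duplicates("order_id")` could be applied afterwards.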

Answer 2

cond_1 = df.cost.eq(df.cost.groupby(df.order_id).transform("min")) 
cond_2 = df.cost.isna().groupby(df.order_id).transform("all")
new    = df[cond_1 | cond_2]

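Condition 1: checks whether the cost equals the minimum of its group
Condition 2: checks whether a group consists entirely of missing values
If either one is true, the corresponding row is kept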

In [246]: cond_1
Out[246]:
0    False
1    False
2     True            <--- cost equals to minimum of group
3    False
4    False
Name: cost, dtype: bool

In [247]: cond_2
Out[247]:
0    False
1    False
2    False
3     True           <--- the ID of these has all NaNs  
4     True           <--- in the cost part (id 124)
Name: cost, dtype: bool

In [248]: new
Out[248]:
  index  order_id  cost
2  123c       123   3.0
3  124a       124   NaN
4  124b       124   NaN

Before the above, I ran df.cost = pd.to_numeric(df.cost, errors="coerce").
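A minimal sketch of that conversion step, assuming the cost column arrived as text with a mix of numeric strings, the literal string "None", and real None values (the question does not say how the data was loaded, so the frame `raw` is hypothetical):

```python
import pandas as pd

# Hypothetical raw input: costs stored as text, with missing values
# appearing either as the string "None" or as a real None.
raw = pd.DataFrame(
    {
        "order_id": [123, 123, 123, 124, 124],
        "cost": ["5", "None", "3", None, None],
    }
)

# errors="coerce" turns anything that cannot be parsed as a number into NaN,
# so groupby/min and isna() then behave as expected on the column.
raw["cost"] = pd.to_numeric(raw["cost"], errors="coerce")
print(raw)
```

Without this step, a comparison such as `df.cost == order_mins` would operate on strings rather than numbers.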

Answer 3

Fill NA/None with np.inf before taking idxmin:

ol3.loc[ol3['cost'].fillna(np.inf).groupby(ol3['order_id']).idxmin()]

There will be only one row per order_id.

Output:

  index  order_id  cost
2  123c       123   3.0
3  124a       124   NaN

Answer 1
1 2022-09-21 15:40:45

Answer 2
0 2022-09-21 15:43:35

Answer 3
0 2022-09-21 16:03:02
