pandas.core.groupby.DataFrameGroupBy.idxmin() is very slow, how can I make my code faster?

Question

I am trying to do the same thing as a SQL GROUP BY and take the minimum value:

select id, min(value), other_fields...
from table
group by id

I tried:

dfg = df.groupby('id', sort=False)
idx = dfg['value'].idxmin()
df = df.loc[idx, list(df.columns.values)]

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.core.groupby.DataFrameGroupBy.idxmin.html

But line 2, the idxmin() call, takes more than half an hour on ~4M columns in df, whereas the group by itself takes less than 1 second. What am I missing? Is it supposed to take that long? How can I make this process faster? Would it be faster in pure SQL?

Answer 1

df1 = df.sort_values(by=['value']).drop_duplicates('id', keep='first')
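
The one-liner above sorts all rows by value and then keeps the first row seen for each id, which is the row holding that id's minimum, so no per-group idxmin() call is needed. Below is a minimal sketch comparing both approaches on a small made-up frame. The sample data and the names via_idxmin, via_sort and same are assumptions for illustration only. kind="stable" is added here so that ties on value resolve to the earliest row, as idxmin() does; the answer's original line uses the default sort.

```python
import pandas as pd

# Hypothetical sample data; the real frame in the question is much larger.
df = pd.DataFrame({
    "id": [1, 1, 2, 2, 3],
    "value": [5.0, 3.0, 7.0, 9.0, 1.0],
    "other": ["a", "b", "c", "d", "e"],
})

# Approach from the question: find the index label of each group's minimum.
idx = df.groupby("id", sort=False)["value"].idxmin()
via_idxmin = df.loc[idx]

# Approach from the answer: sort by value, then keep the first row per id.
via_sort = (
    df.sort_values(by=["value"], kind="stable")
      .drop_duplicates("id", keep="first")
)

# Both select the same rows; only the row order differs.
same = via_idxmin.sort_index().equals(via_sort.sort_index())
print(same)
```

The result of the sort-based version comes back ordered by value rather than by id, so append .sort_index() or .sort_values("id") if the original order matters.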