I have a dataframe:
In = pd.DataFrame([["W",13,23,45,65], ["X",23,45,12,78], ["Y",12,34,56,89]],columns=["A","B","C","D","E"])
Row W has a minimum value of 13, row X has a minimum of 12, and row Y has a minimum of 12. Keep only those columns that contain the minimum value of at least one row.
Expected output:
Out = pd.DataFrame([["W",13,45], ["X",23,12], ["Y",12,56]],columns=["A","B","D"])
How can I do this?
This is straightforward: set 'A' as the row index, find the column holding the smallest element in each row, drop the duplicate column labels, then select the surviving columns plus 'A' from the original dataframe.
columns_to_keep = In.set_index('A').idxmin(axis=1).unique().tolist()
Out = In[['A'] + columns_to_keep]
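To make the intermediate steps visible, here is a minimal sketch of the same idea split into named stages. The variable name `min_cols` is only illustrative and not part of the original answer.

```python
import pandas as pd

In = pd.DataFrame(
    [["W", 13, 23, 45, 65], ["X", 23, 45, 12, 78], ["Y", 12, 34, 56, 89]],
    columns=["A", "B", "C", "D", "E"],
)

# Label of the column holding the smallest value in each row, indexed by 'A'.
min_cols = In.set_index("A").idxmin(axis=1)

# Drop repeated labels; unique() keeps them in order of first appearance.
columns_to_keep = min_cols.unique().tolist()

# Select 'A' plus the surviving columns from the original dataframe.
Out = In[["A"] + columns_to_keep]
print(Out)
```

Two things to keep in mind: `idxmin` reports only the first column when a row's minimum occurs in several columns, and the kept columns follow the order in which they first appear in `min_cols`, which may differ from the original column order on other data.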
Set A as the index before filtering. Then find the overall minimum (the smallest of the per-column minima), compare it against the dataframe, and keep every column that contains it.
In.set_index('A', inplace=True)
In.loc[:,(In==In.min().min()).any()].reset_index()
Or, if you prefer a single expression (run it on the original In, before the in-place set_index above):
In.set_index('A').loc[:,(In==(In.select_dtypes(exclude='object').min().min())).any()].reset_index()
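The global-minimum approach can also be sketched without mutating `In`, which makes each step easier to inspect. The names `indexed`, `global_min` and `has_global_min` are illustrative assumptions, not part of the original answer.

```python
import pandas as pd

In = pd.DataFrame(
    [["W", 13, 23, 45, 65], ["X", 23, 45, 12, 78], ["Y", 12, 34, 56, 89]],
    columns=["A", "B", "C", "D", "E"],
)

# set_index returns a new dataframe, so In itself is left unchanged.
indexed = In.set_index("A")

# Smallest value in the whole table: min per column, then min of those.
global_min = indexed.min().min()

# One boolean per column: does the column contain the global minimum?
has_global_min = indexed.eq(global_min).any()

Out = indexed.loc[:, has_global_min].reset_index()
print(Out)
```

Note that this keeps the columns containing the overall minimum (12). It matches the expected output here because row W's minimum, 13, also sits in column B; on other data the per-row and global criteria may select different columns.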
You could check whether each column is either non-numeric or contains the global minimum. This approach is efficient because it first computes the minimum per column, then compares each of those minima to the global minimum.
from pandas.api.types import is_numeric_dtype
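# True for numeric columns, False for non-numeric ones
mask = In.apply(is_numeric_dtype)
# keep columns whose min equals the global min, plus all non-numeric columns
m = (m:=In.min()).eq(m[mask].min()) | ~mask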
Out = In.loc[:,m]
Output:
   A   B   D
0  W  13  45
1  X  23  12
2  Y  12  56
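For readers who find the walrus expression dense, the same mask can be built in separate steps. This is a sketch under the assumption that only numeric columns should be reduced; the names `numeric`, `col_min` and `keep` are illustrative.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

In = pd.DataFrame(
    [["W", 13, 23, 45, 65], ["X", 23, 45, 12, 78], ["Y", 12, 34, 56, 89]],
    columns=["A", "B", "C", "D", "E"],
)

# True for numeric columns, False for the string column 'A'.
numeric = In.apply(is_numeric_dtype)

# Per-column minima, computed on the numeric columns only.
col_min = In.loc[:, numeric].min()

# Flag columns whose minimum equals the global minimum, then add the
# non-numeric columns back as True so they are always kept.
keep = col_min.eq(col_min.min()).reindex(In.columns, fill_value=True).astype(bool)

Out = In.loc[:, keep]
print(Out)
```

Restricting the reduction to numeric columns avoids taking the minimum of the string column at all, at the cost of one extra line.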