
Keep only those columns in a dataframe based on the min value of each row

I have a dataframe:

import pandas as pd

In = pd.DataFrame([["W",13,23,45,65], ["X",23,45,12,78], ["Y",12,34,56,89]],columns=["A","B","C","D","E"])

Row W has its minimum value (13) in column B, row X has its minimum (12) in column D, and row Y has its minimum (12) in column B. I want to keep only the columns that hold the minimum value of at least one row, along with column A.

Expected output:

Out = pd.DataFrame([["W",13,45], ["X",23,12], ["Y",12,56]],columns=["A","B","D"])

How can I do this?

Very straightforward: declare 'A' the row index; find the column of the smallest element in each row; eliminate the duplicate columns; select the surviving columns and 'A' from the original dataframe.

columns_to_keep = In.set_index('A').idxmin(axis=1).unique().tolist()
Out = In[['A'] + columns_to_keep]
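To make each step of this answer visible, here is a minimal sketch that prints the intermediate results. The names `values` and `row_min_col` are illustrative and not part of the original answer.

```python
import pandas as pd

In = pd.DataFrame([["W", 13, 23, 45, 65], ["X", 23, 45, 12, 78], ["Y", 12, 34, 56, 89]],
                  columns=["A", "B", "C", "D", "E"])

# Step 1: use 'A' as the row labels so only the numeric columns remain
values = In.set_index('A')

# Step 2: label of the column holding each row's minimum
row_min_col = values.idxmin(axis=1)
print(row_min_col.to_dict())   # {'W': 'B', 'X': 'D', 'Y': 'B'}

# Step 3: drop duplicate labels, keeping first-seen order
columns_to_keep = row_min_col.unique().tolist()
print(columns_to_keep)         # ['B', 'D']

# Step 4: select 'A' plus the surviving columns from the original frame
Out = In[['A'] + columns_to_keep]
print(Out)
```

One design consequence: `idxmin` reports only the first matching column when a row's minimum is tied, so a tied column further right would not be kept.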

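Set A as the index first, then find the overall minimum (the smallest of the per-column minimums), compare the dataframe against it, and keep every column that contains that value. Note that this keeps the columns containing the global minimum (12 here), which happens to give the expected output for this data.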

df = In.set_index('A')
Out = df.loc[:, (df == df.min().min()).any()].reset_index()

Or, as a single line:

In.set_index('A').loc[:, (In == In.select_dtypes('number').min().min()).any()].reset_index()
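The two versions above keep columns that contain the overall minimum, which is not always the same as keeping the column of each row's minimum. A small sketch contrasting the two rules on a hypothetical frame `df2` (invented for illustration, not from the question) where they disagree:

```python
import pandas as pd

# Hypothetical data: row X's minimum (5) is in C,
# but the overall minimum (1) appears only in B
df2 = pd.DataFrame([["W", 1, 9, 9], ["X", 8, 5, 7]], columns=["A", "B", "C", "D"])
values = df2.set_index('A')

# Rule used in this answer: columns containing the global minimum
global_rule = values.loc[:, (values == values.min().min()).any()].columns.tolist()

# Rule stated in the question: the column of each row's minimum
per_row_rule = values.idxmin(axis=1).unique().tolist()

print(global_rule)   # ['B']
print(per_row_rule)  # ['B', 'C']
```

On the question's own data both rules return B and D, which is why either approach produces the expected output there.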

You could keep each column that is either non-numeric or contains the overall minimum.

This is efficient because, after computing the minimum of each column, it compares only those per-column minimums with the global minimum instead of comparing every cell.

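from pandas.api.types import is_numeric_dtype

# True for numeric columns, False for non-numeric ones such as 'A'
mask = In.apply(is_numeric_dtype)
# True where the column minimum equals the global minimum of the numeric columns;
# non-numeric columns are always kept
col_min = In.min()
m = col_min.eq(col_min[mask].min()) | ~mask

Out = In.loc[:, m]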

Output:

   A   B   D
0  W  13  45
1  X  23  12
2  Y  12  56
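If you want the question's per-row rule and also want to keep every column tied for a row's minimum (which `idxmin` does not do), a minimal sketch compares each cell with its own row's minimum. This is an alternative added for illustration, not part of the answers above; `num`, `row_min`, `is_row_min`, and `keep` are illustrative names.

```python
import pandas as pd

In = pd.DataFrame([["W", 13, 23, 45, 65], ["X", 23, 45, 12, 78], ["Y", 12, 34, 56, 89]],
                  columns=["A", "B", "C", "D", "E"])

num = In.select_dtypes('number')        # numeric columns B..E
row_min = num.min(axis=1)               # minimum of each row
is_row_min = num.eq(row_min, axis=0)    # True where a cell equals its row's minimum
keep = is_row_min.any()                 # columns holding at least one row minimum

Out = In[['A'] + keep[keep].index.tolist()]
print(list(Out.columns))                # ['A', 'B', 'D']
```

Because the comparison is cell by cell, a row whose minimum appears in two columns would keep both.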

