Get column names whose number of distinct values is greater than a specified value (Python, pandas)

Dataframe X:

A   B    C    D
V1  V2   V3   V4
V1  V3   V4   V5
V1  V4   V5   V5
V1  V5   V9   V5
V1  V2   V3   V4
V1  V10  V11  V12
V1  V10  V6   V8
V1  V12  V7   V8

Here column A has 1 unique value, column B has 6 unique values, column C has 7 unique values, and column D has 4 unique values.

I need a list of all columns that have more than a given number of unique values, say 4.

X.columns[(X.nunique() > 4).any()]

I expect to get only columns B and C here, but I get all columns. How can I achieve the desired output?

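You are really close. Just remove .any(), so that the per-column boolean mask itself is used for indexing (.any() collapses the mask to a single True/False):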

c = X.columns[(X.nunique() > 4)]
print (c)
Index(['B', 'C'], dtype='object')
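
For anyone who wants to reproduce this, here is a minimal self-contained sketch. It rebuilds X from the table in the question; the variable names counts, mask, collapsed and cols are only illustrative. It also converts the result to a plain Python list with .tolist(), since the question asks for a list:

```python
import pandas as pd

# Rebuild the example DataFrame X from the question.
X = pd.DataFrame({
    "A": ["V1"] * 8,
    "B": ["V2", "V3", "V4", "V5", "V2", "V10", "V10", "V12"],
    "C": ["V3", "V4", "V5", "V9", "V3", "V11", "V6", "V7"],
    "D": ["V4", "V5", "V5", "V5", "V4", "V12", "V8", "V8"],
})

counts = X.nunique()        # Series: number of distinct values per column
mask = counts > 4           # boolean Series, one entry per column
collapsed = mask.any()      # a single True/False, no longer a per-column mask

cols = X.columns[mask].tolist()  # plain Python list of matching column names
print(counts.to_dict())  # {'A': 1, 'B': 6, 'C': 7, 'D': 4}
print(cols)              # ['B', 'C']
```

Note that nunique() ignores missing values by default (dropna=True), so NaN does not count as a distinct value unless you pass dropna=False.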

If you need to select the columns themselves, use DataFrame.loc:

df = X.loc[:, (X.nunique() > 4)]
print (df)
     B    C
0   V2   V3
1   V3   V4
2   V4   V5
3   V5   V9
4   V2   V3
5  V10  V11
6  V10   V6
7  V12   V7
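
If the threshold changes often, the same selection could be wrapped in a small function. This is only a sketch: select_by_nunique is a hypothetical helper name, not part of pandas, and the sample data is just the first three rows of X from the question.

```python
import pandas as pd

def select_by_nunique(df, threshold):
    """Hypothetical helper: keep columns with more than `threshold` distinct values."""
    return df.loc[:, df.nunique() > threshold]

# First three rows of X from the question, used as sample data.
head = pd.DataFrame({
    "A": ["V1", "V1", "V1"],
    "B": ["V2", "V3", "V4"],
    "C": ["V3", "V4", "V5"],
    "D": ["V4", "V5", "V5"],
})

# Distinct counts here are A=1, B=3, C=3, D=2, so a threshold of 2 keeps B and C.
selected = select_by_nunique(head, 2)
print(selected.columns.tolist())  # ['B', 'C']
```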
