Filter columns with number of unique values in a pandas dataframe

I have a very large dataframe with over 2000 columns. I am trying to count the number of unique values for each column and filter out the columns with unique values below a certain number. Here is an example:

import pandas as pd
df = pd.DataFrame({'A': ('a', 'b', 'c', 'd', 'e', 'a', 'a'), 'B': (1, 1, 2, 1, 3, 3, 1)})
df.nunique()
A      5
B      3
dtype: int64

So let's say I want to filter out column B, which has fewer than 5 unique values, and return a DataFrame without column B.

Thanks-

Pass a boolean mask of columns to .loc:

df = df.loc[:, df.nunique() >= 5]
   A
0  a
1  b
2  c
3  d
4  e
5  a
6  a
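
To see why this works, it may help to look at the intermediate boolean mask. The sketch below rebuilds the example frame from the question and uses the question's threshold of 5; the names `mask` and `filtered` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ('a', 'b', 'c', 'd', 'e', 'a', 'a'),
                   'B': (1, 1, 2, 1, 3, 3, 1)})

# One boolean per column: True where the column has at least 5 distinct values.
mask = df.nunique() >= 5

# .loc[rows, columns]: keep every row, keep only the columns where mask is True.
filtered = df.loc[:, mask]
print(list(filtered.columns))
```

Because the mask is indexed by column name, pandas aligns it with the frame's columns, so the same pattern should carry over to a frame with thousands of columns.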

There may be a more Pythonic way, but try this and see if it works:

x = df.nunique()
df[list(x[x>=5].index)]
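
If the threshold needs to vary, the same idea can be wrapped in a small function. This is a minimal sketch, not part of pandas: the helper name `keep_high_cardinality` and its parameters are hypothetical. Note that `DataFrame.nunique` ignores NaN by default (`dropna=True`), which can matter for sparse columns.

```python
import pandas as pd

def keep_high_cardinality(df, min_unique, dropna=True):
    """Hypothetical helper: return only the columns of df that have
    at least min_unique distinct values."""
    # Count distinct values per column; NaN is excluded unless dropna=False.
    counts = df.nunique(dropna=dropna)
    # Select columns whose count meets the threshold.
    return df.loc[:, counts >= min_unique]

df = pd.DataFrame({'A': ('a', 'b', 'c', 'd', 'e', 'a', 'a'),
                   'B': (1, 1, 2, 1, 3, 3, 1)})
result = keep_high_cardinality(df, min_unique=5)
print(list(result.columns))
```

The function returns a new selection and leaves the original frame untouched, so different thresholds can be tried without rebuilding the data.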
