
Pandas: How to drop a row or column if it contains a certain value most of the time?

I have a DataFrame in which some missing values are stored as the string "none".

import pandas as pd

df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

I want to remove the rows or columns in which most of the values are 'none'. How can I do this? Thank you.

The first solution treats 'none' as an ordinary string rather than NaN: compare with eq and count the matches with sum. This drops columns; to drop rows instead, count along each row with sum(axis=1).

df.loc[:,df.eq('none').sum().lt(2)]
Out[559]: 
  # of customers Category
0             30     none
1           none    women
2             50     kids
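
The row-wise variant mentioned above is not shown in the answer. A minimal sketch, assuming the same df as in the question and the same cutoff of fewer than 2 'none' values (the names none_per_row and rows_kept are only for illustration):

```python
import pandas as pd

df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

# Count the 'none' strings in each row (axis=1 sums across the columns).
none_per_row = df.eq('none').sum(axis=1)

# Keep only the rows that contain fewer than 2 'none' values.
rows_kept = df.loc[none_per_row.lt(2)]
print(rows_kept)
```

With this data, rows 0 and 1 each contain two 'none' values, so only row 2 should remain.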

The second solution assumes the 'none' values are np.nan (or have been converted to it) and uses dropna with thresh.

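# If 'none' is still a string, convert it to NaN first (requires import numpy as np):
# df = df.replace('none', np.nan)

df.dropna(axis=0, thresh=2)  # thresh: keep only rows with at least this many non-NA values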
Out[563]: 
  # of customers Category Sales
2             50     kids    40
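
For completeness, here is a self-contained sketch of the dropna approach with the conversion step actually run, assuming the df from the question. It also shows the column-wise call (axis=1), which the answer does not include; the names df_nan, rows_kept and cols_kept are only for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

# Turn the 'none' strings into real missing values so dropna can see them.
df_nan = df.replace('none', np.nan)

# Keep rows that have at least 2 non-NA values.
rows_kept = df_nan.dropna(axis=0, thresh=2)

# Keep columns that have at least 2 non-NA values.
cols_kept = df_nan.dropna(axis=1, thresh=2)

print(rows_kept)
print(cols_kept)
```

Note that thresh counts the values that are present, not the ones that are missing, so the cutoff is expressed the opposite way round from the eq/sum approach.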

Or:

df.loc[:,(df=='none').sum()<=1]

Output:

  # of customers Category
0             30     none
1           none    women
2             50     kids
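
The fixed counts used in these answers fit the 3x3 example. For frames of other sizes, one possible generalization is to compare the fraction of 'none' values against a cutoff. This is only a sketch, and it assumes that "most" means more than half; the names is_none, cols_kept and rows_kept are hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

# Boolean frame: True wherever the cell holds the string 'none'.
is_none = df.eq('none')

# The mean of a boolean column is the fraction of True values in it.
# Keep columns where at most half of the values are 'none'.
cols_kept = df.loc[:, is_none.mean().le(0.5)]

# Same idea per row: mean(axis=1) gives the fraction of 'none' in each row.
rows_kept = df.loc[is_none.mean(axis=1).le(0.5)]

print(cols_kept)
print(rows_kept)
```

Here the 0.5 cutoff is the assumption to adjust if "most" should mean something stricter or looser.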

