I have a DataFrame in which some missing values are stored as the string 'none'.
import pandas as pd
df = pd.DataFrame({'Category': ['none', 'women', 'kids'], 'Sales': ['none', 'none', '40'], '# of customers': ['30', 'none', '50']})
I want to remove the rows or columns in which most of the values are 'none'. How can I do this? Thank you.
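The first solution treats 'none' as a string rather than NaN. It uses eq with sum to count the 'none' values in each column and keeps the columns that have fewer than two of them (to drop rows instead, use sum(axis=1)):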
df.loc[:,df.eq('none').sum().lt(2)]
Out[559]:
# of customers Category
0 30 none
1 none women
2 50 kids
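The row-wise variant mentioned above could look like the following. This is a minimal sketch that rebuilds the question's DataFrame and reuses the same cutoff of two 'none' values; the variable name rows_kept is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['none', 'women', 'kids'],
    'Sales': ['none', 'none', '40'],
    '# of customers': ['30', 'none', '50'],
})

# Count the 'none' strings in each row (axis=1) and keep rows with fewer than 2.
none_per_row = df.eq('none').sum(axis=1)
rows_kept = df.loc[none_per_row.lt(2)]
print(rows_kept)
```

With this data, rows 0 and 1 each contain two 'none' values, so only row 2 remains.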
The second solution assumes the 'none' values are np.nan (or have been converted to it) and uses dropna with thresh:
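import numpy as np
# Convert the 'none' strings to NaN first (skip the replace if the values are already NaN).
# thresh=2 keeps only the rows that have at least 2 non-NA values.
df.replace('none', np.nan).dropna(axis=0, thresh=2)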
Out[563]:
# of customers Category Sales
2 50 kids 40
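The same dropna approach should also work for columns by switching to axis=1. A minimal sketch, assuming the 'none' strings are converted to NaN first; the name cols_kept is illustrative:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Category': ['none', 'women', 'kids'],
    'Sales': ['none', 'none', '40'],
    '# of customers': ['30', 'none', '50'],
})

# Turn the 'none' strings into real missing values, then keep only
# the columns that have at least 2 non-NA values.
cols_kept = df.replace('none', np.nan).dropna(axis=1, thresh=2)
print(cols_kept)
```

Here 'Sales' has only one non-NA value, so it is the column that gets dropped.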
Or:
df.loc[:,(df=='none').sum()<=1]
Output:
# of customers Category
0 30 none
1 none women
2 50 kids
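Since the question asks about rows or columns where "most" values are 'none', a fixed count may not generalize to other shapes. One possible variation is to compare the fraction of 'none' values against 0.5 using mean instead of sum. This is a sketch of that idea, not part of the answers above; the 0.5 cutoff and the variable names are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    'Category': ['none', 'women', 'kids'],
    'Sales': ['none', 'none', '40'],
    '# of customers': ['30', 'none', '50'],
})

# Boolean mask marking every cell that holds the 'none' string.
is_none = df.eq('none')

# The mean of a boolean mask is the fraction of True values.
# Keep columns / rows where at most half of the values are 'none'.
cols_kept = df.loc[:, is_none.mean() <= 0.5]
rows_kept = df.loc[is_none.mean(axis=1) <= 0.5]

print(cols_kept)
print(rows_kept)
```

For this DataFrame the result matches the count-based versions: 'Sales' is dropped column-wise, and only row 2 survives row-wise.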