Pandas: How to drop a row or column if it contains a certain value most of the time?

Question

I have a DataFrame in which some missing values are stored as the string "none".

import pandas as pd

df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

I want to remove the rows or columns in which most values are 'none'. How can I do this? Thank you.

Answer 1

The first solution treats 'none' as a string rather than NaN: compare with eq, count the matches per column with sum, and keep the columns with fewer than 2 matches (to drop rows instead, use sum(axis=1)).

df.loc[:,df.eq('none').sum().lt(2)]
Out[559]: 
  # of customers Category
0             30     none
1           none    women
2             50     kids
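
The row-wise variant mentioned above (sum(axis=1)) could look roughly like this. It is only a sketch on the sample frame from the question; the names none_per_row and rows_kept are made up for illustration, and the cutoff of 2 is the same one used for the columns.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

# Count the 'none' strings in each row (axis=1 sums across the columns).
none_per_row = df.eq('none').sum(axis=1)

# Keep only the rows with fewer than 2 'none' values.
rows_kept = df.loc[none_per_row.lt(2)]
print(rows_kept)  # only row 2 remains on this sample
```

Rows 0 and 1 each contain two 'none' values, so on this sample only row 2 survives.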

The second solution assumes 'none' has been converted to np.nan and uses dropna with thresh, which here keeps only the rows with at least 2 non-NA values.

import numpy as np
df = df.replace('none', np.nan)  # convert the 'none' strings to NaN first

df.dropna(axis=0, thresh=2)  # thresh: minimum number of non-NA values a row needs to be kept
Out[563]: 
  # of customers Category Sales
2             50     kids    40
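
If the thresh value should follow the size of the frame instead of being hard-coded, one option is to derive it from the number of rows. The sketch below assumes "most" means more than half and applies the same dropna idea to columns (axis=1); df_nan, min_non_na and cols_kept are illustrative names, not part of the answer above.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

# Turn the 'none' strings into real missing values.
df_nan = df.replace('none', np.nan)

# Assumption: a column is dropped when more than half of its values are missing,
# so it needs at least ceil(n_rows / 2) non-NA values to be kept.
min_non_na = math.ceil(len(df_nan) / 2)

# axis=1 drops columns; thresh is the required number of non-NA values.
cols_kept = df_nan.dropna(axis=1, thresh=min_non_na)
print(cols_kept)  # 'Sales' is dropped on this sample
```

With 3 rows the threshold works out to 2, so 'Sales' (one non-NA value) is removed and the other two columns stay.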

Answer 2

Or:

df.loc[:,(df=='none').sum()<=1]

Output:

  # of customers Category
0             30     none
1           none    women
2             50     kids
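
To handle rows and columns in one go, the same comparison can be expressed as a share instead of a count. This is a rough sketch, not something from the answers above: drop_mostly is a hypothetical helper, and the default frac=0.5 is an assumed reading of "most".

```python
import pandas as pd


def drop_mostly(df, value='none', frac=0.5):
    """Drop rows and columns where more than `frac` of the entries equal `value`.

    Both shares are computed on the original frame, not one after the other.
    """
    mask = df.eq(value)
    keep_cols = mask.mean(axis=0).le(frac)  # share of `value` per column
    keep_rows = mask.mean(axis=1).le(frac)  # share of `value` per row
    return df.loc[keep_rows, keep_cols]


df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

cleaned = drop_mostly(df)
print(cleaned)  # row 2 with 'Category' and '# of customers'
```

Taking the mean of the boolean mask gives the fraction of 'none' values directly, so the cutoff does not have to be adjusted when the frame grows.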

solution1
0 ACCEPTED 2018-10-09 02:03:51

solution2
0 2018-10-09 02:08:16
