Pandas - How to drop a row or column if it has a certain value most of the time?
I have a DataFrame in which some missing values are stored as the string "none".
import pandas as pd
df = pd.DataFrame({'Category': ['none', 'women', 'kids'], 'Sales': ['none', 'none', '40'], '# of customers': ['30', 'none', '50']})
I want to remove the rows or columns in which most of the values are 'none'. How can I do this?
Thank you
The first solution treats 'none' as a string rather than NaN: use eq with sum to count the 'none' values in each column (to drop rows instead, use sum(axis=1)).
df.loc[:,df.eq('none').sum().lt(2)]
Out[559]:
  # of customers Category
0             30     none
1           none    women
2             50     kids
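The row variant mentioned above, sum(axis=1), can be sketched as follows. This is a minimal sketch assuming "most values" means more than half of the cells in a row or column; the variable names none_per_row, rows_kept and cols_kept are illustrative. Instead of hard-coding lt(2), the cut-off is taken from the frame's shape, so it still means "more than half" if the frame grows.

```python
import pandas as pd

df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

# Count the 'none' strings in each row (axis=1 sums across the columns).
none_per_row = df.eq('none').sum(axis=1)

# Keep a row only if at most half of its cells are 'none'.
rows_kept = df[none_per_row <= df.shape[1] / 2]

# Same rule per column: the default axis=0 sums down each column.
cols_kept = df.loc[:, df.eq('none').sum() <= len(df) / 2]

print(rows_kept)
print(cols_kept)
```

On the sample data, rows 0 and 1 each have two 'none' cells out of three, so only row 2 survives; Sales is the only column that is mostly 'none', so it is the only column dropped.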
The second solution treats your 'none' values as np.nan and uses dropna with thresh:
import numpy as np
# Convert 'none' to real missing values (on a copy), then keep rows with at least thresh=2 non-NA values.
df.replace('none', np.nan).dropna(axis=0, thresh=2)
Out[563]:
  # of customers Category Sales
2             50     kids    40
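If the threshold should follow the frame's size rather than being fixed at 2, thresh can be derived from the shape. This is a sketch assuming the same "more than half missing" rule: requiring at least ceil(n / 2) non-NA values drops exactly those rows in which more than half of the n cells are missing. The names row_thresh, col_thresh, rows_kept and cols_kept are illustrative. Passing axis=1 applies the rule to columns instead of rows.

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

# Turn the 'none' placeholder into real missing values so dropna can see them.
clean = df.replace('none', np.nan)

# thresh = minimum number of non-NA values needed to keep a row.
# ceil(n / 2) drops exactly the rows in which more than half of the n cells are NaN.
row_thresh = math.ceil(clean.shape[1] / 2)
rows_kept = clean.dropna(axis=0, thresh=row_thresh)

# With axis=1 the count runs down each column, so columns are dropped instead.
col_thresh = math.ceil(clean.shape[0] / 2)
cols_kept = clean.dropna(axis=1, thresh=col_thresh)

print(rows_kept)
print(cols_kept)
```

For the three-column sample frame this reproduces thresh=2 from the answer above, so the row result is the same single row with index 2.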
Or:
df.loc[:,(df=='none').sum()<=1]
Output:
  # of customers Category
0             30     none
1           none    women
2             50     kids
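Both rules can be folded into one helper that drops rows and columns at the same time. This is a sketch, not part of the original answers: drop_mostly_none, marker and share are hypothetical names, and share=0.5 encodes the "more than half" reading of "most". It relies on the fact that the mean of a boolean frame along an axis is the fraction of True values.

```python
import pandas as pd


def drop_mostly_none(frame, marker='none', share=0.5):
    """Drop rows and columns in which more than `share` of the cells equal `marker`."""
    is_marker = frame.eq(marker)
    # The mean of a boolean Series is the fraction of True values.
    keep_rows = is_marker.mean(axis=1) <= share
    keep_cols = is_marker.mean(axis=0) <= share
    # Both masks come from the original frame, so neither drop affects the other.
    return frame.loc[keep_rows, keep_cols]


df = pd.DataFrame({'Category': ['none', 'women', 'kids'],
                   'Sales': ['none', 'none', '40'],
                   '# of customers': ['30', 'none', '50']})

result = drop_mostly_none(df)
print(result)
```

Because the masks are computed once on the original data, a column's 'none' share still counts rows that end up removed. Dropping columns first and then re-checking rows (or the reverse) can give a different result, so choose the order that matches your intent.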