Fastest way to get percent of rows with incorrect values for each feature in a Pandas DataFrame
Below is the code I have. It seems to work for '?', ' ' and '', but not for np.NaN. Any suggestions?
Also, I am new to Pandas/Python and would like to know whether there is a faster way to do this.
I am thinking of treating a feature as suspect if more than X% (say 5%) of its rows have missing values. Are there any other initial data sanitization checks that you regularly use?
for col in df.columns:
    pcnt_missing = df[df[col].isin(['?','',' ',np.NaN])][col].count() * 100.0 / df[col].count()
    if pcnt_missing > 1:
        print(f"Col = {col}, Percent missing ={pcnt_missing:.2f}")
If you can replace the values '?', '' and ' ' with np.nan, you can easily compute the percentage of missing values from the sum of the missing-value flags and the length of the DataFrame. You can replace the missing values with an apply:
import pandas as pd
import numpy as np
df = pd.DataFrame({'a': [1,2,3,4], 'b': [2, '', '?', 4], 'c': [' ', np.nan, '', 5]})
def replace(x):
    # Flag the placeholder strings that should count as missing
    idx = x.isin(['', ' ', '?'])
    x[idx] = np.nan
    return x
replaced = df.apply(replace, axis=1)  # Values are replaced here
Now you can compute the percentage of missing values for each column with this:
replaced.isna().sum(axis=0) * 100 / len(replaced)
Output:
a 0.0
b 50.0
c 75.0
dtype: float64
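As a possible alternative to the row-wise apply, the same replacement can be sketched with vectorized DataFrame methods. This is only a sketch on the same toy DataFrame; the variable names placeholders and pcnt_missing are chosen here for illustration and are not part of the original answer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [2, '', '?', 4],
                   'c': [' ', np.nan, '', 5]})

# Placeholder strings to treat as missing (assumed list, taken from the question)
placeholders = ['?', '', ' ']

# mask() sets every cell where the condition is True to NaN
replaced = df.mask(df.isin(placeholders))

# Share of NaN cells per column, expressed as a percentage
pcnt_missing = replaced.isna().mean() * 100
print(pcnt_missing)
```

This avoids calling a Python function once per row, which usually matters on larger frames.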
Use boolean logic with isna, using @Ricardo Erikson's setup:
df = pd.DataFrame({'a': [1,2,3,4], 'b': [2, '', '?', 4], 'c': [' ', np.nan, '', 5]})
(df.isna() | df.isin(['?','',' '])).mean()
Output:
a 0.00
b 0.50
c 0.75
dtype: float64
Check for NaN with isna, combine that with isin using |, the boolean OR operator, and then use mean to find the fraction of missing values in each column (multiply by 100 for a percentage).
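To connect this with the 5% threshold mentioned in the question, here is a minimal sketch that flags suspect columns. The function name suspect_columns and its parameters are hypothetical, introduced only for illustration. Note that Series.count() skips NaN values, which is likely why the original loop never counted the np.NaN rows; taking the mean of a boolean mask divides by the total number of rows instead.

```python
import numpy as np
import pandas as pd

def suspect_columns(df, placeholders=('?', '', ' '), threshold_pct=5.0):
    """Return the percent missing for columns above threshold_pct (hypothetical helper)."""
    # True where a cell is NaN or one of the placeholder strings
    missing = df.isna() | df.isin(list(placeholders))
    # Mean of a boolean column is the fraction of True values over all rows
    pct = missing.mean() * 100
    return pct[pct > threshold_pct].sort_values(ascending=False)

df = pd.DataFrame({'a': [1, 2, 3, 4],
                   'b': [2, '', '?', 4],
                   'c': [' ', np.nan, '', 5]})

suspects = suspect_columns(df)
print(suspects)
```

On the toy DataFrame this flags columns c and b and leaves a out, since a has no missing values.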