Pandas - How to replace those exception values with NaN
There is a DataFrame like this:
cost
0 8762.000000
1 -1
2 7276.000000
3 9574.000000
4 -1
.. ...
59 5508.000000
60 7193.750000
61 5927.333333
62 -1
63 4972.000000
Here -1 is the exception value, so how can I replace -1 with NaN, and then how can I interpolate to fill in the NaN values?
After that, the DataFrame is cleaned. However, it may still contain some abnormally high and low values. How can I replace those with interpolated values as well?
To replace -1 with interpolated values, first replace it with NaN and then call Series.interpolate:
df['cost'] = df['cost'].replace(-1, np.nan).interpolate()
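For anyone who wants to try the one-liner above in isolation, here is a minimal self-contained sketch. The six-row sample frame is made up for illustration (it only reuses a few values from the question), and the intermediate name cleaned is just a convenience, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data; -1 marks an exception (missing) value
df = pd.DataFrame({"cost": [8762.0, -1, 7276.0, 9574.0, -1, 5508.0]})

# Step 1: turn the -1 sentinel into NaN so pandas treats it as missing
cleaned = df["cost"].replace(-1, np.nan)

# Step 2: interpolate() defaults to linear interpolation, so each NaN
# is filled from the nearest valid values before and after it
df["cost"] = cleaned.interpolate()

print(df)
```

Note that, with the default settings, a -1 in the very first row has no earlier neighbour to interpolate from and stays NaN, so that case may need separate handling.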
If you also need to remove outliers (abnormally high and low values), you can identify them with Series.quantile and Series.between, and replace them with NaN using Series.where (replace -1 first):
print (df)
cost
0 8762.000000
1 -1.000000
2 7276.000000
3 957400.000000
4 -1.000000
59 5508.000000
60 7193.750000
61 59.333333
62 -1.000000
63 4972.000000
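# Replace the -1 sentinel first so it does not distort the quantiles
df['cost'] = df['cost'].replace(-1, np.nan)
q_low = df["cost"].quantile(0.01)
q_hi = df["cost"].quantile(0.99)
# pandas >= 1.3 takes a string here; inclusive=False was removed in pandas 2.0
m = df["cost"].between(q_low, q_hi, inclusive="neither")
# where() keeps values inside the range and sets the rest to NaN
df['cost'] = df['cost'].where(m).interpolate()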
print (df)
cost
0 8762.000000
1 8019.000000
2 7276.000000
3 6686.666667
4 6097.333333
59 5508.000000
60 7193.750000
61 6453.166667
62 5712.583333
63 4972.000000
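The same steps can be wrapped into a small reusable function. This is only a sketch: the name interpolate_outliers and its parameters are hypothetical, the ten-row frame just reuses the rows printed above, and it assumes pandas 1.3 or later, where Series.between accepts inclusive="neither".

```python
import numpy as np
import pandas as pd

def interpolate_outliers(s, low=0.01, high=0.99, sentinel=-1):
    """Hypothetical helper: replace the sentinel and quantile outliers
    with NaN, then fill the gaps by linear interpolation."""
    s = s.replace(sentinel, np.nan)
    q_low = s.quantile(low)
    q_hi = s.quantile(high)
    # NaN compares as False in between(), so missing values stay masked out
    mask = s.between(q_low, q_hi, inclusive="neither")
    return s.where(mask).interpolate()

# The ten rows shown in the printed example, re-indexed 0..9
df = pd.DataFrame({"cost": [8762.0, -1, 7276.0, 957400.0, -1,
                            5508.0, 7193.75, 59.333333, -1, 4972.0]})
df["cost"] = interpolate_outliers(df["cost"])
print(df)
```

One thing to keep in mind: on a small sample the strict 1% and 99% bounds will typically drop the smallest and largest value even when they are not really abnormal, so the thresholds may need tuning for the data at hand.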