
Pandas - How to replace those exception values with NaN

There is a DataFrame like this:

           cost
0   8762.000000
1   -1
2   7276.000000
3   9574.000000
4   -1
..          ...
59  5508.000000
60  7193.750000
61  5927.333333
62  -1
63  4972.000000

In this case -1 is the exception value. How can I replace -1 with NaN, and then fill those NaN values by interpolation?

After that the DataFrame is cleaned, but it may still contain some abnormally high and low values. How can those also be replaced with interpolated values?

To replace -1 with interpolated values, first replace it with NaN and then fill the NaNs with Series.interpolate:

import numpy as np
df['cost'] = df['cost'].replace(-1, np.nan).interpolate()  # -1 -> NaN, then fill the gaps linearly
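A self-contained sketch of this step follows. It assumes the frame contains only the rows printed in the question (the elided rows between 4 and 59 are left out), so the interpolated numbers are illustrative rather than what the full data would produce:

```python
import numpy as np
import pandas as pd

# Only the rows shown in the question; the elided rows 5 to 58 are omitted.
df = pd.DataFrame(
    {"cost": [8762.0, -1, 7276.0, 9574.0, -1, 5508.0, 7193.75, 5927.333333, -1, 4972.0]},
    index=[0, 1, 2, 3, 4, 59, 60, 61, 62, 63],
)

# Turn the sentinel -1 into NaN, then fill each gap by linear interpolation.
# The default method="linear" treats rows as equally spaced and ignores the index values.
df["cost"] = df["cost"].replace(-1, np.nan).interpolate()
print(df)
```

With the default settings a -1 in the very first row would stay NaN, because interpolation only fills forward from the first valid value; passing limit_direction="both" to interpolate is one way to fill such leading gaps as well.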

If you also need to remove outliers (abnormally high and low values), you can identify them with Series.quantile and Series.between, and replace them with NaNs using Series.where (after first replacing -1):

print(df)
             cost
0     8762.000000
1       -1.000000
2     7276.000000
3   957400.000000
4       -1.000000
59    5508.000000
60    7193.750000
61      59.333333
62      -1.000000
63    4972.000000

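import numpy as np

# first turn the -1 placeholders into NaN so they don't distort the quantiles
df['cost'] = df['cost'].replace(-1, np.nan)

# 1% and 99% quantiles (NaNs are skipped)
q_low = df["cost"].quantile(0.01)
q_hi  = df["cost"].quantile(0.99)

# True only for values strictly inside the bounds; inclusive="neither" needs pandas >= 1.3
# (older versions used inclusive=False, which pandas 2.x no longer accepts)
m = df["cost"].between(q_low, q_hi, inclusive="neither")

# values outside the bounds become NaN, then all NaNs are interpolated
df['cost'] = df['cost'].where(m).interpolate()
print(df)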
           cost
0   8762.000000
1   8019.000000
2   7276.000000
3   6686.666667
4   6097.333333
59  5508.000000
60  7193.750000
61  6453.166667
62  5712.583333
63  4972.000000
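Both steps can also be wrapped in a small reusable helper. This is a sketch, not part of the original answer: the function name clean_cost and its parameters are hypothetical, the data is only the ten rows printed above, and inclusive="neither" assumes pandas 1.3 or newer. On those rows it reproduces the output shown above:

```python
import numpy as np
import pandas as pd


def clean_cost(s: pd.Series, sentinel=-1, low=0.01, high=0.99) -> pd.Series:
    """Hypothetical helper: drop sentinel values and quantile outliers, then interpolate."""
    # Step 1: the sentinel is not a real measurement, so mark it as missing.
    s = s.replace(sentinel, np.nan)
    # Step 2: quantiles skip NaN, so the bounds come from real values only.
    q_low, q_hi = s.quantile(low), s.quantile(high)
    # Keep values strictly inside the bounds; NaN compares as False and stays masked.
    inside = s.between(q_low, q_hi, inclusive="neither")
    # Step 3: everything outside the mask becomes NaN and is filled linearly.
    return s.where(inside).interpolate()


df = pd.DataFrame(
    {"cost": [8762.0, -1, 7276.0, 957400.0, -1, 5508.0, 7193.75, 59.333333, -1, 4972.0]},
    index=[0, 1, 2, 3, 4, 59, 60, 61, 62, 63],
)
df["cost"] = clean_cost(df["cost"])
print(df)
```

Because the bounds are exclusive and the 1% and 99% quantiles always lie between the minimum and the maximum, this rule discards the smallest and largest real value every time, even on data that contains no genuine outliers.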


 