Is there a built-in function in xarray to remove outliers from a dataset?

Question

I have a spatio-temporal .nc file that I opened as an xarray dataset, and I would like to remove the values that exceed the 99th percentile. Is there an easy/straightforward way to drop those values?

The information about my Dataset is:

Dimensions:    (latitude: 204, longitude: 180, time: 985)
Coordinates:
  * longitude  (longitude) float32 -69.958336 -69.875 ... -55.124996 -55.04166
  * latitude   (latitude) float32 -38.041668 -38.12501 ... -54.87501 -54.95834
  * time       (time) datetime64[ns] 1997-09-06 1997-09-14 ... 2019-09-06
Data variables:
    chl        (time, latitude, longitude) float64 nan nan nan ... nan nan nan

Answer 1

You can create your own function. The one below replaces the values above the given percentile with the maximum of the remaining values:

import xarray as xr
import numpy as np

# perc -> quantile in [0, 1] that defines the exclusion threshold (0.99 = 99th percentile)
# dim  -> integer index (along the first dimension) of the slice to filter

def replace_outliers(data, dim=0, perc=0.99):

  # calculate the percentile of the selected slice (NaNs are skipped)
  threshold = data[dim].quantile(perc)

  # max among the values that are not outliers
  max_value = data[dim].where(data[dim] <= threshold).max().values

  # replace outliers with that max; NaNs (e.g. masked pixels) compare
  # as False in `>` and are therefore left untouched
  data[dim] = xr.where(data[dim] > threshold, max_value, data[dim])

  return data
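The question asks to drop the values rather than replace them. A minimal sketch of that variant, assuming "drop" means masking them to NaN and that a single threshold over the whole variable is wanted, needs only the built-in DataArray.quantile and DataArray.where. The helper name mask_above_quantile and the synthetic array are hypothetical stand-ins for the chl variable.

```python
import numpy as np
import xarray as xr


def mask_above_quantile(da, q=0.99):
    """Hypothetical helper: set values above the q-th quantile of `da` to NaN.

    The quantile is computed over all dimensions; NaNs are skipped.
    """
    # Drop the scalar "quantile" coordinate that quantile() attaches
    threshold = da.quantile(q).drop_vars("quantile", errors="ignore")
    # .where keeps values where the condition is True and puts NaN elsewhere;
    # values that were already NaN stay NaN
    return da.where(da <= threshold)


# Small synthetic stand-in for the `chl` variable (shape and values are made up)
rng = np.random.default_rng(0)
chl = xr.DataArray(
    rng.random((4, 5, 6)),
    dims=("time", "latitude", "longitude"),
)
chl[0, 0, 0] = np.nan   # a missing pixel, as in the real data
chl[1, 2, 3] = 1000.0   # an artificial outlier

cleaned = mask_above_quantile(chl, q=0.99)
print(int(chl.count()), int(cleaned.count()))
```

On the dataset from the question this would be ds["chl"] = mask_above_quantile(ds["chl"]), with ds being the opened dataset. Masking keeps the regular time/latitude/longitude grid intact; individual points cannot be physically removed from an N-dimensional grid without flattening it.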

Testing (the input is random, so the printed values will differ between runs):

data = np.random.randint(1,5,[3, 3, 3])
# create outlier 
data[0,0,0] = 100

temp = xr.DataArray(data.copy())

print(temp[0])

Out:

array([[100,   1,   2],
       [  4,   4,   4],
       [  1,   4,   3]])

Apply the function:

temp = replace_outliers(temp, dim=0, perc=0.99)
print(temp[0])

Out:

array([[4, 1, 2],
       [4, 4, 4],
       [1, 4, 3]])
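replace_outliers filters a single slice (temp[0]) using that slice's own threshold. For the 985 time steps in the question, a sketch assuming one threshold per time step (or, alternatively, one per grid cell) can pass dimension names to quantile and let xarray broadcast the thresholds back against the data by dimension name. The helper mask_above_quantile_along and the synthetic array are hypothetical.

```python
import numpy as np
import xarray as xr


def mask_above_quantile_along(da, dim, q=0.99):
    """Hypothetical helper: set to NaN the values above the q-th quantile over `dim`.

    dim=["latitude", "longitude"] gives one threshold per time step;
    dim="time" gives one threshold per grid cell.
    """
    # The reduced dimensions disappear, so `threshold` holds one value per
    # remaining coordinate; drop the scalar "quantile" coordinate it carries
    threshold = da.quantile(q, dim=dim).drop_vars("quantile", errors="ignore")
    # The comparison broadcasts `threshold` back over the reduced dimensions
    return da.where(da <= threshold)


# Synthetic stand-in for `chl` (sizes and values are made up)
rng = np.random.default_rng(1)
chl = xr.DataArray(
    rng.random((3, 4, 5)),
    dims=("time", "latitude", "longitude"),
)
chl[2, 1, 1] = 50.0  # artificial outlier in the last time step

per_step = mask_above_quantile_along(chl, dim=["latitude", "longitude"])
per_pixel = mask_above_quantile_along(chl, dim="time")
print(bool(per_step[2, 1, 1].isnull()), bool(per_pixel[2, 1, 1].isnull()))
```

Per-time-step thresholds flag unusually high pixels within each image, while per-pixel thresholds flag unusually high dates at each location. With only a few samples along the reduced dimension, the 99th percentile lies below the maximum, so the largest sample is always masked.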

Answer 1
1 2020-03-23 17:11:18

xarray 中是否有内置函数可以从数据集中删除异常值？

问题描述

1 个解决方案

解决方案1 1 2020-03-23 17:11:18

解决方案1
1 2020-03-23 17:11:18