How to extract 95% of the data in Python

Question

Given an array of numbers, I would like to drop outliers while preserving 95% of the total number of data points. E.g. range(0,100,1) would become range(2,98,1).

For example, if the data is something like

[0.01,0.02,4,5,7,3,1,4,6,7,10000,10002] -> [4,5,7,3,1,4,6,7]

Is there any function in the Python standard library or NumPy for this purpose?

Answer 1

It sounds like you want to filter out data that lies more than a few median absolute deviations (MAD) away from the median.
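For reference, the MAD is the median of the absolute deviations from the median. A minimal sketch of that definition, applied to the sample data from the question (the helper name `mad` is only illustrative, not a NumPy function):

```python
import numpy as np

def mad(values):
    """Median of the absolute deviations from the median (unscaled)."""
    values = np.asarray(values, dtype=float)
    return np.median(np.abs(values - np.median(values)))

sample = [0.01, 0.02, 4, 5, 7, 3, 1, 4, 6, 7, 10000, 10002]
print(mad(sample))     # robust spread, barely affected by the two huge values
print(np.std(sample))  # standard deviation, dominated by the two huge values
```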

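The MAD of this dataset is 2.5 (whereas the standard deviation is >3000). We can use this to drop points that are more than 2 MADs away from the median. The cutoff is a heuristic: the fraction of points kept depends on the distribution. For normally distributed data, 2 MADs cover roughly 82% of the points and about 2.9 MADs cover 95%, so adjust the cutoff to taste.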

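import numpy as np

data = np.array([0.01, 0.02, 4, 5, 7, 3, 1, 4, 6, 7, 10000, 10002])
deviations = 2  # cutoff, in units of MAD

# absolute distance of each point from the median
d = np.abs(data - np.median(data))
# median absolute deviation (2.5 for this data)
med_abs_dev = np.median(d)
# distance from the median in units of MAD (assumes med_abs_dev > 0)
s = d / med_abs_dev
filtered = data[s < deviations]
# [ 0.01  0.02  4.    5.    7.    3.    1.    4.    6.    7.  ]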
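
If the goal is literally to keep the central 95% of the points regardless of their values, a percentile cut is another option. A minimal sketch, assuming the trimming should be symmetric (2.5% from each tail); `keep_central` is a hypothetical helper name, not part of NumPy:

```python
import numpy as np

def keep_central(values, fraction=0.95):
    """Keep the points lying between the symmetric lower and upper percentiles."""
    values = np.asarray(values, dtype=float)
    tail = (1.0 - fraction) / 2.0 * 100.0  # percent cut from each tail
    low, high = np.percentile(values, [tail, 100.0 - tail])
    return values[(values >= low) & (values <= high)]

trimmed = keep_central(np.arange(100))
print(trimmed.min(), trimmed.max(), len(trimmed))
```

Unlike the MAD filter, this always discards a fixed share of the points, even when the data contains no outliers. On the 12-point sample from the question it would remove only the smallest and largest values, so 10000 would survive.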

Answer 1
0 Accepted 2017-03-28 05:58:45
