The problem I am facing: while computing a rolling average with Python pandas, how can I reject a window of 10 rows if one or more of its rows is an outlier? I need help with the conditional logic for the scenario described below.
The outlier condition for a window is:
The upper bound for an outlier is 15 and the lower bound is 0, i.e. any value above 15 or below 0 is an outlier.
If outliers make up more than 10% of a window, reject that window and move on to the next one.
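The rejection rule for a single window can be sketched in plain Python as below. This is a minimal illustration, not code from the question: the helper name is_rejected and its parameters are hypothetical. It shows that "more than 10%" of a 10-row window means two or more outliers.

```python
def is_rejected(window, lo=0, hi=15, max_frac=0.10):
    # Hypothetical helper: an outlier is any value below lo or above hi
    outliers = sum(1 for v in window if v < lo or v > hi)
    # Reject when outliers make up strictly more than max_frac of the window
    return outliers > max_frac * len(window)


print(is_rejected([3, 7, 16, 5, 9, 2, 11, 4, 8, 6]))   # one outlier (16) -> False
print(is_rejected([3, 7, 16, 5, -1, 2, 11, 4, 8, 6]))  # two outliers -> True
```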
Here is my code so far:
_filter = lambda x: float("inf") if x > 15 or x < 0 else x
# Replace outliers with inf, then take the mean over each 10-row window
result = df_list["speed"].apply(_filter).rolling(10).mean().dropna()
# Print the maximum rolling average
print("The max rolling average is:")
print(result.max())
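A minimal adjustment to the approach above, offered as a sketch rather than a definitive fix: mark outliers as NaN instead of inf. pandas' rolling mean skips NaN values, and min_periods sets how many non-NaN values a window must contain before it produces a result. With window=10 and at most one outlier allowed (the 10% rule), min_periods=9 rejects every other window. The function name rolling_mean_without_outliers and its parameters are hypothetical, and the sample series stands in for df_list["speed"].

```python
import numpy as np
import pandas as pd


def rolling_mean_without_outliers(speed, window=10, lo=0, hi=15, max_outliers=1):
    # Replace outliers with NaN; rolling().mean() skips NaN values
    clean = speed.where((speed >= lo) & (speed <= hi))
    # A window only gets a value if it holds at least window - max_outliers valid rows
    result = clean.rolling(window, min_periods=window - max_outliers).mean()
    # With min_periods < window, partial windows at the start would also get a value; blank them
    result.iloc[: window - 1] = np.nan
    return result


speed = pd.Series([2, 4, 16, 6, 8, 10, 12, 14, 0, 15, 3, 17])
result = rolling_mean_without_outliers(speed).dropna()
print("The max rolling average is:")
print(result.max())
```

The explicit blanking of the first window - 1 rows is needed because, with min_periods=9, pandas would otherwise report a mean for the first 9-row partial window.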
Use rolling with a custom aggregation function:
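import numpy as np
import pandas as pd

df = pd.DataFrame({"a": range(100), "speed": np.random.randint(0, 17, 100)})
MAX = 15
MIN = 0

def my_mean(s):
    # Count the values in this window that fall outside [MIN, MAX]
    outlier_count = ((s < MIN) | (s > MAX)).sum()
    if outlier_count > 2:  # defined 2 as the threshold - can put any other number here
        return np.nan  # reject the whole window
    # Otherwise, average only the in-range values
    res = s[(s <= MAX) & (s >= MIN)].mean()
    return res

df["roll"] = df.speed.rolling(10).apply(my_mean)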
Because the data is random, the output differs on every run. One run produced:
     a  speed       roll
...
35  35      8   9.444444
36  36     14   9.666667
37  37     11   9.888889
38  38     16  10.250000
39  39     16        NaN
40  40     15        NaN
41  41      6        NaN
42  42      9  11.375000
43  43      2  10.000000
44  44      8   9.125000
...
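What happens here is as follows: df.speed.rolling(10) builds a rolling window of 10 rows over the speed column, and .apply(my_mean) calls my_mean on each window. my_mean first counts the number of outliers by summing the number of cases in which elements in the series s are smaller than the minimum or larger than the maximum. If that count exceeds the threshold, it returns NaN, which rejects the window; otherwise it returns the mean of the in-range values only, s[(s <= MAX) & (s >= MIN)].mean(). For the 10% rule in the question, a 10-row window should be rejected when it contains more than one outlier, so the threshold there would be outlier_count > 1.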
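The threshold of 2 in my_mean is an absolute count. Below is a sketch of a variant that states the question's rule as a fraction of the window length instead, assuming the same MIN and MAX. Passing raw=True hands each window to the function as a NumPy array rather than a Series, which usually speeds up rolling().apply(), and kwargs forwards the extra parameter. The name frac_mean, its max_frac parameter, and the sample data are illustrative, not part of the original answer.

```python
import numpy as np
import pandas as pd

MAX = 15
MIN = 0


def frac_mean(values, max_frac=0.10):
    # values is a NumPy array because apply() is called with raw=True
    in_range = (values >= MIN) & (values <= MAX)
    outlier_count = np.count_nonzero(~in_range)
    # Reject the window when outliers exceed max_frac of its length (or nothing is left)
    if outlier_count > max_frac * len(values) or not in_range.any():
        return np.nan
    return values[in_range].mean()


df = pd.DataFrame({"speed": [2, 4, 16, 6, 8, 10, 12, 14, 0, 15, 3, 17]})
df["roll"] = df.speed.rolling(10).apply(frac_mean, raw=True, kwargs={"max_frac": 0.10})
print(df)
```

Comparing the count with max_frac * len(values) rather than a hard-coded number keeps the rule consistent if the window size changes; a 20-row window, for example, would tolerate 2 outliers.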