How to plot outliers and original series?
Hi, I would like to define a function that returns a plot of the outliers (defined below) for a specified date range and simultaneously plots the original series (accounting, in that context, for possible ratios):
Defining outliers:
anomaly <- function(x)
{ tt <- 1:length(x)
resid <- residuals(loess(x ~ tt))
resid.q <- quantile(resid,prob=c(0.25,0.75))
iqr <- diff(resid.q)
limits <- resid.q + 1.5*iqr*c(-1,1)
score <- abs(pmin((resid-limits[1])/iqr,0) + pmax((resid - limits[2])/iqr,0))
return(score)
}
# defining dates
dates <- as.POSIXct(seq(as.Date("2015-08-20"), as.Date("2015-10-08"), by = "days"))
Some data:
a<-runif(50, 5.0, 7.5)
b<-runif(50, 4, 8)
c<-runif(50, 1, 2)
d<-runif(50, 3, 3.5)
ca<-c/a
cb<-c/b
df<-data.frame(dates,a,b,c,d,ca,cb)
Introducing outliers:
df[49,4]<-0
df[50,6]<-0
Loop over the data to find anomalies:
new<-lapply(df[,2:7],anomaly)
library(stringi) # binding list with differing rows
# from list to data frame
res <- as.data.frame((stri_list2matrix(new)))
# rename columns
colnames(res) <- names(new)
# depends on dates at the beginning
res<-(cbind(dates,res[,1:6]))
# melt to plot
library(reshape)
library(reshape2)
# keep melt's default column names ("variable", "value"); the plot below refers to `variable`
new <- melt(res , id.vars = 'dates')
Defining the plot for a specified date range (last 4 days):
library(ggplot2)
nrdays <- 4
a.plot<-ggplot(subset(new, new$dates >= as.POSIXct(max(new$dates)- (nrdays*60*60*24))),
aes(x=dates,y=value,colour=variable,group=variable)) +
geom_line() +
facet_grid(variable ~ ., scales = "free_y")+
ylab("Outliers")+
xlab("Date")
Defining the check-data function:
check_data <- function(df) {
if(tail(df, 1) > 0) { # check only last date
return(a.plot)
# and the corresponding original series
}
}
# check and plot data
check_data(df)
My problem is that I have hundreds of features and I would like to plot only those where an outlier has occurred. As you can see in the graph, I am able to produce a plot that returns all time series, including the series with the outlier, rather than only those where the outlier took place. Additionally, I would like to report the original series as well (including ratios; that is, given an outlier in the ratio ca, I would like to get the original series c and a too). How may I approach that problem? So the output may look like this:
including original series:
and the outlier as well:
You need to specify in subset that you want only the series with outliers, that is, those whose score is not equal to 0 within the date range. So you can replace the plot definition with:
a.plot<-ggplot(subset(new, new$dates >= as.POSIXct(max(new$dates)- (nrdays*60*60*24)) & new$variable %in% new$variable[!new$value %in% 0 & new$dates >= as.POSIXct(max(new$dates)- (nrdays*60*60*24))]),
aes(x=dates,y=value,colour=variable,group=variable)) +
geom_line() +
facet_grid(variable ~ ., scales = "free_y")+
ylab("Outliers")+
xlab("Date")
This should help. You can also clean it up a bit so it is more readable.
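As one possible way to tidy the subset condition, the sketch below computes the date cutoff once and collects the flagged series names before filtering. The helper name `recent_flagged` and the toy data are hypothetical (they are not part of the original answer). It assumes a long data frame with the columns `dates`, `variable` and `value`, and, unlike the one-liner above, it ignores `NA` scores.

```r
# Hypothetical helper: keep only the series that have a non-zero anomaly
# score inside the last `nrdays` days. Assumes `long` has the columns
# dates (POSIXct), variable and value.
recent_flagged <- function(long, nrdays = 4) {
  # as.character() first, so factor or character scores both become numeric
  score  <- as.numeric(as.character(long$value))
  series <- as.character(long$variable)
  cutoff <- max(long$dates) - nrdays * 24 * 60 * 60
  recent <- long$dates >= cutoff
  # series with at least one non-zero (and non-NA) score in the window
  flagged <- unique(series[recent & !is.na(score) & score != 0])
  keep <- recent & series %in% flagged
  out <- long[keep, , drop = FALSE]
  out$value <- score[keep]
  out
}

# Made-up example data: only series "b" has a non-zero score (last day)
toy <- data.frame(
  dates    = rep(as.POSIXct("2015-10-01", tz = "UTC") + 0:5 * 86400, times = 2),
  variable = rep(c("a", "b"), each = 6),
  value    = c(rep(0, 6), rep(0, 5), 2.5)
)
toy_sub <- recent_flagged(toy, nrdays = 4)
```

The result could then be passed to ggplot in place of the long subset(...) call, for example ggplot(recent_flagged(new, nrdays), aes(x = dates, y = value, colour = variable, group = variable)).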
Another option would be to join the original data and the outlier scores and plot them together. First you create a data.frame, then subset it and pass it to ggplot. So after your loop over the data you can do something like this:
# keep melt's default column names so that "variable" exists as a merge key
orig <- melt(df , id.vars = 'dates')
data.df <- merge(new, orig, by = c("dates", "variable"))
colnames(data.df)[2:4] <- c("group","index", "original")
data.df$index <- as.numeric(as.character(data.df$index)) # replace factor with numeric
nrdays <- 4
data.subs <- subset(data.df, data.df$dates >= as.POSIXct(max(data.df$dates)- (nrdays*60*60*24)) &
data.df$group %in% data.df$group[!data.df$index %in% 0 & data.df$dates >= as.POSIXct(max(data.df$dates)- (nrdays*60*60*24))])
data.subs <- melt(data.subs, id = c('dates', "group"))
a.plot<-ggplot(data.subs)+
geom_line(aes(x=dates,y=value, colour = variable, group = variable))+
facet_grid(group ~ ., scales = "free_y")+
ylab("Outliers")+
xlab("Date")
a.plot
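The question also asks to show the components of a flagged ratio (for example c and a when ca is flagged), which the code above does not cover. A minimal sketch of one way to do that is given below, assuming the mapping from ratio columns to their components is written down by hand. The names `ratio_parts` and `expand_flagged` are hypothetical and are not part of the original answer.

```r
# Assumption: the ratio-to-component mapping is maintained by hand;
# these names match the example data in the question.
ratio_parts <- list(ca = c("c", "a"), cb = c("c", "b"))

# Hypothetical helper: expand a vector of flagged series names so that a
# flagged ratio also pulls in the original series it was built from.
expand_flagged <- function(flagged, parts = ratio_parts) {
  extra <- unlist(parts[intersect(flagged, names(parts))], use.names = FALSE)
  unique(c(flagged, extra))
}

# Example: "ca" is a ratio, "d" is a plain series
to_plot <- expand_flagged(c("ca", "d"))
```

The expanded names could then be used to select rows of the melted original data, for example orig[orig$variable %in% to_plot, ], before plotting.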