The goal is to summarize the Duration column of the data.table by applying sum, max, mode, min, and count. The mode function I use is the one shown in "How to find the statistical mode?"; I also tried Mode from the DescTools package.
The data used:
library(data.table)
dt <- data.table(
  stringsAsFactors = FALSE,
  ODUFault = c("NO","SI","NO","SI","NO",
               "SI","NO","SI","NO","SI","NO","SI","NO","SI","NO",
               "SI","NO","SI","NO","SI"),
  LastFault = c("sA","sB","sB","sB","sB",
                "sB","sB","sB","sB","sB","sB","sC","sC","sB","sB",
                "sB","sB","sB","sB","sB"),
  SubFlt = c("A","B","B","B","B","B",
             "B","B","B","B","B","C","C","B","B","B","B","B",
             "B","B"),
  Duration = c("00:09:40","00:03:01",
               "00:06:58","00:03:00","00:06:59","00:03:00","00:06:58",
               "00:03:01","00:06:59","00:02:59","00:07:29","00:03:01",
               "00:06:29","00:05:03","00:04:56","00:03:00","00:06:59",
               "00:02:59","00:07:00","00:15:33")
)
When the summary uses the median function, every output keeps the "H:M:S" format:
dt[, Duration := as.ITime(Duration)]
Summarize_SubFlt <- dt[, list(g = sum(Duration),
                              m = max(Duration),
                              md = median(Duration),
                              n = min(Duration),
                              c = .N),
                       by = .(ODUFault, SubFlt)][
  which(ODUFault == "SI"), .SD, by = SubFlt]
   SubFlt ODUFault        g        m       md        n c
1:      B       SI 00:41:36 00:15:33 00:03:00 00:02:59 9
2:      C       SI 00:03:01 00:03:01 00:03:01 00:03:01 1
When the mode function is used instead, every output loses the "H:M:S" format except the output of the mode function itself:
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
Summarize_SubFlt2 <- dt[, list(g = sum(Duration),
                               m = max(Duration),
                               md = getmode(Duration),
                               n = min(Duration),
                               c = .N),
                        by = .(ODUFault, SubFlt)][
  which(ODUFault == "SI"), .SD, by = SubFlt]
   SubFlt ODUFault    g   m       md   n c
1:      B       SI 2496 933 00:03:00 179 9
2:      C       SI  181 181 00:03:01 181 1
#Structure of Summarize_SubFlt
Classes ‘data.table’ and 'data.frame': 2 obs. of 7 variables:
$ SubFlt : chr "B" "C"
$ ODUFault: chr "SI" "SI"
$ g : 'ITime' int 00:41:36 00:03:01
$ m : 'ITime' int 00:15:33 00:03:01
$ md : 'ITime' num 00:03:00 00:03:01
$ n : 'ITime' int 00:02:59 00:03:01
$ c : int 9 1
- attr(*, ".internal.selfref")=<externalptr>
#Structure of Summarize_SubFlt2
Classes ‘data.table’ and 'data.frame': 2 obs. of 7 variables:
$ SubFlt : chr "B" "C"
$ ODUFault: chr "SI" "SI"
$ g : int 2496 181
$ m : int 933 181
$ md : 'ITime' int 00:03:00 00:03:01
$ n : int 179 181
$ c : int 9 1
- attr(*, ".internal.selfref")=<externalptr>
#Structure of Summarize_SubFlt3 using Mode from library(DescTools)
Classes ‘data.table’ and 'data.frame': 2 obs. of 7 variables:
$ SubFlt : chr "B" "C"
$ ODUFault: chr "SI" "SI"
$ g : 'ITime' int 00:41:36 00:03:01
$ m : 'ITime' int 00:15:33 00:03:01
$ md : 'ITime' num 00:03:00 00:03:01
$ n : 'ITime' int 00:02:59 00:03:01
$ c : int 9 1
- attr(*, ".internal.selfref")=<externalptr>
How can I keep the "H:M:S" format for all summary outputs?
Here's a possible workaround: calculate the mode separately and join it to the summary table.
library(data.table)
dt[, Duration:=as.ITime(Duration)]
dt1 <- dt[, .(g = sum(Duration),
              m = max(Duration),
              med = median(Duration),
              n = min(Duration),
              c = .N),
          by = .(ODUFault, SubFlt)][
  which(ODUFault == "SI"), .SD, by = SubFlt]
# Use the grouped Duration column (not dt$Duration) so the mode is computed per group
dt2 <- dt[, .(mode = getmode(Duration)), by = .(ODUFault, SubFlt)][
  which(ODUFault == "SI"), .SD, by = SubFlt]
dt1[dt2, on = .(ODUFault, SubFlt)]
#    SubFlt ODUFault        g        m      med        n c     mode
# 1:      B       SI 00:41:36 00:15:33 00:03:00 00:02:59 9 00:03:00
# 2:      C       SI 00:03:01 00:03:01 00:03:01 00:03:01 1 00:03:01
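An alternative that stays within a single grouped call is to convert each aggregate back to ITime explicitly. This is only a sketch under stated assumptions: to_itime is a hypothetical helper (it is not part of data.table), the demo table is made-up data rather than the table from the question, and it assumes a data.table version in which as.ITime() accepts integer seconds. Since ITime represents a time of day, a total of 24 hours or more may not display as a duration.

```r
library(data.table)

# Mode function from the question
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Hypothetical helper: strip any class, then rebuild an ITime from integer seconds
to_itime <- function(x) as.ITime(as.integer(x))

# Made-up demo data, not the table from the question
demo <- data.table(
  grp      = c("a", "a", "a", "b"),
  Duration = as.ITime(c("00:03:00", "00:03:00", "00:05:00", "00:01:00"))
)

# Every time-valued aggregate is converted back to ITime inside the same call
res <- demo[, .(g  = to_itime(sum(Duration)),
                m  = to_itime(max(Duration)),
                md = getmode(Duration),
                n  = to_itime(min(Duration)),
                c  = .N),
            by = grp]
print(res)
```

Applied to the question's table, the same pattern would wrap sum(Duration), max(Duration), and min(Duration) in to_itime() while leaving getmode(Duration) as it is.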