
Evaluating a mode function on an ITime (time-of-day) column in an R data.table

The goal is to summarize the Duration column of the data.table with sum, max, mode, min and count. For the mode I use the getmode() function shown in the Stack Overflow question "How to find the statistical mode?", and also Mode() from the DescTools package.
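For reference, here is a minimal sketch of how getmode() behaves when several values tie for the highest count. Because which.max() returns the first maximum, the value that appears first in the input wins. The inputs are made-up durations in seconds for illustration, not the question's data.

```r
# getmode() as given in the "How to find the statistical mode?" thread:
# count how often each unique value occurs and return the most frequent one
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# 181 s (00:03:01) and 180 s (00:03:00) both occur twice in each vector
getmode(c(181L, 180L, 180L, 181L, 179L))  # 181: seen first, so it wins the tie
getmode(c(180L, 181L, 181L, 180L))        # 180: now seen first
```

If ties matter for the summary, they have to be handled explicitly, since getmode() silently picks one value.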

The data used:

library(data.table)
dt<-data.table(
  stringsAsFactors = FALSE,
  ODUFault = c("NO","SI","NO","SI","NO",
               "SI","NO","SI","NO","SI","NO","SI","NO","SI","NO",
               "SI","NO","SI","NO","SI"),
  LastFault = c("sA","sB","sB","sB","sB",
                "sB","sB","sB","sB","sB","sB","sC","sC","sB","sB",
                "sB","sB","sB","sB","sB"),
  SubFlt = c("A","B","B","B","B","B",
             "B","B","B","B","B","C","C","B","B","B","B","B",
             "B","B"),
  Duration = c("00:09:40","00:03:01",
               "00:06:58","00:03:00","00:06:59","00:03:00","00:06:58",
               "00:03:01","00:06:59","00:02:59","00:07:29","00:03:01",
               "00:06:29","00:05:03","00:04:56","00:03:00","00:06:59",
               "00:02:59","00:07:00","00:15:33")
)

When the summary uses median(), every output column keeps the "HH:MM:SS" format:

dt[, Duration:=as.ITime(Duration)]
Summarize_SubFlt=dt[,list(g = sum(Duration),m=max(Duration),md=median(Duration),n=min(Duration),c=.N),by=.(ODUFault,SubFlt)][
  which(ODUFault =="SI"), .SD, by=SubFlt]

   SubFlt ODUFault        g        m       md        n c
1:      B       SI 00:41:36 00:15:33 00:03:00 00:02:59 9
2:      C       SI 00:03:01 00:03:01 00:03:01 00:03:01 1

When getmode() is used instead of median(), every time column except the mode loses the "HH:MM:SS" format and is shown as a plain number of seconds:

# most frequent value; on ties, the value that appears first wins
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
Summarize_SubFlt2=dt[,list(g = sum(Duration),m=max(Duration),md=getmode(Duration),n=min(Duration),c=.N),by=.(ODUFault,SubFlt)][
  which(ODUFault =="SI"), .SD, by=SubFlt]
   
   SubFlt ODUFault    g   m       md   n c
1:      B       SI 2496 933 00:03:00 179 9
2:      C       SI  181 181 00:03:01 181 1
#Structure of Summarize_SubFlt
Classes ‘data.table’ and 'data.frame':  2 obs. of  7 variables:
 $ SubFlt  : chr  "B" "C"
 $ ODUFault: chr  "SI" "SI"
 $ g       : 'ITime' int  00:41:36 00:03:01
 $ m       : 'ITime' int  00:15:33 00:03:01
 $ md      : 'ITime' num  00:03:00 00:03:01
 $ n       : 'ITime' int  00:02:59 00:03:01
 $ c       : int  9 1
 - attr(*, ".internal.selfref")=<externalptr>

#Structure of Summarize_SubFlt2
Classes ‘data.table’ and 'data.frame':  2 obs. of  7 variables:
 $ SubFlt  : chr  "B" "C"
 $ ODUFault: chr  "SI" "SI"
 $ g       : int  2496 181
 $ m       : int  933 181
 $ md      : 'ITime' int  00:03:00 00:03:01
 $ n       : int  179 181
 $ c       : int  9 1
 - attr(*, ".internal.selfref")=<externalptr> 

#Structure of Summarize_SubFlt3 using Mode from library(DescTools)
Classes ‘data.table’ and 'data.frame':  2 obs. of  7 variables:
 $ SubFlt  : chr  "B" "C"
 $ ODUFault: chr  "SI" "SI"
 $ g       : 'ITime' int  00:41:36 00:03:01
 $ m       : 'ITime' int  00:15:33 00:03:01
 $ md      : 'ITime' num  00:03:00 00:03:01
 $ n       : 'ITime' int  00:02:59 00:03:01
 $ c       : int  9 1
 - attr(*, ".internal.selfref")=<externalptr> 

How can I keep the "HH:MM:SS" format for all summary outputs?
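A likely cause, assuming data.table's GForce optimization is what differs between the two queries: when j contains only sum/max/median/min/.N, data.table can swap in its internal grouped versions, which in the output above keep the ITime class. A user-defined function such as getmode() switches GForce off for the whole j, so base sum(), max() and min() run per group and return plain integers (seconds), as the str() output shows. Adding verbose = TRUE to the query should report whether GForce was applied. The sketch below sidesteps this by stripping the class with as.integer() and rebuilding it with as.ITime() for every time column. It assumes the totals stay below 24 hours, since ITime is a time-of-day class and larger values may wrap.

```r
library(data.table)

# Same ODUFault / SubFlt / Duration values as in the question
dt <- data.table(
  ODUFault = rep(c("NO", "SI"), 10),
  SubFlt   = c("A", rep("B", 10), "C", "C", rep("B", 7)),
  Duration = as.ITime(c("00:09:40", "00:03:01", "00:06:58", "00:03:00", "00:06:59",
                        "00:03:00", "00:06:58", "00:03:01", "00:06:59", "00:02:59",
                        "00:07:29", "00:03:01", "00:06:29", "00:05:03", "00:04:56",
                        "00:03:00", "00:06:59", "00:02:59", "00:07:00", "00:15:33"))
)

getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# as.integer() drops the class, so each statistic is computed on plain seconds;
# as.ITime() then restores the class whether or not GForce would have kept it
res <- dt[ODUFault == "SI",
          .(g  = as.ITime(sum(as.integer(Duration))),
            m  = as.ITime(max(as.integer(Duration))),
            md = as.ITime(getmode(as.integer(Duration))),
            n  = as.ITime(min(as.integer(Duration))),
            c  = .N),
          by = .(SubFlt, ODUFault)]
print(res)
```

Filtering on ODUFault in i replaces the question's second chained step; the grouping keeps SubFlt first, matching the original column order.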

A possible workaround is to calculate the mode in a separate aggregation and join it onto the summary table:

library(data.table)

dt[, Duration:=as.ITime(Duration)]

dt1 <- dt[,.(g = sum(Duration),
         m=max(Duration),
         med =median(Duration),
         n=min(Duration),
         c=.N),
   by=.(ODUFault,SubFlt)][
     which(ODUFault =="SI"), .SD, by=SubFlt]

# use the group's own Duration column; dt$Duration would give the mode of the whole column
dt2 <- dt[, .(mode = getmode(Duration)), by=.(ODUFault,SubFlt)][
  which(ODUFault =="SI"), .SD, by=SubFlt]

dt1[dt2, on = .(ODUFault,SubFlt)]

#   SubFlt ODUFault        g        m      med        n c     mode
#1:      B       SI 00:41:36 00:15:33 00:03:00 00:02:59 9 00:03:00
#2:      C       SI 00:03:01 00:03:01 00:03:01 00:03:01 1 00:03:01
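Another option, sketched here as an alternative rather than as the answer's method, keeps a single aggregation with getmode() in j and converts the time columns back afterwards in one step with := and .SDcols, which avoids the second pass and the join. The column names g, m, md and n follow the question; as.integer() is applied first so the conversion works whether a column came back as ITime or as plain seconds.

```r
library(data.table)

# Same ODUFault / SubFlt / Duration values as in the question
dt <- data.table(
  ODUFault = rep(c("NO", "SI"), 10),
  SubFlt   = c("A", rep("B", 10), "C", "C", rep("B", 7)),
  Duration = as.ITime(c("00:09:40", "00:03:01", "00:06:58", "00:03:00", "00:06:59",
                        "00:03:00", "00:06:58", "00:03:01", "00:06:59", "00:02:59",
                        "00:07:29", "00:03:01", "00:06:29", "00:05:03", "00:04:56",
                        "00:03:00", "00:06:59", "00:02:59", "00:07:00", "00:15:33"))
)

getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# One aggregation; some of these columns may come back as plain integers
res <- dt[ODUFault == "SI",
          .(g = sum(Duration), m = max(Duration), md = getmode(Duration),
            n = min(Duration), c = .N),
          by = .(SubFlt, ODUFault)]

# Restore the ITime class on every time column in place
time_cols <- c("g", "m", "md", "n")
res[, (time_cols) := lapply(.SD, function(x) as.ITime(as.integer(x))),
    .SDcols = time_cols]
print(res)
```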
