Evaluate mode function on column with time-of-day class stored data in R data.table

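The goal is to summarize the Duration column of a data.table by applying sum, max, mode, min, and count. The mode functions I use are the one from "How to find the statistical mode?" and the one in the DescTools package.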

Data used:

library(data.table)
dt<-data.table(
  stringsAsFactors = FALSE,
  ODUFault = c("NO","SI","NO","SI","NO",
               "SI","NO","SI","NO","SI","NO","SI","NO","SI","NO",
               "SI","NO","SI","NO","SI"),
  LastFault = c("sA","sB","sB","sB","sB",
                "sB","sB","sB","sB","sB","sB","sC","sC","sB","sB",
                "sB","sB","sB","sB","sB"),
  SubFlt = c("A","B","B","B","B","B",
             "B","B","B","B","B","C","C","B","B","B","B","B",
             "B","B"),
  Duration = c("00:09:40","00:03:01",
               "00:06:58","00:03:00","00:06:59","00:03:00","00:06:58",
               "00:03:01","00:06:59","00:02:59","00:07:29","00:03:01",
               "00:06:29","00:05:03","00:04:56","00:03:00","00:06:59",
               "00:02:59","00:07:00","00:15:33")
)

When the summary is computed with the median function, all outputs keep the "H:M:S" format:

dt[, Duration:=as.ITime(Duration)]
Summarize_SubFlt=dt[,list(g = sum(Duration),m=max(Duration),md=median(Duration),n=min(Duration),c=.N),by=.(ODUFault,SubFlt)][
  which(ODUFault =="SI"), .SD, by=SubFlt]

   SubFlt ODUFault        g        m       md        n c
1:      B       SI 00:41:36 00:15:33 00:03:00 00:02:59 9
2:      C       SI 00:03:01 00:03:01 00:03:01 00:03:01 1

When the mode function is used instead, every output loses the "H:M:S" format except the output of the mode function itself:

getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}
Summarize_SubFlt2=dt[,list(g = sum(Duration),m=max(Duration),md=getmode(Duration),n=min(Duration),c=.N),by=.(ODUFault,SubFlt)][
  which(ODUFault =="SI"), .SD, by=SubFlt]
   
   SubFlt ODUFault    g   m       md   n c
1:      B       SI 2496 933 00:03:00 179 9
2:      C       SI  181 181 00:03:01 181 1
#Structure of Summarize_SubFlt
Classes ‘data.table’ and 'data.frame':  2 obs. of  7 variables:
 $ SubFlt  : chr  "B" "C"
 $ ODUFault: chr  "SI" "SI"
 $ g       : 'ITime' int  00:41:36 00:03:01
 $ m       : 'ITime' int  00:15:33 00:03:01
 $ md      : 'ITime' num  00:03:00 00:03:01
 $ n       : 'ITime' int  00:02:59 00:03:01
 $ c       : int  9 1
 - attr(*, ".internal.selfref")=<externalptr>

#Structure of Summarize_SubFlt2
Classes ‘data.table’ and 'data.frame':  2 obs. of  7 variables:
 $ SubFlt  : chr  "B" "C"
 $ ODUFault: chr  "SI" "SI"
 $ g       : int  2496 181
 $ m       : int  933 181
 $ md      : 'ITime' int  00:03:00 00:03:01
 $ n       : int  179 181
 $ c       : int  9 1
 - attr(*, ".internal.selfref")=<externalptr> 

#Structure of Summarize_SubFlt3 using Mode from library(DescTools)
Classes ‘data.table’ and 'data.frame':  2 obs. of  7 variables:
 $ SubFlt  : chr  "B" "C"
 $ ODUFault: chr  "SI" "SI"
 $ g       : 'ITime' int  00:41:36 00:03:01
 $ m       : 'ITime' int  00:15:33 00:03:01
 $ md      : 'ITime' num  00:03:00 00:03:01
 $ n       : 'ITime' int  00:02:59 00:03:01
 $ c       : int  9 1
 - attr(*, ".internal.selfref")=<externalptr> 
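
The loss of class can be checked outside of any grouping. The sketch below reuses a few durations from the data above (the vector `x` is only an illustration) and compares the class returned by each function. One plausible explanation, which is an assumption and not something confirmed here, is that data.table's internally optimised grouping keeps the column's attributes when j contains only functions such as sum, max, median and min, while adding a custom function such as getmode makes the plain base functions run, and those return bare integers.

```r
library(data.table)

# Statistical mode helper, as defined in the question
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# A few durations from the question's data, stored as ITime (illustrative vector)
x <- as.ITime(c("00:03:01", "00:03:00", "00:03:00", "00:02:59"))

class(x)           # "ITime"
class(sum(x))      # "integer": base sum() drops the class
class(max(x))      # "integer" as well
class(getmode(x))  # "ITime": subsetting with [ keeps the class
```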

How can I keep the "H:M:S" format for all the summary outputs?

Here is a possible workaround: compute the mode separately and join it to the summary table.

library(data.table)

dt[, Duration:=as.ITime(Duration)]

dt1 <- dt[,.(g = sum(Duration),
         m=max(Duration),
         med =median(Duration),
         n=min(Duration),
         c=.N),
   by=.(ODUFault,SubFlt)][
     which(ODUFault =="SI"), .SD, by=SubFlt]

# Use the group's own Duration here; dt$Duration would return the mode
# of the whole column for every group
dt2 <- dt[, .(mode = getmode(Duration)), by=.(ODUFault,SubFlt)][
  which(ODUFault =="SI"), .SD, by=SubFlt]

dt1[dt2, on = .(ODUFault,SubFlt)]

#   SubFlt ODUFault        g        m      med        n c     mode
#1:      B       SI 00:41:36 00:15:33 00:03:00 00:02:59 9 00:03:00
#2:      C       SI 00:03:01 00:03:01 00:03:01 00:03:01 1 00:03:01
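
Another option, sketched here under assumptions and not part of the workaround above, is to keep everything in one grouped call and re-apply the class with as.ITime() to the aggregates that come back as plain integers. This assumes a data.table version in which as.ITime() accepts a number of seconds. as.ITime() wraps values at 24 hours, so a sum longer than a day would not be shown as a total. The table `si` is a hypothetical name; it holds only the ODUFault == "SI" rows of the question's data to keep the example short.

```r
library(data.table)

# Statistical mode helper, as defined in the question
getmode <- function(v) {
  uniqv <- unique(v)
  uniqv[which.max(tabulate(match(v, uniqv)))]
}

# Only the ODUFault == "SI" rows of the question's data (hypothetical table name)
si <- data.table(
  SubFlt   = c("B", "B", "B", "B", "B", "C", "B", "B", "B", "B"),
  Duration = as.ITime(c("00:03:01", "00:03:00", "00:03:00", "00:03:01",
                        "00:02:59", "00:03:01", "00:05:03", "00:03:00",
                        "00:02:59", "00:15:33"))
)

# sum/max/min return bare integers here, so convert them back to ITime;
# getmode already returns ITime because it only subsets the input
res <- si[, .(
  g  = as.ITime(sum(Duration)),
  m  = as.ITime(max(Duration)),
  md = getmode(Duration),
  n  = as.ITime(min(Duration)),
  c  = .N
), by = SubFlt]

str(res)
```

If the assumptions hold, str(res) should report g, m, md and n as 'ITime' columns, and the values for group B should match the first summary table in the question.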
