dplyr summarise character time variable
I have data that looks like this:
ID Time
456 0:00:01
456 0:02:05
123 0:00:14
756 0:03:47
756 0:01:56
756 0:00:01
where Time is of type character (typeof(Time) is "character").
I need to sum the Time column by ID so that I end up with:
ID Time Total_Time
456 0:00:01 0:02:06
456 0:02:05 0:02:06
123 0:00:14 0:00:14
756 0:03:47 0:05:44
756 0:01:56 0:05:44
756 0:00:01 0:05:44
I know I can use dplyr to aggregate, but when I run:
df$Total_Time <- df %>% group_by(ID) %>% summarise(Freq = sum(Time))
I get an error, presumably because Time is a character vector.
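A quick base-R check supports that guess: sum() does not accept character input at all. The snippet below is only a minimal reproduction of the failure, a sketch rather than part of the original post.

```r
# Minimal reproduction: sum() accepts numeric/logical input only,
# so a character vector of "h:mm:ss" strings makes it fail.
x <- c("0:00:01", "0:02:05")

# Capture the error message instead of stopping
msg <- tryCatch(sum(x), error = function(e) conditionMessage(e))
print(msg)
```

Separately, summarise() returns one row per ID, so even with numeric times its result would not line up with the rows of df when assigned to df$Total_Time; mutate() on grouped data keeps every row.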
I can think of using lubridate::hms to convert those strings to numbers, but I haven't found the right way to format(.., format="%H:%M:%S") them back again, so here are two functions I have used for various related purposes:
## simply convert "01:23:45" to 5025 (seconds) and "00:17:14.842" to 1034.842
time2num <- function(x) {
  vapply(strsplit(x, ':'), function(y) sum(as.numeric(y) * c(60*60, 60, 1)),
         numeric(1), USE.NAMES=FALSE)
}
## and back again
num2time <- function(x, digits.secs = getOption("digits.secs", 3)) {
  hr <- as.integer(x %/% 3600)
  min <- as.integer((x - 3600*hr) %/% 60)
  sec <- (x - 3600*hr - 60*min)
  if (anyNA(digits.secs)) {
    # a mostly-arbitrary determination of significant digits,
    # motivated by @Roland https://stackoverflow.com/a/27767973
    for (digits.secs in 1:6) {
      if (any(abs(signif(sec, digits.secs) - sec) > (10^(-3 - digits.secs)))) next
      digits.secs <- digits.secs - 1L
      break
    }
  }
  sec <- sprintf(paste0("%02.", digits.secs[[1]], "f"), sec)
  sec <- paste0(ifelse(grepl("^[0-9]\\.", sec), "0", ""), sec)
  out <- sprintf("%02i:%02i:%s", hr, min, sec)
  out[is.na(x)] <- NA_character_
  out
}
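To make the hour/minute/second split in num2time() concrete, here is the same integer-division arithmetic applied by hand to 5025.5 seconds (an arbitrary value chosen for illustration). It also hints at an alternative to the grepl() padding step: giving sprintf() a total field width that includes the decimals, so it zero-pads on its own. This is an illustrative sketch, not a change to the answer's functions.

```r
# Worked example of the arithmetic inside num2time(): peel off whole hours,
# then whole minutes from the remainder; what is left is seconds.
x    <- 5025.5
hrs  <- as.integer(x %/% 3600)               # 5025.5 %/% 3600 = 1
mins <- as.integer((x - 3600 * hrs) %/% 60)  # 1425.5 %/% 60   = 23
secs <- x - 3600 * hrs - 60 * mins           # 1425.5 - 1380   = 45.5

# A total field width (4 here, as in "45.5") makes sprintf() zero-pad the
# seconds itself, e.g. 5.5 would be formatted as "05.5".
out <- sprintf("%02i:%02i:%04.1f", hrs, mins, secs)
out
```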
With these:
library(dplyr)
df %>%
  group_by(ID) %>%
  # sum the seconds per ID, then format the total back to "HH:MM:SS"
  mutate(Freq = num2time(sum(time2num(Time)), digits.secs = 0)) %>%
  ungroup()
# # A tibble: 6 x 3
# ID Time Freq
# <int> <chr> <chr>
# 1 456 0:00:01 00:02:06
# 2 456 0:02:05 00:02:06
# 3 123 0:00:14 00:00:14
# 4 756 0:03:47 00:05:44
# 5 756 0:01:56 00:05:44
# 6 756 0:00:01 00:05:44
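For comparison, the same per-row total can be sketched in base R with ave(), which, like mutate() on grouped data, returns one value per input row. This is an alternative sketch, not part of the original answer, and it assumes whole-second inputs in the h:m:s layout shown.

```r
# Base-R alternative: ave() repeats each group's sum on every row of that group.
df <- data.frame(
  ID   = c(456L, 456L, 123L, 756L, 756L, 756L),
  Time = c("0:00:01", "0:02:05", "0:00:14", "0:03:47", "0:01:56", "0:00:01")
)

# "h:m:s" strings -> seconds (same idea as time2num())
secs <- vapply(strsplit(df$Time, ":"),
               function(p) sum(as.numeric(p) * c(3600, 60, 1)),
               numeric(1))

# Sum within each ID, aligned with the original rows
total <- ave(secs, df$ID, FUN = sum)

# Seconds -> "HH:MM:SS" (assumes whole seconds)
df$Total_Time <- sprintf("%02d:%02d:%02d",
                         as.integer(total %/% 3600),
                         as.integer(total %% 3600 %/% 60),
                         as.integer(total %% 60))
df
```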
Data:
df <- structure(list(ID = c(456L, 456L, 123L, 756L, 756L, 756L), Time = c("0:00:01", "0:02:05", "0:00:14", "0:03:47", "0:01:56", "0:00:01")), class = "data.frame", row.names = c(NA, -6L))
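Regarding the format(.., format="%H:%M:%S") step mentioned above: one base-R way back (swapped in here, not the answer's method) is to treat the seconds as a POSIXct time on 1970-01-01 in UTC and format only the clock part. The helper names hms_to_secs() and secs_to_hms() are hypothetical, and the sketch assumes whole seconds and totals under 24 hours.

```r
# Hypothetical helpers: round-trip through POSIXct pinned to 1970-01-01 UTC,
# so the numeric value is simply seconds since midnight.
# Assumes whole seconds and totals under 24 hours (format() would wrap past that).
hms_to_secs <- function(x) {
  as.numeric(as.POSIXct(paste("1970-01-01", x),
                        format = "%Y-%m-%d %H:%M:%S", tz = "UTC"))
}
secs_to_hms <- function(s) {
  format(as.POSIXct(s, origin = "1970-01-01", tz = "UTC"), "%H:%M:%S")
}

secs  <- hms_to_secs(c("0:00:01", "0:02:05", "0:03:47"))
total <- secs_to_hms(sum(secs))
total
```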
What @r2evans put together is great. However, since I had already put it together, here is another option that works:
library(hms)
library(tidyverse)
# create arbitrary data with duplicate ids and different lengths of time
df1 <- data.frame(ID = rep(c(111, 222, 444, 666, 777, 888), each = 2),
Time = paste0("01:",10:21,":09") %>% as_hms())
# ID Time
# 1 111 01:10:09
# 2 111 01:11:09
# 3 222 01:12:09
# 4 222 01:13:09
# 5 444 01:14:09
# 6 444 01:15:09
# 7 666 01:16:09
# 8 666 01:17:09
# 9 777 01:18:09
# 10 777 01:19:09
# 11 888 01:20:09
# 12 888 01:21:09
df1 %>%
group_by(ID) %>%
summarise(Total_Time = sum(Time) %>% as_hms())
# # A tibble: 6 × 2
# ID Total_Time
# <dbl> <time>
# 1 111 02:21:18
# 2 222 02:25:18
# 3 444 02:29:18
# 4 666 02:33:18
# 5 777 02:37:18
# 6 888 02:41:18
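Since the question asks for Total_Time repeated on every row rather than one row per ID, here is a sketch applying the hms idea with mutate() instead of summarise(), on the question's own data. It assumes dplyr and hms are installed, and it converts through seconds with a hypothetical to_secs() helper to avoid depending on how as_hms() parses strings such as "0:00:01".

```r
library(dplyr)
library(hms)

df <- data.frame(
  ID   = c(456L, 456L, 123L, 756L, 756L, 756L),
  Time = c("0:00:01", "0:02:05", "0:00:14", "0:03:47", "0:01:56", "0:00:01")
)

# Hypothetical helper, similar in spirit to time2num(): "h:m:s" -> seconds
to_secs <- function(x) {
  vapply(strsplit(x, ":"), function(p) sum(as.numeric(p) * c(3600, 60, 1)),
         numeric(1))
}

out <- df %>%
  mutate(Time = hms::hms(seconds = to_secs(Time))) %>%  # character -> hms
  group_by(ID) %>%
  mutate(Total_Time = as_hms(sum(Time))) %>%            # group total on every row
  ungroup()
out
```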