Min and max for each row of a data frame in R
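I need to calculate the processing time of several people who took an online test. For each person there are several timestamps (one per task), and the processing duration is the time difference between the earliest (min) and latest (max) timestamp. The example below works for student_1, but it only works when there are no missing values, so it fails for student_2 and student_3. Any ideas?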
library(anytime)
number <- c(1, 2, 3)
uniquename <- c("student_1", "student_2", "student_3")
timestamp_1 <- c(anytime("2020-02-25T12:42:56.476Z"),NA,anytime("2020-02-25T10:05:22.388Z"))
timestamp_2 <- c(anytime("2020-02-25T12:51:22.388Z"),anytime("2020-02-25T12:51:22.388Z"),NA)
timestamp_3 <- c(anytime("2020-02-25T13:00:45.042Z"),anytime("2020-02-25T13:00:45.042Z"),NA)
timestamp_4 <- c(anytime("2020-02-25T13:31:48.073Z"),anytime("2020-02-25T13:31:48.073Z"),NA)
timestamp_5 <- c(anytime("2020-02-25T14:22:57.103Z"),anytime("2020-02-25T15:00:00Z"),anytime("2020-02-25T14:05:00Z"))
df3 <- data.frame(number,
                  uniquename,
                  timestamp_1,
                  timestamp_2,
                  timestamp_3,
                  timestamp_4,
                  timestamp_5)
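# apply() coerces the rows to character strings; without na.rm = TRUE,
# min() and max() return NA for any row that has a missing timestamp
df3$date_min <- apply(df3[3:7], 1, FUN=min)
df3$date_max <- apply(df3[3:7], 1, FUN=max)
# convert the strings back to date-times
df3$date_min <- anytime(df3$date_min)
df3$date_max <- anytime(df3$date_max)
# duration in whole minutes (later minus earlier, so no sign flip is needed)
df3$diff <- as.numeric(round(difftime(df3$date_max, df3$date_min, units = "mins")))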
View(df3)
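The NA problem comes from min() and max() themselves: for date-times, as for numbers, a single missing value makes the result NA unless na.rm = TRUE is passed. A minimal sketch on a hypothetical vector with one missing timestamp (built with base R's as.POSIXct() in UTC instead of anytime(), an assumption made only to keep it dependency-free):

```r
# Hypothetical timestamps for one student, with one task missing (NA)
x <- as.POSIXct(c("2020-02-25 12:51:22", NA, "2020-02-25 15:00:00"), tz = "UTC")

first_default <- min(x)               # NA: one missing value makes the whole result NA
first <- min(x, na.rm = TRUE)         # 2020-02-25 12:51:22 UTC
last  <- max(x, na.rm = TRUE)         # 2020-02-25 15:00:00 UTC

# Duration between the first and last timestamp, in minutes
difftime(last, first, units = "mins")
```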
Here is a data.table approach:
library( data.table )
setDT(df3)
#get columns with timestamps
cols = grep( "^timestamp_", names(df3), value = TRUE )
#only for rows with at least two timestamps, calculate min and max
df3[ df3[, rowSums( !is.na(.SD) ), .SDcols = cols ] >= 2,
date_min := do.call( pmin, c( .SD, list( na.rm = TRUE ) ) ), .SDcols = cols ]
df3[ df3[, rowSums( !is.na(.SD) ), .SDcols = cols ] >= 2,
date_max := do.call( pmax, c( .SD, list( na.rm = TRUE ) ) ), .SDcols = cols ]
# number uniquename timestamp_1 timestamp_2
# 1: 1 student_1 2020-02-25 12:42:56 2020-02-25 12:51:22
# 2: 2 student_2 <NA> 2020-02-25 12:51:22
# 3: 3 student_3 2020-02-25 10:05:22 <NA>
# timestamp_3 timestamp_4 timestamp_5
# 1: 2020-02-25 13:00:45 2020-02-25 13:31:48 2020-02-25 14:22:57
# 2: 2020-02-25 13:00:45 2020-02-25 13:31:48 2020-02-25 15:00:00
# 3: <NA> <NA> 2020-02-25 14:05:00
# date_min date_max
# 1: 2020-02-25 12:42:56 2020-02-25 14:22:57
# 2: 2020-02-25 12:51:22 2020-02-25 15:00:00
# 3: 2020-02-25 10:05:22 2020-02-25 14:05:00
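The same pmin()/pmax() idea also works without data.table. A rough base R sketch, assuming a cut-down copy of the data (only three of the timestamp columns, parsed with as.POSIXct() in UTC rather than anytime()), which also adds the duration in minutes that the question asks for:

```r
# Hypothetical reduced copy of the question's data (base R only)
df3 <- data.frame(
  uniquename  = c("student_1", "student_2", "student_3"),
  timestamp_1 = as.POSIXct(c("2020-02-25 12:42:56", NA, "2020-02-25 10:05:22"), tz = "UTC"),
  timestamp_2 = as.POSIXct(c("2020-02-25 12:51:22", "2020-02-25 12:51:22", NA), tz = "UTC"),
  timestamp_5 = as.POSIXct(c("2020-02-25 14:22:57", "2020-02-25 15:00:00",
                             "2020-02-25 14:05:00"), tz = "UTC")
)

cols <- grep("^timestamp_", names(df3), value = TRUE)

# pmin()/pmax() compare element-wise across their arguments, so do.call()
# passes each timestamp column as one argument; na.rm = TRUE skips missing tasks
df3$date_min <- do.call(pmin, c(as.list(df3[cols]), na.rm = TRUE))
df3$date_max <- do.call(pmax, c(as.list(df3[cols]), na.rm = TRUE))

# Duration in whole minutes (max minus min); the columns stay POSIXct throughout
df3$diff <- round(as.numeric(difftime(df3$date_max, df3$date_min, units = "mins")))
df3[c("uniquename", "date_min", "date_max", "diff")]
```

Unlike the data.table answer, this sketch does not require at least two timestamps per row, so a student with a single timestamp would get a duration of 0.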
As far as I can tell, you can keep your current approach and just add the na.rm argument to min() and max():
df3$date_min <- apply(df3[3:7], 1, min, na.rm = TRUE)
df3$date_max <- apply(df3[3:7], 1, max, na.rm = TRUE)
df3[c("number", "uniquename", "date_min", "date_max")]
number uniquename date_min date_max
1 1 student_1 2020-02-25 12:42:56 2020-02-25 14:22:57
2 2 student_2 2020-02-25 12:51:22 2020-02-25 15:00:00
3 3 student_3 2020-02-25 10:05:22 2020-02-25 14:05:00
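Note that apply() returns these minima and maxima as character strings, because it first coerces the data frame to a character matrix; comparing them as strings only gives the right answer because ISO-style timestamps of one format sort chronologically. A sketch of the remaining step, converting back and taking the difference, again on a hypothetical reduced copy of the data built with as.POSIXct() in UTC:

```r
# Hypothetical reduced copy of the question's data (base R only)
ts_df <- data.frame(
  timestamp_1 = as.POSIXct(c("2020-02-25 12:42:56", NA, "2020-02-25 10:05:22"), tz = "UTC"),
  timestamp_2 = as.POSIXct(c("2020-02-25 12:51:22", "2020-02-25 12:51:22", NA), tz = "UTC"),
  timestamp_5 = as.POSIXct(c("2020-02-25 14:22:57", "2020-02-25 15:00:00",
                             "2020-02-25 14:05:00"), tz = "UTC")
)

# apply() works on a character matrix here, so the results are strings
min_chr <- apply(ts_df, 1, min, na.rm = TRUE)
max_chr <- apply(ts_df, 1, max, na.rm = TRUE)

# Parse the strings back in the same time zone, then take max minus min
date_min <- as.POSIXct(min_chr, tz = "UTC")
date_max <- as.POSIXct(max_chr, tz = "UTC")
duration_min <- round(as.numeric(difftime(date_max, date_min, units = "mins")))
print(duration_min)
```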