R: How to count occurrences of factor levels
I have data in the following format:
ID Task1 Task2 Task3 Task4
abc Hard Hard Mix Hard
xyz Easy Mix Easy Hard
als Mix Hard Easy Hard
bld Hard Mix Easy Easy
cqr Hard Easy Hard Hard
alx Hard Hard Hard Hard
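For each ID, I am interested in counting the occurrences of each factor level separately, in this case Hard, Mix, and Easy (see below). I want the total number of occurrences of each level per ID, and I also want the maximum number of consecutive occurrences of that level for that ID, for example: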
ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
abc Hard Hard Mix Hard 3 2
xyz Easy Mix Easy Hard 1 1
als Mix Hard Easy Hard 2 1
bld Hard Mix Easy Easy 1 1
cqr Hard Easy Hard Hard 3 2
alx Hard Hard Hard Hard 4 4
Can anyone suggest a solution?
The dput() of the sample data is below.
structure(list(ID = structure(c(1L, 6L, 2L, 4L, 5L, 3L), .Label = c("abc","als", "alx", "bld", "cqr", "xyz"), class = "factor"), Task1 = structure(c(2L, 1L, 3L, 2L, 2L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"), Task2 = structure(c(2L, 3L, 2L, 3L, 1L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"), Task3 = structure(c(3L, 1L, 1L, 1L, 2L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"), Task4 = structure(c(2L, 2L, 2L, 1L, 2L, 2L), .Label = c("Easy", "Hard"), class = "factor")), class = "data.frame", row.names = c(NA, -6L))
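Assuming the data frame is stored as df, you can use rowSums() to get the total number of Hard values in each row, then apply rle() row-wise to get the length of the longest run: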
transform(df, Hard_Total = rowSums(df[paste0("Task", 1:4)] == "Hard", na.rm = TRUE),
Max_Consecutive_Hard = apply(df[paste0("Task", 1:4)], 1, function(x) with(rle(x), max(lengths[values == "Hard"], na.rm = TRUE))))
ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
1 abc Hard Hard Mix Hard 3 2
2 xyz Easy Mix Easy Hard 1 1
3 als Mix Hard Easy Hard 2 1
4 bld Hard Mix Easy Easy 1 1
5 cqr Hard Easy Hard Hard 3 2
6 alx Hard Hard Hard Hard 4 4
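The question asks for the same two columns for every level (Hard, Mix, and Easy), not only Hard. A minimal sketch of one way to extend the approach above, assuming the sample data from the question; the helper name max_run and the generated column names (Mix_Total, Max_Consecutive_Mix, and so on) are illustrative choices, not part of the original answer:

```r
# Rebuild the sample data (same values as the dput() in the question).
df <- data.frame(
  ID    = c("abc", "xyz", "als", "bld", "cqr", "alx"),
  Task1 = c("Hard", "Easy", "Mix", "Hard", "Hard", "Hard"),
  Task2 = c("Hard", "Mix", "Hard", "Mix", "Easy", "Hard"),
  Task3 = c("Mix", "Easy", "Easy", "Easy", "Hard", "Hard"),
  Task4 = c("Hard", "Hard", "Hard", "Easy", "Hard", "Hard"),
  stringsAsFactors = TRUE
)

# Character matrix of the task columns, one row per ID.
tasks <- as.matrix(df[paste0("Task", 1:4)])

# Hypothetical helper: longest run of `level` in x, or 0 if it never occurs.
max_run <- function(x, level) {
  r <- rle(x)
  max(0L, r$lengths[r$values == level])
}

# Add a total and a longest-run column for each level.
for (lev in c("Hard", "Mix", "Easy")) {
  df[[paste0(lev, "_Total")]] <- rowSums(tasks == lev)
  df[[paste0("Max_Consecutive_", lev)]] <- apply(tasks, 1, max_run, level = lev)
}
```

Starting the max() call with 0L means that a row in which a level never appears gets 0 rather than -Inf with a warning.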
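First, we create two functions, fun_hard and fun_max, to produce the two columns you need. fun_hard() counts the number of times "Hard" occurs in its input, while fun_max() uses rle() to find the maximum number of consecutive "Hard" occurrences in its input.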
fun_hard = function(x) {
  sum(x == "Hard")
}
fun_max = function(x) {
  rle_hard <- rle(x)
  max(rle_hard$lengths[rle_hard$values == "Hard"])
}
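To see how the rle() step arrives at its result, here is a small worked example on a single row. The vectors x and y are written out by hand for illustration; x holds the Task1 to Task4 values for ID abc, and y is a made-up row that contains no "Hard" at all:

```r
x <- c("Hard", "Hard", "Mix", "Hard")   # Task values for ID "abc"
r <- rle(x)
r$lengths                               # 2 1 1
r$values                                # "Hard" "Mix" "Hard"
hard_runs_x <- r$lengths[r$values == "Hard"]   # 2 1: lengths of the "Hard" runs
max(hard_runs_x)                        # 2

# Edge case: a row with no "Hard" values.
y <- c("Easy", "Mix", "Easy", "Easy")
ry <- rle(y)
hard_runs_y <- ry$lengths[ry$values == "Hard"] # integer(0)
# max(hard_runs_y) alone would return -Inf with a warning;
# supplying 0L as an extra argument gives 0 instead.
max(0L, hard_runs_y)                    # 0
```

The sample data always has at least one "Hard" per row, so the edge case does not arise there, but it may matter on other data.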
We then use apply() to call fun_hard() and fun_max() on each row of the data frame (named test_df here).
test_df$Hard_Total = apply(test_df[,c(2,3,4,5)], MARGIN = 1, FUN = fun_hard)
test_df$Max_Consecutive_Hard =
apply(test_df[,c(2,3,4,5)], MARGIN = 1, FUN = fun_max)
Output:
ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
1 abc Hard Hard Mix Hard 3 2
2 xyz Easy Mix Easy Hard 1 1
3 als Mix Hard Easy Hard 2 1
4 bld Hard Mix Easy Easy 1 1
5 cqr Hard Easy Hard Hard 3 2
6 alx Hard Hard Hard Hard 4 4