
R: How to count occurrences of factor levels

I have data in the following format:

ID    Task1   Task2   Task3   Task4
abc   Hard    Hard    Mix     Hard              
xyz   Easy    Mix     Easy    Hard               
als   Mix     Hard    Easy    Hard               
bld   Hard    Mix     Easy    Easy               
cqr   Hard    Easy    Hard    Hard               
alx   Hard    Hard    Hard    Hard               

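For each ID, I am interested in counting the occurrences of each factor level separately, in this case Hard, Mix and Easy (see below). For each level I want the total number of occurrences per ID, and I also want the maximum number of consecutive occurrences of that level for that ID. For example, for Hard: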

ID    Task1   Task2   Task3   Task4   Hard_Total   Max_Consecutive_Hard
abc   Hard    Hard    Mix     Hard    3            2
xyz   Easy    Mix     Easy    Hard    1            1
als   Mix     Hard    Easy    Hard    2            1
bld   Hard    Mix     Easy    Easy    1            1
cqr   Hard    Easy    Hard    Hard    3            2
alx   Hard    Hard    Hard    Hard    4            4

Can anyone suggest a solution?

The dput() output for the sample data is below.

df <- structure(list(
  ID    = structure(c(1L, 6L, 2L, 4L, 5L, 3L), .Label = c("abc", "als", "alx", "bld", "cqr", "xyz"), class = "factor"),
  Task1 = structure(c(2L, 1L, 3L, 2L, 2L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"),
  Task2 = structure(c(2L, 3L, 2L, 3L, 1L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"),
  Task3 = structure(c(3L, 1L, 1L, 1L, 2L, 2L), .Label = c("Easy", "Hard", "Mix"), class = "factor"),
  Task4 = structure(c(2L, 2L, 2L, 1L, 2L, 2L), .Label = c("Easy", "Hard"), class = "factor")),
  class = "data.frame", row.names = c(NA, -6L))

You can use rowSums() to get the total number of Hard values per row, and then apply rle() row-wise to get the longest run:

# Hard_Total: number of "Hard" cells per row; Max_Consecutive_Hard: longest run of "Hard" per row (0 if none)
transform(df, Hard_Total = rowSums(df[paste0("Task", 1:4)] == "Hard", na.rm = TRUE),
              Max_Consecutive_Hard = apply(df[paste0("Task", 1:4)], 1, function(x) with(rle(x), max(c(0, lengths[values == "Hard"])))))

   ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
1 abc  Hard  Hard   Mix  Hard          3                    2
2 xyz  Easy   Mix  Easy  Hard          1                    1
3 als   Mix  Hard  Easy  Hard          2                    1
4 bld  Hard   Mix  Easy  Easy          1                    1
5 cqr  Hard  Easy  Hard  Hard          3                    2
6 alx  Hard  Hard  Hard  Hard          4                    4
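To see what rle() contributes, here is the first row of the sample data (abc) run through it by hand. rle() returns the length and value of each run of identical values, so the longest "Hard" run is the maximum of the lengths whose value is "Hard". As far as I know, rle() only accepts plain atomic vectors and rejects factors, which is one reason the row-wise apply() works here: apply() first converts the data frame to a character matrix, so each row arrives as a character vector.

```r
# Row "abc" from the sample data: Hard, Hard, Mix, Hard
x <- c("Hard", "Hard", "Mix", "Hard")

# rle() splits the vector into runs of identical consecutive values
r <- rle(x)
r$lengths  # 2 1 1
r$values   # "Hard" "Mix" "Hard"

# Keep only the runs whose value is "Hard", then take the longest one
longest_hard <- max(r$lengths[r$values == "Hard"])
longest_hard  # 2
```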

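First, we create two functions, fun_hard() and fun_max(), to compute the two columns you need. fun_hard() counts the occurrences of "Hard" in its input, while fun_max() uses rle() to find the longest run of consecutive "Hard" values in its input.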

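# Count how many elements of a row equal "Hard"
fun_hard = function(x) {
  sum(x == "Hard")
}

# Length of the longest run of consecutive "Hard" values (0 if the row has none)
fun_max = function(x) {
  rle_hard <- rle(x)
  max(c(0, rle_hard$lengths[rle_hard$values == "Hard"]))
}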
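A row with no "Hard" at all is worth checking: the subset of run lengths is then empty, and max() of an empty vector returns -Inf with a warning (na.rm = TRUE does not change this). Prepending 0 before taking the maximum, as in max(c(0, ...)), reports 0 for such rows instead. The row below is hypothetical; every row in the sample data contains at least one "Hard".

```r
# Hypothetical row with no "Hard" at all
x <- c("Easy", "Mix", "Easy", "Mix")
r <- rle(x)
hard_runs <- r$lengths[r$values == "Hard"]  # integer(0): there are no Hard runs

# max() of an empty vector is -Inf (and emits a warning, suppressed here)
unguarded <- suppressWarnings(max(hard_runs))

# Prepending 0 makes the result 0 instead of -Inf
guarded <- max(c(0L, hard_runs))
```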

We use apply() to run fun_hard() and fun_max() on each row of the data frame:

# test_df holds the sample data from the question; columns 2:5 are Task1..Task4
test_df$Hard_Total = apply(test_df[, 2:5], MARGIN = 1, FUN = fun_hard)
test_df$Max_Consecutive_Hard = apply(test_df[, 2:5], MARGIN = 1, FUN = fun_max)

Output:

  ID Task1 Task2 Task3 Task4 Hard_Total Max_Consecutive_Hard
1 abc  Hard  Hard   Mix  Hard          3                    2
2 xyz  Easy   Mix  Easy  Hard          1                    1
3 als   Mix  Hard  Easy  Hard          2                    1
4 bld  Hard   Mix  Easy  Easy          1                    1
5 cqr  Hard  Easy  Hard  Hard          3                    2
6 alx  Hard  Hard  Hard  Hard          4                    4
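The question asks for these counts for every level (Hard, Mix and Easy), while both answers above only handle "Hard". A minimal sketch that loops over the levels, assuming the same Task1..Task4 column layout; the added column names such as Mix_Total and Max_Consecutive_Easy are hypothetical, extrapolated from the question's naming. It rebuilds the sample data with data.frame() instead of dput() so it runs on its own.

```r
# Sample data from the question, rebuilt with factor columns
df <- data.frame(
  ID    = c("abc", "xyz", "als", "bld", "cqr", "alx"),
  Task1 = c("Hard", "Easy", "Mix",  "Hard", "Hard", "Hard"),
  Task2 = c("Hard", "Mix",  "Hard", "Mix",  "Easy", "Hard"),
  Task3 = c("Mix",  "Easy", "Easy", "Easy", "Hard", "Hard"),
  Task4 = c("Hard", "Hard", "Hard", "Easy", "Hard", "Hard"),
  stringsAsFactors = TRUE
)

task_cols <- paste0("Task", 1:4)
# as.matrix() turns the factor columns into a character matrix,
# computed once so the new numeric columns are not included
m <- as.matrix(df[task_cols])

for (lvl in c("Hard", "Mix", "Easy")) {
  # Total occurrences of this level per row
  df[[paste0(lvl, "_Total")]] <- rowSums(m == lvl)
  # Longest run of this level per row; c(0L, ...) gives 0 when the level is absent
  df[[paste0("Max_Consecutive_", lvl)]] <- apply(m, 1, function(x) {
    r <- rle(x)
    max(c(0L, r$lengths[r$values == lvl]))
  })
}

df
```

Looping over a vector of levels keeps the per-level logic identical to the Hard-only answers; to cover whatever levels occur in the data instead of a fixed list, the loop could run over unique(as.vector(m)).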


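Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.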

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM