Counter based on ID and value in a column
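I have a dataframe with an ID column and a Type column. I want a counter, computed separately for each ID, that starts at 1 and goes up by 1 on the row after every row where Type is "T". Basically, the counter is the Output_Column in this example.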
ID <- c(1,1,1,1,1,1,3,3,4,4,4,4)
Type <- c("A","A","T","A","A","A","A","A","T","A","T","A")
Output_Column <- c(1,1,1,2,2,2,1,1,1,2,2,3)
ID Type Output_Column
1 1 A 1
2 1 A 1
3 1 T 1
4 1 A 2
5 1 A 2
6 1 A 2
7 3 A 1
8 3 A 1
9 4 T 1
10 4 A 2
11 4 T 2
12 4 A 3
d <- data.frame(ID,Type, Output_Column)
A base R solution:
output_col <- as.numeric(ave(Type, ID, FUN = function(x) cumsum(c('T', x[-length(x)]) == 'T')))
output_col
[1] 1 1 1 2 2 2 1 1 1 2 2 3
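To see why this works, it may help to unpack the one-liner into its two steps. This is only a sketch: the helper name `lagged_t_count` is made up for illustration and is not part of the answer above, and `split()`/`unsplit()` stand in for `ave()`.

```r
# Hypothetical helper (not from the original answer): for each row, counts
# how many earlier rows in the same group had Type == "T", plus 1.
lagged_t_count <- function(x) {
  # Shift the vector down by one row and put a "T" in front, so the
  # first row of every group starts the counter at 1.
  shifted <- c("T", x[-length(x)])
  cumsum(shifted == "T")
}

ID   <- c(1,1,1,1,1,1,3,3,4,4,4,4)
Type <- c("A","A","T","A","A","A","A","A","T","A","T","A")

# Apply the helper per ID; unsplit() restores the original row order.
output_col <- unsplit(lapply(split(Type, ID), lagged_t_count), ID)
output_col
```

Since the result is never stored in a character vector here, the `as.numeric()` conversion is not needed in this form.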
Here is a data.table version:
library(data.table)
setDT(d)[, res := shift(cumsum(Type == 'T') + 1, fill = 1), ID]
d
# ID Type Output_Column res
# 1: 1 A 1 1
# 2: 1 A 1 1
# 3: 1 T 1 1
# 4: 1 A 2 2
# 5: 1 A 2 2
# 6: 1 A 2 2
# 7: 3 A 1 1
# 8: 3 A 1 1
# 9: 4 T 1 1
#10: 4 A 2 2
#11: 4 T 2 2
#12: 4 A 3 3
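The same call can be split into two steps to make the logic easier to follow. This is a sketch, not the answer's own code: the intermediate column `running` and the table name `dt` are introduced here purely for illustration.

```r
library(data.table)

dt <- data.table(
  ID   = c(1,1,1,1,1,1,3,3,4,4,4,4),
  Type = c("A","A","T","A","A","A","A","A","T","A","T","A")
)

# Step 1: running count of "T" rows within each ID (current row included), plus 1.
dt[, running := cumsum(Type == "T") + 1, by = ID]

# Step 2: move that count down one row so a "T" only affects the rows after it.
# The first row of each ID has no predecessor, so it is filled with 1.
dt[, res := shift(running, fill = 1), by = ID]
dt
```

The `running` column can be dropped afterwards with `dt[, running := NULL]` if it is not needed.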
Here is one way to do it using group_by, lag, and cumsum:
library(dplyr)
d %>%
  # Group by ID so the calculation is done within each ID
  group_by(ID) %>%
  mutate(
    # Flag rows whose previous Type is "T".
    # The default of "T" makes the first row of each ID count as 1.
    counter = if_else(lag(Type, default = "T") == "T", 1, 0),
    # The cumulative sum of the flag gives the expected output column
    output_column_calculated = cumsum(counter)) %>%
  ungroup() %>%
  # Remove the counter column if it is not needed
  select(-counter)
#> # A tibble: 12 x 4
#> ID Type Output_Column output_column_calculated
#> <dbl> <chr> <dbl> <dbl>
#> 1 1 A 1 1
#> 2 1 A 1 1
#> 3 1 T 1 1
#> 4 1 A 2 2
#> 5 1 A 2 2
#> 6 1 A 2 2
#> 7 3 A 1 1
#> 8 3 A 1 1
#> 9 4 T 1 1
#> 10 4 A 2 2
#> 11 4 T 2 2
#> 12 4 A 3 3
Created on 2021-04-26 by the reprex package (v2.0.0)
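If the intermediate counter column is not needed at all, the flag and the cumulative sum can probably be combined, since `cumsum()` treats TRUE/FALSE as 1/0. A minimal sketch of that variant (the result name `res` is made up here, and the column comes out as integer rather than double):

```r
library(dplyr)

d <- data.frame(
  ID   = c(1,1,1,1,1,1,3,3,4,4,4,4),
  Type = c("A","A","T","A","A","A","A","A","T","A","T","A")
)

res <- d %>%
  group_by(ID) %>%
  # TRUE/FALSE is summed as 1/0, so no separate counter column is required
  mutate(output_column_calculated = cumsum(lag(Type, default = "T") == "T")) %>%
  ungroup()
res
```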