How to label/count consecutive pairs of non-NA values in a data.table column?
R data.table: How to "label" consecutive values in a column?
I have the following data.table (treating it as a data.frame is fine too):
library(data.table)
dt <- data.table(first_column = c("item1", "item2", "item3", "item4", "item5", "item6", "item7"),
                 second_column = c("cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"),
                 third_column = c(50, 10, 18, 3092, 731, 189, 1991))
> dt
first_column second_column third_column
1: item1 cat1 50
2: item2 cat1 10
3: item3 cat1 18
4: item4 cat2 3092
5: item5 cat2 731
6: item6 cat2 189
7: item7 cat2 1991
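I would like to:
(1) create a column that is 1 where the value is <= 1000 and 0 otherwise
(2) then number each group of 1s with its own label (1, 2, ...)
The resulting data.table should look like this: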
> dt
first_column second_column third_column labels
0 item1 cat1 50 1
1 item2 cat1 10 1
2 item3 cat1 18 1
3 item4 cat2 3092 0
4 item5 cat2 731 2
5 item6 cat2 189 2
6 item7 cat2 1991 0
This creates a column of ones and zeros:
dt[, new := as.integer(third_column <= 1000)]
So how do I label these "groups" of 1s?
We group by 'second_column', specify the logical condition (third_column <= 1000) in 'i', assign (:=) .GRP to 'labels', and then in the next step replace the remaining NA values with 0:
dt[third_column <= 1000, labels := .GRP, by = second_column][is.na(labels), labels := 0][]
# first_column second_column third_column labels
#1: item1 cat1 50 1
#2: item2 cat1 10 1
#3: item3 cat1 18 1
#4: item4 cat2 3092 0
#5: item5 cat2 731 2
#6: item6 cat2 189 2
#7: item7 cat2 1991 0
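To make the chained call easier to follow, here is a minimal sketch that splits it into its two steps on the same example data. The name `step1` is only an illustrative helper and is not part of the original answer; `copy()` is used because `:=` modifies columns by reference.

```r
library(data.table)

dt <- data.table(
  first_column  = paste0("item", 1:7),
  second_column = c("cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"),
  third_column  = c(50, 10, 18, 3092, 731, 189, 1991)
)

# Step 1: only the rows selected in `i` are assigned; .GRP is the
# running group number of each second_column group within that subset.
dt[third_column <= 1000, labels := .GRP, by = second_column]

# Snapshot of the intermediate column (hypothetical helper name).
# Rows that failed the condition are still NA at this point.
step1 <- copy(dt$labels)

# Step 2: fill the rows that were skipped in step 1 with 0.
dt[is.na(labels), labels := 0L]
```

After step 1 the rows with third_column > 1000 hold NA, which is why the second step is needed before the column matches the desired output.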
Alternatively, take the cumulative sum of a logical vector (!duplicated(second_column)) and multiply it by another logical vector (third_column <= 1000). This second option is more compact:
dt[, labels := cumsum(!duplicated(second_column))*(third_column <= 1000)]
dt
# first_column second_column third_column labels
#1: item1 cat1 50 1
#2: item2 cat1 10 1
#3: item3 cat1 18 1
#4: item4 cat2 3092 0
#5: item5 cat2 731 2
#6: item6 cat2 189 2
#7: item7 cat2 1991 0
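Both options above derive the label from second_column. If the groups should instead be the consecutive runs of rows with third_column <= 1000, regardless of category, a run-length encoding is one possible approach. This is a sketch of a swapped-in technique using base R's `rle()` and `inverse.rle()`, not the method from the answer above; on this example data it happens to give the same labels.

```r
library(data.table)

dt <- data.table(
  first_column  = paste0("item", 1:7),
  second_column = c("cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"),
  third_column  = c(50, 10, 18, 3092, 731, 189, 1991)
)

dt[, labels := {
  # Run-length encode the condition: one entry per consecutive run.
  r <- rle(third_column <= 1000)
  # Number the TRUE runs 1, 2, ... and set the FALSE runs to 0.
  r$values <- cumsum(r$values) * r$values
  # Expand the per-run labels back to one value per row.
  inverse.rle(r)
}]
```

With this variant, two separate runs of small values inside the same category would receive different labels, which the second_column-based versions would not do.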