R data.table: How to “label” consecutive values in a column?
I have the following data.table (it's fine if you treat it as a data.frame):
library(data.table)
dt <- data.table(first_column  = c("item1", "item2", "item3", "item4", "item5", "item6", "item7"),
                 second_column = c("cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"),
                 third_column  = c(50, 10, 18, 3092, 731, 189, 1991))
> dt
   first_column second_column third_column
1:        item1          cat1           50
2:        item2          cat1           10
3:        item3          cat1           18
4:        item4          cat2         3092
5:        item5          cat2          731
6:        item6          cat2          189
7:        item7          cat2         1991
I would like to:
(1) create a column that is 1 if the value in third_column is <= 1000, and 0 otherwise
(2) then number each consecutive grouping of 1s (1, 2, ...), leaving the 0s as they are
The resulting data.table would look like this:
> dt
   first_column second_column third_column labels
1:        item1          cat1           50      1
2:        item2          cat1           10      1
3:        item3          cat1           18      1
4:        item4          cat2         3092      0
5:        item5          cat2          731      2
6:        item6          cat2          189      2
7:        item7          cat2         1991      0
This creates a column of zeros and ones:
dt$new <- as.integer(dt$third_column <= 1000)
How would I then label these "groupings" of 1s?
We group by 'second_column', specify the logical condition (third_column <= 1000) in 'i', assign (:=) the 'labels' as .GRP, then replace the NA values with 0 in the next step. Note that this numbers the groups by second_column; it matches the expected output here because each category contains exactly one run of rows that satisfy the condition.
dt[third_column <= 1000, labels := .GRP, by = second_column][is.na(labels), labels := 0L][]
#   first_column second_column third_column labels
#1:        item1          cat1           50      1
#2:        item2          cat1           10      1
#3:        item3          cat1           18      1
#4:        item4          cat2         3092      0
#5:        item5          cat2          731      2
#6:        item6          cat2          189      2
#7:        item7          cat2         1991      0
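Grouping by 'second_column' relies on each category holding a single run of qualifying rows. If the runs are not tied to a category, one possible variant is to number the runs themselves with data.table's rleid(). The following is only a sketch under that assumption: dt2 and the helper columns ok and run are hypothetical names that do not appear in the question.

```r
library(data.table)

# Hypothetical data: no category column, three separate runs of values <= 1000
dt2 <- data.table(third_column = c(50, 10, 3092, 731, 189, 1991, 5))

# ok is TRUE where the condition holds
dt2[, ok := third_column <= 1000]
# run gives every consecutive stretch of identical 'ok' values its own id
dt2[, run := rleid(ok)]
# start from 0 everywhere, then number only the TRUE runs in order of appearance
dt2[, labels := 0L]
dt2[ok == TRUE, labels := .GRP, by = run]

dt2$labels
# 1 1 0 2 2 0 3
```

Because 'i' keeps only the rows where ok is TRUE, .GRP counts just the qualifying runs, so the labels stay consecutive (1, 2, 3) even though the run ids are 1, 3 and 5.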
A second, more compact option takes the cumulative sum of a logical vector (!duplicated(second_column)) and multiplies it by another logical vector (third_column <= 1000). The cumulative sum increases by one at the first row of each category, giving a group index, and the multiplication sets the rows that fail the condition to 0.
dt[, labels := cumsum(!duplicated(second_column)) * (third_column <= 1000)]
dt
#   first_column second_column third_column labels
#1:        item1          cat1           50      1
#2:        item2          cat1           10      1
#3:        item3          cat1           18      1
#4:        item4          cat2         3092      0
#5:        item5          cat2          731      2
#6:        item6          cat2          189      2
#7:        item7          cat2         1991      0
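To see why the one-liner works, it may help to evaluate its two factors separately. The sketch below rebuilds only the two columns involved; grp and keep are illustrative names, not part of the original answer.

```r
library(data.table)

dt <- data.table(
  second_column = c("cat1", "cat1", "cat1", "cat2", "cat2", "cat2", "cat2"),
  third_column  = c(50, 10, 18, 3092, 731, 189, 1991)
)

# !duplicated() is TRUE at the first row of each category,
# so the running total acts as a category index
grp <- cumsum(!duplicated(dt$second_column))
grp
# 1 1 1 2 2 2 2

# TRUE counts as 1 and FALSE as 0 in arithmetic
keep <- dt$third_column <= 1000
keep
# TRUE TRUE TRUE FALSE TRUE TRUE FALSE

# multiplying zeroes out the rows that fail the condition
grp * keep
# 1 1 1 0 2 2 0
```

This index is only meaningful when rows of the same category sit next to each other, as they do here; if a category reappeared further down, its rows would pick up the running total at that point rather than the category's original index.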