
Generate group by condition on row value in column R data.table

I want to split a data.table in R into groups based on a condition on the value in a row. I have searched SO extensively and can't find an efficient data.table way to do this (I'm not looking to loop over the rows with a for loop).

I have data like this:

library(data.table)
dt1 <- data.table( x=1:139, t=c(rep(c(1:5),10),120928,rep(c(6:10),9), 10400,rep(c(13:19),6)))

I'd like to start a new group at each large number (over a settable value) and end up with something like the example below:

dt.desired <- data.table( x=1:139, t=c(rep(c(1:5),10),120928,rep(c(6:10),9), 10400,rep(c(13:19),6)), group=c(rep(1,50),rep(2,46),rep(3,43)))

# Start a new group at every row where t exceeds the cutoff (200 here)
dt1[ , group := cumsum(t > 200) + 1]

dt1[t > 200]
#     x      t group
# 1: 51 120928     2
# 2: 97  10400     3
dt.desired[t > 200]
#     x      t group
# 1: 51 120928     2
# 2: 97  10400     3
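
Since the question asks for a settable cutoff, the same idea can be written with the cutoff held in a variable. This is only a minimal sketch: the names `threshold` and `dt2` are hypothetical and do not appear in the original post.

```r
library(data.table)

# Rebuild the sample data from the question under a new (hypothetical) name
dt2 <- data.table(
  x = 1:139,
  t = c(rep(1:5, 10), 120928, rep(6:10, 9), 10400, rep(13:19, 6))
)

# Hypothetical settable cutoff; for this sample, any value from 19 up to
# (but not including) 10400 separates the two large values from the rest
threshold <- 200

# Each row with t > threshold increments the running count,
# so that row becomes the first row of a new group
dt2[, group := cumsum(t > threshold) + 1L]

# Number of rows in each group
dt2[, .N, by = group]
```

On the sample data this should give groups of 50, 46 and 43 rows, the same split as `dt.desired`.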

You can use a test like t>100 to find the large values. You can then use cumsum() to get a running integer for each set of rows up to (but not including) the large number.

# assuming you can define "large" as >100
dt1[ , islarge := t>100]
dt1[ , group := shift(cumsum(islarge))]
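
To see what the two calls above do, it may help to run them on a short toy vector first. The vector `v` and the intermediate names below are made up for illustration and are not part of the original answer.

```r
library(data.table)

# Toy vector with one "large" value (500) in the fourth position
v <- c(1, 2, 3, 500, 4, 5)

is_large <- v > 100           # FALSE FALSE FALSE  TRUE FALSE FALSE
running  <- cumsum(is_large)  # 0 0 0 1 1 1   (the large value gets the new number)
lagged   <- shift(running)    # NA 0 0 0 1 1  (the large value keeps the old number)
```

`shift()` lags by one position by default and pads the front with `NA`, which is why the first value needs attention afterwards.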

I understand that you want the large number to be part of the group above it. To do this, use shift() and then fill in the first value (which will be NA after shift() is run).

# a little cleanup 
# (fix first value and start group at 1 instead of 0)
dt1[1, group := 0]
dt1[ , group := group+1]
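
The shift and cleanup steps can probably be collapsed into one line with the `fill` argument of `shift()`. A sketch under that assumption, using a fresh copy of the data under the hypothetical name `dt3` and the same cutoff of 100:

```r
library(data.table)

# Fresh copy of the sample data (dt3 is a hypothetical name)
dt3 <- data.table(
  x = 1:139,
  t = c(rep(1:5, 10), 120928, rep(6:10, 9), 10400, rep(13:19, 6))
)

# fill = 0L supplies the first value directly, so there is no NA to patch;
# adding 1L makes the groups start at 1 instead of 0
dt3[, group := shift(cumsum(t > 100), fill = 0L) + 1L]

# Inspect the rows holding the large values
dt3[t > 100, .(x, t, group)]
```

With the lag applied, rows 51 and 97 land in groups 1 and 2, so each large value closes a group (sizes 51, 46 and 42). That differs from `dt.desired` in the question, where those rows open groups 2 and 3. If that is the layout wanted, dropping `shift()` and using `cumsum(t > 100) + 1L` on its own appears to be enough.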
