
R: find intervals in data.table

I want to add a new column with intervals (defined by breakpoints) by group. As an example:

This is my data.table:

library(data.table)

x <- data.table(a = c(1:8, 1:8), b = c(rep("A", 8), rep("B", 8)))

I already have the breakpoints (row indices within each group):

pos <- data.table(b =  c("A","A","B","B"), bp = c(3,5,2,4))

Here I can find the intervals for group "A" with:

findInterval(1:nrow(x[b=="A"]), pos[b=="A"]$bp)
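
For reference, here is a minimal base R sketch of what that call returns for group "A". The names `bp_a` and `idx` are only illustrative; the values are copied from `x` and `pos` above.

```r
# Breakpoints for group "A" (copied from `pos`)
bp_a <- c(3, 5)

# Row indices 1..8 of group "A"
idx <- 1:8

# For each index, findInterval() counts how many breakpoints are <= that index.
# The breakpoints must be sorted in non-decreasing order.
intvl <- findInterval(idx, bp_a)
print(intvl)
# [1] 0 0 1 1 2 2 2 2
```

So rows before the first breakpoint get 0, rows from the first breakpoint up to (but not including) the second get 1, and so on.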

How can I do this for each group, in this case "A" and "B"?

One option is to split both datasets by the 'b' column, use Map to loop over the corresponding list elements, and apply findInterval to each pair:

Map(function(u, v) findInterval(seq_len(nrow(u)), v$bp), 
      split(x, x$b), split(pos, pos$b))
#$A
#[1] 0 0 1 1 2 2 2 2

#$B
#[1] 0 1 1 2 2 2 2 2
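
Since the question asks for a new column, the per-group list from Map still has to be put back into `x`. One possible way, sketched here with base R's `unsplit()`, is shown below. The names `res` and `intvl_vec` are hypothetical, and the sketch assumes every group in `x` also appears in `pos`, so that the two split lists line up.

```r
library(data.table)

x <- data.table(a = c(1:8, 1:8), b = c(rep("A", 8), rep("B", 8)))
pos <- data.table(b = c("A", "A", "B", "B"), bp = c(3, 5, 2, 4))

# One integer vector of interval ids per group, as in the Map() call above
res <- Map(function(u, v) findInterval(seq_len(nrow(u)), v$bp),
           split(x, x$b), split(pos, pos$b))

# unsplit() reverses split(): each group's values go back to that group's rows,
# so this does not rely on `x` being sorted by `b`
intvl_vec <- unsplit(res, x$b)

# Add the result to `x` by reference
x[, intvl := intvl_vec]
print(x)
```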

Another option is to group 'x' by 'b' and call findInterval within each group, subsetting 'bp' from 'pos' with a logical condition built from .BY (the current group value):

x[, findInterval(seq_len(.N), pos$bp[pos$b==.BY]), b]
#    b V1
# 1: A  0
# 2: A  0
# 3: A  1
# 4: A  1
# 5: A  2
# 6: A  2
# 7: A  2
# 8: A  2
# 9: B  0
#10: B  1
#11: B  1
#12: B  2
#13: B  2
#14: B  2
#15: B  2
#16: B  2
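
The grouped call above returns a new table. To store the result as a column of `x` instead, a variant with `:=` might look like the following sketch. The column name `intvl` is an arbitrary choice, and `.BY$b` is used here simply to extract the group label as a plain character value.

```r
library(data.table)

x <- data.table(a = c(1:8, 1:8), b = c(rep("A", 8), rep("B", 8)))
pos <- data.table(b = c("A", "A", "B", "B"), bp = c(3, 5, 2, 4))

# Assign by reference, one group at a time.
# seq_len(.N) gives the row indices within the current group;
# pos$bp[pos$b == .BY$b] picks that group's breakpoints (assumed sorted ascending).
x[, intvl := findInterval(seq_len(.N), pos$bp[pos$b == .BY$b]), by = b]
print(x)
```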

Another option, using a rolling join in data.table:

pos[, ri := rowid(b)]
x[, intvl := fcoalesce(pos[x, on=.(b, bp=a), roll=Inf, ri], 0L)]

Output:

    a b intvl
 1: 1 A     0
 2: 2 A     0
 3: 3 A     1
 4: 4 A     1
 5: 5 A     2
 6: 6 A     2
 7: 7 A     2
 8: 8 A     2
 9: 1 B     0
10: 2 B     1
11: 3 B     1
12: 4 B     2
13: 5 B     2
14: 6 B     2
15: 7 B     2
16: 8 B     2

We can nest the pos data into a list column by b, join it with x, and use findInterval within each group to get the corresponding intervals.

library(dplyr)

pos %>% 
   tidyr::nest(data = bp) %>%
   right_join(x, by = 'b') %>%
   group_by(b) %>%
   mutate(interval = findInterval(a, data[[1]][[1]])) %>%
   select(-data)

#    b        a interval
#   <chr> <int>    <int>
# 1 A         1        0
# 2 A         2        0
# 3 A         3        1
# 4 A         4        1
# 5 A         5        2
# 6 A         6        2
# 7 A         7        2
# 8 A         8        2
# 9 B         1        0
#10 B         2        1
#11 B         3        1
#12 B         4        2
#13 B         5        2
#14 B         6        2
#15 B         7        2
#16 B         8        2
