R: find intervals in data.table

Question

I want to add a new column with intervals or breakpoints by group. As an example:

This is my data.table:

library(data.table)
x <- data.table(a = c(1:8, 1:8), b = c(rep("A", 8), rep("B", 8)))

I already have the breakpoints (row indices):

pos <- data.table(b =  c("A","A","B","B"), bp = c(3,5,2,4))

Here I can find the intervals for group "A" with:

findInterval(1:nrow(x[b=="A"]), pos[b=="A"]$bp)

How can I do this for each group, in this case "A" and "B"?

Answer 1

One option is to split both datasets by the 'b' column, use Map to loop over the corresponding list elements, and apply findInterval to each pair:

Map(function(u, v) findInterval(seq_len(nrow(u)), v$bp), 
      split(x, x$b), split(pos, pos$b))
#$A
#[1] 0 0 1 1 2 2 2 2

#$B
#[1] 0 1 1 2 2 2 2 2

Another option is to group 'x' by 'b' and, within each group, call findInterval on the 'bp' values from 'pos' that belong to the current group, selected with a logical condition built from .BY:

x[, findInterval(seq_len(.N), pos$bp[pos$b==.BY]), b]
#    b V1
# 1: A  0
# 2: A  0
# 3: A  1
# 4: A  1
# 5: A  2
# 6: A  2
# 7: A  2
# 8: A  2
# 9: B  0
#10: B  1
#11: B  1
#12: B  2
#13: B  2
#14: B  2
#15: B  2
#16: B  2
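
Since the question asks for a new column, the grouped call above can also be assigned by reference with `:=`. This is only a sketch: the column name `intvl` is an arbitrary choice, `.BY$b` is used in place of `.BY` to pull out the group value explicitly, and it assumes the breakpoints in `pos` are sorted in increasing order within each group, which findInterval requires.

```r
library(data.table)

# Data from the question
x   <- data.table(a = c(1:8, 1:8), b = rep(c("A", "B"), each = 8))
pos <- data.table(b = c("A", "A", "B", "B"), bp = c(3, 5, 2, 4))

# For each group in x, look up that group's breakpoints in pos and
# store the interval index of every row position as a new column.
# 'intvl' is a hypothetical column name chosen for this example.
x[, intvl := findInterval(seq_len(.N), pos$bp[pos$b == .BY$b]), by = b]

print(x)
```

This version uses the row position within each group (seq_len(.N)), as in the question. If the breakpoints refer to values of 'a' rather than row positions, replace seq_len(.N) with a.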

Answer 2

Another option is a rolling join in data.table:

pos[, ri := rowid(b)]
x[, intvl := fcoalesce(pos[x, on=.(b, bp=a), roll=Inf, ri], 0L)]

Output:

    a b intvl
 1: 1 A     0
 2: 2 A     0
 3: 3 A     1
 4: 4 A     1
 5: 5 A     2
 6: 6 A     2
 7: 7 A     2
 8: 8 A     2
 9: 1 B     0
10: 2 B     1
11: 3 B     1
12: 4 B     2
13: 5 B     2
14: 6 B     2
15: 7 B     2
16: 8 B     2
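
To see why fcoalesce is needed, it can help to look at the join on its own. In this sketch (the name `joined` is just for illustration), roll = Inf carries the last breakpoint at or below each value of 'a' forward, so rows that come before the first breakpoint of their group have no match and get NA in 'ri'; fcoalesce then turns those NA values into 0.

```r
library(data.table)

# Data from the question
x   <- data.table(a = c(1:8, 1:8), b = rep(c("A", "B"), each = 8))
pos <- data.table(b = c("A", "A", "B", "B"), bp = c(3, 5, 2, 4))

# Number the breakpoints within each group: 1, 2, ...
pos[, ri := rowid(b)]

# Rolling join: for every row of x, find the last breakpoint <= a
# in the same group. 'joined' is a hypothetical name for inspection.
joined <- pos[x, on = .(b, bp = a), roll = Inf]

# Rows before the first breakpoint of a group have ri = NA
print(joined)
```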

Answer 3

We can nest the pos data into a list column by b, join it with x, and use findInterval to get the corresponding intervals.

library(dplyr)

pos %>% 
   tidyr::nest(data = bp) %>%
   right_join(x, by = 'b') %>%
   group_by(b) %>%
   mutate(interval = findInterval(a, data[[1]][[1]])) %>%
   select(-data)

#    b        a interval
#   <chr> <int>    <int>
# 1 A         1        0
# 2 A         2        0
# 3 A         3        1
# 4 A         4        1
# 5 A         5        2
# 6 A         6        2
# 7 A         7        2
# 8 A         8        2
# 9 B         1        0
#10 B         2        1
#11 B         3        1
#12 B         4        2
#13 B         5        2
#14 B         6        2
#15 B         7        2
#16 B         8        2
