Dplyr: accessing an entire column row-wise

Question

Given the following data
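
The data block is missing at this point. A minimal reconstruction, assuming integer columns and taking the values from the output printed in the answers (an assumption, not the asker's original code):

```r
# Assumed reconstruction of the example data; the values are taken from the
# output printed in the answers, not from the original post.
foo <- data.frame(
  A = 1:5,
  B = c(2L, 2L, 3L, 4L, 4L)
)
foo
```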

For each row, I want to find the index at which A first exceeds B. So the desired answer is:

  A B NextIndex
1 1 2         3
2 2 2         3
3 3 3         4
4 4 4         5
5 5 4         5

My dplyr approach is:

library(dplyr)
A_col <- foo$A
foo %>% rowwise() %>% mutate(NextIndex = which(A_col - B > 0)[1])

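My actual data.frame has several million rows, and the processing time grows sharply. Note that I reference the full A_col in every row comparison. I also tried a version using row_number(), but it was not noticeably faster. Also note that A and B are actually POSIXct variables in my data.frame and are strictly increasing in time, though not at regular intervals.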

How can I make this expression more efficient?

Answer 1

We can use vapply:

foo$nextIndex <- vapply(foo$B, function(x) which(foo$A-x>0)[1], 1)
foo
#   A B nextIndex
#1 1 2         3
#2 2 2         3
#3 3 3         4
#4 4 4         5
#5 5 4         5

Or another option (if the values of A are sorted in increasing order):

findInterval(foo$B, foo$A)+1L
#[1] 3 3 4 5 5

Using it in a dplyr chain:

foo %>% 
    mutate(rowIndex = findInterval(B, A)+1L)
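
The question mentions that A and B are really POSIXct and strictly increasing. A minimal sketch of how the findInterval approach might carry over, assuming A is sorted. The timestamps are made up for illustration, and mapping the out-of-range result to NA is an added choice to mirror which(...)[1], not part of the answer above:

```r
# Hypothetical POSIXct data: strictly increasing, irregularly spaced.
t0 <- as.POSIXct("2016-02-01 00:00:00", tz = "UTC")
foo <- data.frame(
  A = t0 + c(10, 25, 70, 300, 310),
  B = t0 + c(25, 25, 70, 300, 400)
)
n <- nrow(foo)

# findInterval() counts how many values of A are <= each B (A must be sorted),
# so adding 1 gives the position of the first A that is strictly greater.
idx <- findInterval(as.numeric(foo$B), as.numeric(foo$A)) + 1L

# When no A exceeds B the result is n + 1; map it to NA like which(...)[1].
idx[idx > n] <- NA_integer_
foo$NextIndex <- idx

# Cross-check against the quadratic reference on this small example.
ref <- vapply(seq_len(n), function(i) which(foo$A > foo$B[i])[1], integer(1))
stopifnot(identical(idx, ref))
```

The explicit as.numeric() calls make the comparison on the underlying seconds visible. The binary search in findInterval avoids scanning all of A for every row, which is where the row-wise version spends its time.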

Answer 2

How about this (here df is the data frame from the question):

df$nextIndex <- apply(df, 1, function(x) which.max(df$A - x[2] > 0))
df
  A B nextIndex
1 1 2         3
2 2 2         3
3 3 3         4
4 4 4         5
5 5 4         5
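
One difference from which(...)[1] may matter here: when no element of A exceeds B, which.max on an all-FALSE vector returns 1 rather than NA. A small sketch with a made-up logical vector:

```r
# Hypothetical case: no value of A is greater than B.
none_greater <- c(FALSE, FALSE, FALSE)

first_by_which_max <- which.max(none_greater)  # position of the first maximum
first_by_which     <- which(none_greater)[1]   # first TRUE position, if any

first_by_which_max  # 1
first_by_which      # NA
```

If rows without a later match are possible, the which(...)[1] form is the safer of the two.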

Answer 1: 2 votes, accepted, 2016-02-01 03:03:38

Answer 2: 0 votes, 2016-02-01 02:59:07
