Finding elements in a vector in R

Question

I have a matrix with exactly 2 rows and n columns, for example:

c(0,0,0,0,1,0,2,0,1,0,1,1,1,0,2)->a1
c(0,2,0,0,0,0,2,1,1,0,0,0,0,2,0)->a2
rbind(a1,a2)->matr

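For a specific column (in this example column 9, where both rows are 1) I need to find the first instance of 2/0 or 0/2 to its left and to its right. In this example the one on the left is column 2 and the one on the right is column 14.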

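The elements of each row can be 0, 1 or 2, nothing else. Is there a way to do this quickly on large matrices (with 2 rows)? I need to do it 600k times, so speed may be a consideration.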

Answer 1

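library(compiler)
# Scan outward from column cl and stop at the first 0/2 or 2/0 column on each side.
# Returns NA for a side where no such column exists.
myfun <- cmpfun(function(m, cl) {
  li <- ri <- cl
  nc <- ncol(m)
  repeat {
    li <- li - 1
    # li < 1 means we ran off the left edge without finding a match
    if (li < 1 || (m[1, li] != 1 && m[1, li] + m[2, li] == 2)) {
      l <- if (li < 1) NA else li
      break
    }
  }
  repeat {
    ri <- ri + 1
    # test ri > nc (not ri == nc) so the last column is still checked
    if (ri > nc || (m[1, ri] != 1 && m[1, ri] + m[2, ri] == 2)) {
      r <- if (ri > nc) NA else ri
      break
    }
  }
  c(l, r)
})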

And, after taking @Martin Morgan's observations into account:

set.seed(1)
N <- 1000000
test <- rbind(sample(0:2, N, replace = TRUE),
              sample(0:2, N, replace = TRUE))

library(microbenchmark)
microbenchmark(myfun(test, N / 2), fun(test, N / 2), foo(test, N / 2),
               AWebb(test, N / 2), RHertel(test, N / 2))
# Unit: microseconds
#                expr         min          lq         mean      median          uq         max neval  cld
#    myfun(test, N/2)       4.658      20.033 2.237153e+01      22.536      26.022      85.567   100 a   
#      fun(test, N/2)   36685.750   47842.185 9.762663e+04   65571.546  120321.921  365958.316   100  b  
#      foo(test, N/2) 2622845.039 3009735.216 3.244457e+06 3185893.218 3369894.754 5170015.109   100    d
#    AWebb(test, N/2)  121504.084  142926.590 1.990204e+05  193864.670  209918.770  489765.471   100   c 
#  RHertel(test, N/2)   65998.733   76805.465 1.187384e+05   86089.980  144793.416  385880.056   100  b  

set.seed(123)
test <- rbind(sample(0:2, N, replace = TRUE, prob = c(5, 90, 5)),
              sample(0:2, N, replace = TRUE, prob = c(5, 90, 5)))
microbenchmark(myfun(test, N / 2), fun(test, N / 2), foo(test, N / 2),
               AWebb(test, N / 2), RHertel(test, N / 2))
# Unit: microseconds
#                expr         min          lq         mean      median         uq         max neval  cld
#    myfun(test, N/2)      81.805     103.732     121.9619     106.459     122.36     307.736   100 a   
#      fun(test, N/2)   26362.845   34553.968   83582.9801   42325.755  106303.84  403212.369   100  b  
#      foo(test, N/2) 2598806.742 2952221.561 3244907.3385 3188498.072 3505774.31 4382981.304   100    d
#    AWebb(test, N/2)  109446.866  125243.095  199204.1013  176207.024  242577.02  653299.857   100   c 
#  RHertel(test, N/2)   56045.309   67566.762  125066.9207   79042.886  143996.71  632227.710   100  b

Answer 2

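Combine the information by squaring the rows and adding them. For a 0/2 or 2/0 column the result is 4. Then just find the last valid column before column 9 ( rev(which())[1] ) and the first valid column after column 9 ( which()[1] ).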

fun <- function(matr, col){
    valid <- which((matr[1,]^2 + matr[2,]^2) == 4)
    if (length(valid) == 0) return(c(NA,NA))

    left <- valid[rev(which(valid < col))[1]]
    right <- valid[which(valid > col)[1]]

    c(left,right)

    }

fun(matr,9)
# [1]  2 14

fun(matr,1)
# [1] NA  2

fun(matrix(0,nrow=2,ncol=100),9)
# [1] NA NA
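
To see why a sum of squares equal to 4 singles out exactly the 0/2 and 2/0 columns, one can enumerate all nine possible pairs. This is only an illustrative check, assuming (as the question states) that entries are restricted to 0, 1 and 2; the names pairs, sumsq and hits are made up for this sketch.

```r
# All nine possible (row 1, row 2) pairs when entries are restricted to 0, 1, 2
pairs <- expand.grid(r1 = 0:2, r2 = 0:2)
pairs$sumsq <- pairs$r1^2 + pairs$r2^2

# Keep the pairs whose sum of squares is 4
hits <- pairs[pairs$sumsq == 4, c("r1", "r2")]
print(hits)
```

Only the pairs (2, 0) and (0, 2) survive the filter. If other values could appear in the matrix, the test would no longer be sufficient.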

Benchmark

set.seed(1)
test <- rbind(sample(0:2,1000000,replace=T),
              sample(0:2,1000000,replace=T))

microbenchmark::microbenchmark(fun(test,9))
# Unit: milliseconds
#         expr     min       lq     mean   median       uq      max neval
# fun(test, 9) 22.7297 27.21038 30.91314 27.55106 28.08437 51.92393   100

Edit: thanks to @MatthewLundberg for pointing out a number of mistakes.

Answer 3

I was slower than @Laterow, but anyway, here is a similar approach

foo  <- function(mtr, targetcol) {
  matr1  <-  colSums(mtr)
  matr2  <- apply(mtr, 2, function(x) x[1]*x[2])
  cols  <-  which(matr1 == 2 & matr2 == 0) - targetcol
  left  <-   cols[cols < 0]
  right  <-  cols[cols > 0]
  c(ifelse(length(left) == 0, NA, targetcol + max(left)),
    ifelse(length(right) == 0, NA, targetcol + min(right)))
}

foo(matr,9) #2 14

Answer 4

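This is an interesting question. Here is how I would solve it.

First define a vector that contains the product of the two rows for each column: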

a3 <- matr[1,]*matr[2,]

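Then we can easily find the columns holding a (0/2) or (2/0) pair, because we know the matrix can only contain the values 0, 1 and 2: such a column has a sum of 2 and a product of 0.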

the02s <- which(colSums(matr)==2 & a3==0)

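Next, we want to find the (0/2) or (2/0) pairs closest to a given column number, on its left and on its right. The column number could be 9, for example: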

thecol <- 9

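Now we basically just need the indices (column numbers in the matrix) of the (0/2) or (2/0) combinations closest to column thecol. For that we only need the output of findInterval():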

pos <- findInterval(thecol,the02s)
pos <- c(pos, pos+1)
pos[pos==0] <- NA # output NA if no column was found on the left

The result is:

the02s[pos]
#  2 14

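So in this case the nearest columns on either side of thecol that satisfy the condition are 2 and 14, and we can confirm that both of these columns contain one of the relevant combinations: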

matr[,14]
#a1 a2 
# 0  2
matr[,2]
#a1 a2 
# 0  2

Edit: I changed the answer so that NA is returned when no column to the left and/or right of thecol in the matrix satisfies the condition.
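
The steps above could be wrapped into a single function along these lines. This is a sketch rather than part of the original answer: the name nearest02 is hypothetical, and it assumes the target column is not itself a 0/2 or 2/0 column (otherwise findInterval() would report the target as its own left neighbour).

```r
# Example matrix from the question
a1 <- c(0,0,0,0,1,0,2,0,1,0,1,1,1,0,2)
a2 <- c(0,2,0,0,0,0,2,1,1,0,0,0,0,2,0)
matr <- rbind(a1, a2)

# Hypothetical helper: nearest 0/2 or 2/0 column to the left and right of thecol
nearest02 <- function(m, thecol) {
  # columns whose entries sum to 2 with a zero product are exactly 0/2 or 2/0
  the02s <- which(colSums(m) == 2 & m[1, ] * m[2, ] == 0)
  pos <- findInterval(thecol, the02s)  # number of matching columns <= thecol
  pos <- c(pos, pos + 1)               # left neighbour, right neighbour
  pos[pos == 0] <- NA                  # nothing on the left
  the02s[pos]                          # an index past the end yields NA
}

res9 <- nearest02(matr, 9)
res1 <- nearest02(matr, 1)
print(res9)
print(res1)
```

For the example matrix this should reproduce the values shown above, 2 and 14 for column 9, and give NA on the left for column 1.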

Answer 5

If you are doing this many times, precompute all the locations

loc <- which((a1==2 & a2==0) | (a1==0 & a2==2))

Then you can use findInterval to find the first one to the left and to the right

i<-findInterval(9,loc);loc[c(i,i+1)]
# [1]  2 14

Note that findInterval is vectorized, which helps if you need to look up several target columns.
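
As a rough illustration of that vectorized use, several target columns can be looked up in one call. The targets vector and the names left and right below are made up for this sketch; as before, it assumes no target is itself a 0/2 or 2/0 column.

```r
a1 <- c(0,0,0,0,1,0,2,0,1,0,1,1,1,0,2)
a2 <- c(0,2,0,0,0,0,2,1,1,0,0,0,0,2,0)

# Precompute the positions of all 0/2 and 2/0 columns once
loc <- which((a1 == 2 & a2 == 0) | (a1 == 0 & a2 == 2))

targets <- c(1, 9, 13)            # hypothetical query columns
i <- findInterval(targets, loc)   # one interval index per target

# An index of 0 means nothing on the left; map it to NA so it is not dropped
left  <- loc[replace(i, i == 0, NA)]
# An index past the end of loc yields NA, meaning nothing on the right
right <- loc[i + 1]

print(cbind(target = targets, left = left, right = right))
```

Since loc is computed once, each additional lookup only costs a binary search, which is likely what matters when the query is repeated hundreds of thousands of times.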
