简体   繁体   English

在R中的矩阵中找到整数向量的索引的最快方法

[英]Quickest way of finding the index of an integer vector in a matrix in R

I have the following problem in R (for a Markov chain). 我在R中有以下问题(对于马尔可夫链)。 Suppose there is a statespace matrix S with rows of unique integer vectors (states). 假设存在具有唯一整数向量(状态)行的状态空间矩阵S. I am given a vector s from this matrix, and want to determine the index of the row that corresponds to this vector. 我从这个矩阵给出了一个向量s,并想确定与该向量对应的行的索引。 There are a couple of solutions: 有几个解决方案:

  1. A solution using all.equal as in: 使用all.equal的解决方案如下:

     which(apply(S,1,function(x){ isTRUE(all.equal(s,x)) }) ) 
  2. Map the vectors to a unique string and identify them with this string: 将向量映射到唯一字符串并使用此字符串标识它们:

     statecodes <- apply(S,1,function(x) paste(x,collapse=" ") ) check.equal <- function(s) { z <- which(statecodes == paste(s, collapse=" ")) return(z) } check.equal(s) 

The first (often suggested) solution is downright terrible; 第一个(通常建议的)解决方案是彻头彻尾的糟糕; it already takes 2.16 seconds for a state space of 16,000 vectors with length 4. The second solution is a lot faster, taking 0-0.01 seconds for the same state space. 对于长度为4的16,000个向量的状态空间,它已经需要2.16秒。第二个解决方案要快得多,对于相同的状态空间需要0-0.01秒。 However, when the length of the vectors increases, it becomes increasingly slow. 然而,当向量的长度增加时,它变得越来越慢。 I feel that my string method is reasonable, but there must be something better. 我觉得我的字符串方法是合理的,但必须有更好的东西。 What would be a quicker way to make such comparisons? 什么是更快的方式进行这样的比较?

For completeness sake, the state space for my problem could be generated as follows. 为了完整起见,我的问题的状态空间可以生成如下。 If the vector has N elements, and I denotes the maximum that each element of a vector can attain (for example, 10) it is given by: 如果向量具有N个元素,并且I表示向量的每个元素可以达到的最大值(例如,10),则由下式给出:

I <- rep(10,N)
S <- as.matrix(expand.grid( lapply(1:N, function(i) { 0:I[i]}) ) )

How can the integrality of the states be exploited in order to make an as quick as possible comparison? 为了尽可能快地进行比较,如何利用各州的完整性?

一种方法是which(colSums(abs(t(S)-V))==0)其中V是您正在寻找的向量。

One simple way of getting an integer value for each state is to cast the value to an integer and then multiply each column by the right base. 获取每个状态的整数值的一种简单方法是将值转换为整数,然后将每列乘以右基。

My version of that is makecheck2 ; 我的版本是makecheck2 ; the version using paste is makecheck2 . 使用paste的版本是makecheck2 I've also modified the paste version to use match so it can check multiple values at the same time. 我还修改了paste版本以使用match因此它可以同时检查多个值。 Both versions now return a function to be used to get the match. 两个版本现在都返回一个用于获取匹配的函数。

The setup for my version is faster; 我的版本设置更快; 0.065 sec vs 1.552 sec. 0.065秒vs 1.552秒

N <- 5
I <- rep(10,N)
S <- as.matrix(expand.grid( lapply(1:N, function(i) { 0:I[i]}) ) )
system.time(f1 <- makecheck1(S))
#   user  system elapsed 
#  1.547   0.000   1.552 
system.time(f2 <- makecheck2(S))
#   user  system elapsed 
#  0.063   0.000   0.065 

Here I test with 1 to 10000 values to check. 在这里,我测试1到10000个值来检查。 The paste version is faster for small values; 对于小值, paste版本更快; my version is faster for large values. 对于大值,我的版本更快。

> set.seed(5)
> k <- lapply(0:4, function(idx) sample(1:nrow(S), 10^idx))
> s <- lapply(k, function(idx) S[idx,])
> t1 <- sapply(s, function(x) unname(system.time(for(i in 1:100) f1(x))[1]))
> t2 <- sapply(s, function(x) unname(system.time(for(i in 1:100) f2(x))[1]))
> data.frame(n=10^(0:4), time1=t1, time2=t2)
      n time1 time2
1     1 0.761 1.512
2    10 0.772 1.523
3   100 0.857 1.552
4  1000 1.592 1.547
5 10000 9.651 1.848

Code for both versions follow: 两个版本的代码如下:

makecheck2 <- function(m) {
  codes <- vector("list", length=ncol(m))
  top <- vector("integer", length=ncol(m)+1)
  top[1L] <- 1L
  for(idx in 1:ncol(m)) {
    codes[[idx]] <- unique(m[,idx])
    top[idx+1L] <- top[idx]*length(codes[[idx]])
  }
  getcode <- function(x) {
    out <- 0L
    for(idx in 1:length(codes)) {
      out <- out + top[idx]*match(x[,idx], codes[[idx]])
    }
    out
  }
  key <- getcode(m)
  f <- function(x) {
    if(!is.matrix(x)) {
      x <- matrix(x, ncol=length(codes))
    }
    match(getcode(x), key)
  }
  rm(m) # perhaps there's a better way to remove these from the closure???
  rm(idx)
  f
}

makecheck1 <- function(m) {
  n <- ncol(m)
  statecodes <- apply(m,1,function(x) paste(x,collapse=" ") )
  rm(m)
  function(x) {
    if(!is.matrix(x)) {
      x <- matrix(x, ncol=n)
    }
    x <- apply(x, 1, paste, collapse=" ")
    match(x, statecodes)
  }
}

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM