R Looping through two vectors

Good day,

I need a function that creates increasing IDs for two parameters. I came up with this function, which works fine, but I want it to be vectorized, and I cannot seem to avoid O(N²) complexity. Are there any 'better' ways to do this?

Standard function:

threshold <- 3

calculateID <- function(p, r) {
    return((p-1) * threshold + r)
}

calculateID(1, 1) #returns 1
calculateID(1, 2) #returns 2
calculateID(1, 3) #returns 3
calculateID(2, 1) #returns 4
#.....
calculateID(5, 3) #returns 15

Vectorized function: I would like to pass the two parameters as vectors so the function only has to be called once:

threshold <- 3
calculateIDVectorized <- function(p, r) {
    return(unlist(
        lapply(p, function(x) {
            lapply(r, function(y) {
                (x-1) * threshold + y
            })
        })
    ))
}

calculateIDVectorized(c(1, 2, 3, 4, 5), c(1, 2, 3)) # should return 1-15

To clarify: I want every combination of the p and r arguments to be used, so you should always get a result of length(p) * length(r).
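To make the requirement concrete, here is a small sketch using the question's calculateID and threshold. The function is already vectorized elementwise, but that pairs p[i] with r[i] rather than producing every combination:

```r
threshold <- 3
calculateID <- function(p, r) (p - 1) * threshold + r

# Elementwise: pairs p[i] with r[i], so the result has length 3
elementwise <- calculateID(c(1, 2, 3), c(1, 2, 3))
elementwise  # 1 5 9

# What the question asks for instead: all 5 * 3 = 15 (p, r) combinations
wanted_length <- length(c(1, 2, 3, 4, 5)) * length(c(1, 2, 3))
wanted_length  # 15
```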

You can use outer:

calculateIDVectorized <- function(p, r) as.vector(t(outer(p, r, calculateID)))

calculateIDVectorized(c(1, 2, 3, 4, 5), c(1, 2, 3))
#> [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
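A short sketch of why the t() is there, assuming the question's calculateID and threshold <- 3. outer() returns a length(p) by length(r) matrix, and as.vector() reads a matrix column by column, so without the transpose the IDs come out grouped by r rather than by p:

```r
threshold <- 3
calculateID <- function(p, r) (p - 1) * threshold + r

p <- 1:5
r <- 1:3

m <- outer(p, r, calculateID)  # 5 x 3 matrix: rows follow p, columns follow r
dim(m)

untransposed <- as.vector(m)     # column-major read: 1 4 7 10 13 2 5 8 ...
transposed   <- as.vector(t(m))  # each p's IDs stay together: 1 2 3 ... 15
```

This works because calculateID only uses arithmetic, which outer() can apply to whole vectors at once.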

Another base R option using do.call + Vectorize + expand.grid:

> do.call(Vectorize(calculateID),unname(rev(expand.grid(r,p))))
 [1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15

Data

p <- c(1, 2, 3, 4, 5)
r <- c(1, 2, 3)
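As a possible simplification (a sketch, not part of the original answer): calculateID is plain arithmetic, so it can probably take the expand.grid() columns directly without Vectorize() or do.call(). The names grid and ids are made up for illustration:

```r
threshold <- 3
calculateID <- function(p, r) (p - 1) * threshold + r

p <- c(1, 2, 3, 4, 5)
r <- c(1, 2, 3)

# expand.grid() varies its first argument fastest, so r cycles within each p
grid <- expand.grid(r = r, p = p)
head(grid, 4)

# Vectorized arithmetic handles the whole columns in one call
ids <- calculateID(grid$p, grid$r)
ids
```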

Since the OP was interested in fast computation, I compared the solutions:

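library(microbenchmark)
library(dplyr)  # %>%, rowwise(), transmute(), pull(), arrange(), row_number()
library(tidyr)  # crossing()

p <- c(1:500) # using larger data set
r <- c(1:20)

threshold <- length(r) # parameterizing threshold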

m = microbenchmark(
tidy= crossing(p, r) %>% 
      rowwise %>% 
      transmute(out = calculateID(p, r)) %>%
      pull(out),

dcv = do.call(Vectorize(calculateID),unname(rev(expand.grid(r,p)))),

numbering = rev(expand.grid(r,p)) %>%
      arrange(Var2, Var1) %>%
      transmute(out = row_number()) %>%
      pull(out),

hybrid = rev(expand.grid(r,p)) %>%
      rowwise() %>%
      transmute(out = calculateID(Var2, Var1)) %>%
      pull(out),

outer = as.vector(t(outer(p, r, calculateID))),

outer_c = c(t(outer(p, r, calculateID))),

david = rep((p - 1), each = length(r)) * threshold + r
)
m
# Unit: microseconds
#      expr       min        lq       mean     median         uq        max neval
#      tidy 45441.869 47370.776 52123.6770 49482.1970 54158.4285 116780.840   100
#       dcv 16259.935 17156.225 19641.6731 17897.8885 21576.0865  55489.586   100
# numbering  5947.147  6379.337  7127.5125  6576.3560  6952.3205  12005.854   100
#    hybrid 44124.099 45856.210 51531.9480 47642.5405 52225.0600 175778.380   100
#     outer   106.655   120.711   141.1137   128.9665   143.2465    265.072   100
#   outer_c   117.811   137.446   152.5958   142.1315   155.9650    327.101   100
#     david   223.125   230.711   257.5622   241.8675   260.6100    920.164   100

So it looks like the options using outer() are fastest, with as.vector() edging out c(). @DavidArenburg's solution is also right up there with the solutions using outer().
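For reference, the david expression from the benchmark can be wrapped as a function. This is only a sketch: the name calculateIDRep is made up here, and it spells out the repetition of r with rep(..., times = ) instead of relying on R's implicit recycling:

```r
threshold <- 3

# Hypothetical helper name; same arithmetic as the `david` benchmark entry
calculateIDRep <- function(p, r) {
    # Repeat each (p - 1) once per r value, and cycle r once per p value
    rep(p - 1, each = length(r)) * threshold + rep(r, times = length(p))
}

ids_rep <- calculateIDRep(c(1, 2, 3, 4, 5), c(1, 2, 3))
ids_rep
```

Both rep() calls produce vectors of length(p) * length(r), so no intermediate matrix or data frame is built.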

I added a hybrid option using dplyr::transmute() because rev(expand.grid()) was significantly faster than crossing(). The hybrid appears to be marginally faster than the straight dplyr route, but it is still not as fast as the do.call(Vectorize... option or the others.

Another option (added above) would be to arrange the data frame and create IDs using dplyr::row_number() or 1:nrow(). This option works if all the combinations of p and r are present and unique, but it fails with non-sequential values.
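A small base R sketch of that caveat, assuming the question's calculateID with threshold <- 3 and a made-up p that skips values. Row numbering just counts rows, so it no longer matches the formula:

```r
threshold <- 3
calculateID <- function(p, r) (p - 1) * threshold + r

p <- c(1, 3, 5)  # non-sequential p values (illustrative)
r <- c(1, 2, 3)

grid <- expand.grid(r = r, p = p)  # already ordered by p, then r

by_formula <- calculateID(grid$p, grid$r)  # 1 2 3 7 8 9 13 14 15
by_row     <- seq_len(nrow(grid))          # 1 2 3 4 5 6 7 8 9

all(by_formula == by_row)  # FALSE
```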

An option with tidyverse:

library(dplyr)
library(tidyr)
crossing(p, r) %>% 
     rowwise %>% 
     transmute(out = calculateID(p, r)) %>%
     pull(out)
#[1]  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15
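A possible variant without rowwise(), sketched under the assumption that calculateID stays plain arithmetic as in the question. rowwise() evaluates the function one row at a time, which may explain why the tidy entry was slow in the benchmark; applying the function to whole columns avoids that:

```r
library(dplyr)
library(tidyr)

threshold <- 3
calculateID <- function(p, r) (p - 1) * threshold + r

p <- c(1, 2, 3, 4, 5)
r <- c(1, 2, 3)

out <- crossing(p, r) %>%
    mutate(out = calculateID(p, r)) %>%  # column-wise, no rowwise() needed
    pull(out)
out
```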
