Efficient use of vectors

I am attempting to copy one vector to another using the following syntax:

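data <- NULL
# Grows `data` by one element per iteration; every append() copies the whole vector.
for (i in seq_len(length(line) %/% 4)) {
  data <- append(data, line[i * 4])
}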

From what I have seen, using append this way causes a lot of data copying, which makes R very slow. What is the syntax for copying every 4th element of one vector into another, given that the size of the source vector is known?

If you're trying to extract every fourth element from a vector, you can index with seq to grab the correct elements:

data <- letters[seq(4, length(letters), by=4)]
data
# [1] "d" "h" "l" "p" "t" "x"
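As a sketch of the same idea that also copes with inputs shorter than four elements (where `seq(4, length(x), by = 4)` stops with a "wrong sign in 'by' argument" error), the index can be built with `seq_len()`. `every_nth` is a hypothetical helper name, not a base R function; the last lines show a recycled logical index, another base R idiom for the same selection.

```r
# Hypothetical helper: return every n-th element of x (positions n, 2n, 3n, ...).
every_nth <- function(x, n = 4) {
  # length(x) %/% n counts the complete groups of n in x;
  # seq_len() yields an empty index when that count is zero.
  x[seq_len(length(x) %/% n) * n]
}

every_nth(letters)
# [1] "d" "h" "l" "p" "t" "x"
every_nth(1:3)
# integer(0)

# Alternative: a logical index is recycled along x,
# so this also keeps positions 4, 8, 12, ...
letters[c(FALSE, FALSE, FALSE, TRUE)]
# [1] "d" "h" "l" "p" "t" "x"
```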

Growing the vector one element at a time, as in your question, will be slow because R has to keep re-allocating the vector and copying its contents (see the second circle, "Growing Objects", of The R Inferno for details). However, even pre-allocating the vector and filling it with a for loop will be slow compared with constructing it in a single vectorized indexing operation.
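The re-allocation cost can be made concrete with a little arithmetic. Assuming each `append()` call allocates a new vector of length i and copies the i - 1 elements already stored, building an n-element result copies 0 + 1 + ... + (n - 1) = n(n - 1)/2 elements, whereas a pre-allocated vector never has to copy its existing contents. The hypothetical helper below only evaluates that count; it models the copying rather than measuring R itself.

```r
# Hypothetical helper: number of already-stored elements copied when a vector
# is grown to length n one append() at a time (step i copies i - 1 elements
# into a freshly allocated vector of length i).
copies_when_growing <- function(n) sum(seq_len(n) - 1)

copies_when_growing(4)      # 0 + 1 + 2 + 3
# [1] 6
copies_when_growing(2500)   # 2500 = 10000 / 4, the result size in the benchmark below
# [1] 3123750
```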

To get a sense of the speed improvement, compare it with the kind of loop you described, but with the output vector pre-allocated:

for.prealloc <- function(x) {
  n <- length(x) %/% 4                           # number of elements to keep
  data <- vector(mode = "numeric", length = n)   # allocate the result once
  for (i in seq_len(n)) {                        # seq_len() avoids 1:0 when n == 0
    data[i] <- x[i * 4]
  }
  data
}
josilber <- function(x) x[seq(4, length(x), by=4)]
r <- rnorm(10000)
all.equal(for.prealloc(r), josilber(r))
# [1] TRUE

library(microbenchmark)
microbenchmark(for.prealloc(r), josilber(r))
# Unit: microseconds
#             expr      min        lq      mean   median      uq      max neval
#  for.prealloc(r) 1846.014 2035.7890 2351.9681 2094.804 2244.56 5283.285   100
#      josilber(r)   95.757   97.4125  125.9877  113.179  138.96  259.606   100

The approach I propose is roughly 18x faster (comparing median times) than using for with a pre-allocated vector, and the gap will be even larger against append with a vector that is not pre-allocated.
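The question mentions arrays as well as vectors. A minimal sketch, assuming the goal is every fourth row of a matrix or every fourth slice along the first dimension of an array (the example data are made up): the same kind of index vector is passed for one dimension and the other dimensions are left empty.

```r
# Example matrix (made-up data): 10 rows, 3 columns, filled column-wise with 1:30.
m <- matrix(1:30, nrow = 10)

# Row positions 4, 8, ... (only complete groups of four are counted).
rows <- seq_len(nrow(m) %/% 4) * 4

# drop = FALSE keeps a matrix even if only one row is selected.
m4 <- m[rows, , drop = FALSE]
m4
#      [,1] [,2] [,3]
# [1,]    4   14   24
# [2,]    8   18   28

# For a 3-D array, index the first dimension and leave the others empty.
a <- array(1:60, dim = c(10, 3, 2))
a4 <- a[seq_len(dim(a)[1] %/% 4) * 4, , , drop = FALSE]
dim(a4)
# [1] 2 3 2
```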

Here are three methods with their benchmarks. Pre-allocating the vector, as in method2, is noticeably faster; the lapply approach falls in the middle; and your append-based function (method1) is the slowest.

Of course, these are 1D vectors rather than n-dimensional arrays, but I would expect the benchmark results to be similar or even more pronounced.

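# Grow the result with append(): every call copies the whole vector.
method1 <- function(line) {
  data <- NULL
  for (i in seq_along(line)) {
    data <- append(data, line[i])
  }
  data
}

# Allocate the result once, then fill it element by element.
method2 <- function(line) {
  data <- vector(mode = "numeric", length = length(line))
  for (i in seq_along(line)) {
    data[i] <- line[i]
  }
  data
}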

library(microbenchmark)
r <- rnorm(1000)
microbenchmark(method2(r), unit="ms")
#> Unit: milliseconds
#>        expr     min       lq     mean   median       uq     max neval
#>  method2(r) 2.18085 2.279676 2.428731 2.371593 2.500495 5.24888   100
microbenchmark(lapply(r, function(x) { data<-append(data, x) }), unit="ms")
#> Unit: milliseconds
#>                                                    expr      min       lq
#>  lapply(r, function(x) {     data <- append(data, x) }) 3.014673 3.091299
#>      mean   median       uq      max neval
#>  3.287216 3.150052 3.260199 6.036501   100
microbenchmark(method1(r), unit="ms")
#> Unit: milliseconds
#>        expr      min       lq    mean   median       uq      max neval
#>  method1(r) 3.938684 3.978002 5.71831 4.020001 4.280521 98.58584   100
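A note on the `lapply()` benchmark above: inside the anonymous function, `data <- append(data, x)` creates a local variable that is discarded after each call, so no result is actually accumulated (with no `data` object in the workspace, `data` on the right-hand side even resolves to the `utils::data()` function). If a functional style is preferred, a minimal sketch is to let `vapply()` allocate the result itself; `method3` is a hypothetical name continuing the numbering above, and its timing is not measured here.

```r
# Hypothetical method3: vapply() allocates a numeric result of length(line)
# once and fills it, so nothing is grown or re-copied element by element.
method3 <- function(line) {
  vapply(line, function(x) x, numeric(1))
}

r <- rnorm(1000)
identical(method3(r), r)
# [1] TRUE
```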

I didn't realize the OP wanted only every fourth element. Why not just use a data frame or data.table?

d <- data.frame(matrix(rnorm(1000), ncol=1))
microbenchmark(d2 <- d[seq(1,nrow(d), 4),])
#> Unit: microseconds
#>                           expr    min      lq     mean median      uq
#>  d2 <- d[seq(1, nrow(d), 4), ] 64.846 65.9915 73.08007 67.225 73.8225
#>      max neval
#>  220.438   100
library(data.table)
dt <- data.table(d)
microbenchmark(d2 <- dt[seq(1,nrow(d), 4),])
#> Unit: microseconds
#>                            expr     min       lq     mean  median      uq
#>  d2 <- dt[seq(1, nrow(d), 4), ] 298.163 315.2025 324.8793 320.554 330.416
#>      max neval
#>  655.124   100
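Two details of the data-frame version above are worth checking: `seq(1, nrow(d), 4)` selects rows 1, 5, 9, ..., whereas the loop in the question took elements 4, 8, 12, ...; and because `d` has a single column, `d[rows, ]` drops to a plain numeric vector. A minimal base R sketch that starts at row 4 and keeps the data-frame class (the column name `x` is made up for illustration):

```r
# Example data frame with one column named x (made-up data).
d <- data.frame(x = rnorm(1000))

# Rows 4, 8, 12, ...; drop = FALSE keeps a one-column data frame
# instead of collapsing the result to a numeric vector.
d4 <- d[seq(4, nrow(d), by = 4), , drop = FALSE]

class(d4)
# [1] "data.frame"
nrow(d4)
# [1] 250
```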

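Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish them, please credit this site's URL or the original source. For any questions, contact yoyou2525@163.com.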

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM