Efficient use of vectors
I am attempting to copy one vector to another using the following syntax:
data<-NULL
for( i in 1:nrow(line)){
data=append(data,line[i*4])
}
From what I have seen, using append in this way results in a lot of copying of data, which makes R very slow. What is the syntax for copying every 4th element of one array to another, given that the list you are copying from is of a given size?
If you're trying to extract every fourth element from a vector, you could index using seq to grab the correct elements:
data <- letters[seq(4, length(letters), by=4)]
data
# [1] "d" "h" "l" "p" "t" "x"
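The same indexing can be wrapped in a small helper. The sketch below is only an illustration: every_nth is a hypothetical name, not something from base R, and it builds the index with seq_len so that an input shorter than n gives an empty result rather than an error from seq.

```r
# Hypothetical helper (not part of base R): keep elements n, 2n, 3n, ...
every_nth <- function(x, n = 4) {
  # length(x) %/% n is the number of complete groups of n elements;
  # seq_len(0) is empty, so short inputs yield a zero-length result.
  x[seq_len(length(x) %/% n) * n]
}

every_nth(letters)
# [1] "d" "h" "l" "p" "t" "x"
every_nth(1:3)
# integer(0)
```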
Growing the vector one element at a time, as you show in your question, will be slow because R has to keep re-allocating the vector (see the second circle of The R Inferno for details). However, even pre-allocating your vector and filling it with a for loop will be slow compared to constructing it in a single vectorized indexing operation.
To get a sense of the speed improvement, consider a comparison to the sort of method you've described, except using pre-allocation:
for.prealloc <- function(x) {
  n <- floor(length(x) / 4)
  data <- vector(mode = "numeric", length = n)  # allocate the result once
  for (i in seq_len(n)) {  # seq_len(n) is empty when n is 0, unlike 1:n
    data[i] <- x[i * 4]
  }
  data
}
josilber <- function(x) x[seq(4, length(x), by=4)]
r <- rnorm(10000)
all.equal(for.prealloc(r), josilber(r))
# [1] TRUE
library(microbenchmark)
microbenchmark(for.prealloc(r), josilber(r))
# Unit: microseconds
#             expr      min        lq      mean   median      uq      max neval
#  for.prealloc(r) 1846.014 2035.7890 2351.9681 2094.804 2244.56 5283.285   100
#      josilber(r)   95.757   97.4125  125.9877  113.179  138.96  259.606   100
The approach I propose is roughly 20x faster than using for and a pre-allocated vector (and the gap will be even larger compared with using append and a non-pre-allocated vector).
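The append variant is not timed above. A rough way to check that claim yourself is sketched below using only base R's system.time. The function names grow_append and vectorized are made up for this illustration, and no timings are quoted because they depend on the machine and the R version.

```r
# Illustrative names; these functions are not from the original answer.
grow_append <- function(x) {
  out <- NULL
  for (i in seq_len(length(x) %/% 4)) {
    out <- append(out, x[i * 4])  # each call builds a new, longer vector
  }
  out
}

vectorized <- function(x) x[seq(4, length(x), by = 4)]

set.seed(1)
r <- rnorm(10000)

# Both approaches should return the same values.
stopifnot(isTRUE(all.equal(grow_append(r), vectorized(r))))

# Elapsed time for 20 repetitions of each approach.
system.time(for (k in 1:20) grow_append(r))
system.time(for (k in 1:20) vectorized(r))
```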
Here are three methods with their benchmarks. You can see that preallocating the vector as in the method2 function is quite a bit faster, while the lapply method is in the middle, and your function is the slowest.
Of course, these are 1D vectors as opposed to n-dimensional arrays, but I would expect the benchmarks to be similar or even more pronounced.
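method1 <- function(line) {
  data <- NULL
  for (i in seq_along(line)) {
    data <- append(data, line[i])  # grows the vector on every iteration
  }
  data  # return the result; a for loop on its own returns NULL
}
method2 <- function(line) {
  data <- vector(mode = "numeric", length = length(line))  # allocate once
  for (i in seq_along(line)) {
    data[i] <- line[i]
  }
  data
}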
library(microbenchmark)
r <- rnorm(1000)
microbenchmark(method2(r), unit="ms")
#> Unit: milliseconds
#>        expr     min       lq     mean   median       uq     max neval
#>  method2(r) 2.18085 2.279676 2.428731 2.371593 2.500495 5.24888   100
microbenchmark(lapply(r, function(x) { data<-append(data, x) }), unit="ms")
#> Unit: milliseconds
#>                                                expr      min       lq
#>  lapply(r, function(x) { data <- append(data, x) }) 3.014673 3.091299
#>      mean   median       uq      max neval
#>  3.287216 3.150052 3.260199 6.036501   100
microbenchmark(method1(r), unit="ms")
#> Unit: milliseconds
#>        expr      min       lq    mean   median       uq      max neval
#>  method1(r) 3.938684 3.978002 5.71831 4.020001 4.280521 98.58584   100
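One caveat about the lapply call above: the assignment to data happens inside the anonymous function, so it is local to each call and nothing is accumulated across iterations. If a functional style is wanted, a sketch along the following lines at least returns the copied vector. The name method3 is made up here, and this is an illustration rather than part of the original benchmark.

```r
# Hypothetical third method, not part of the original benchmark.
method3 <- function(line) {
  # vapply visits each index and checks that every result is a single numeric.
  vapply(seq_along(line), function(i) line[i], numeric(1))
}

r <- rnorm(1000)
identical(method3(r), r)
# [1] TRUE
```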
Didn't realize OP wanted only every fourth element. Why not just use a data frame or data.table?
d <- data.frame(matrix(rnorm(1000), ncol=1))
microbenchmark(d2 <- d[seq(1,nrow(d), 4),])
#> Unit: microseconds
#>                           expr    min      lq     mean median      uq
#>  d2 <- d[seq(1, nrow(d), 4), ] 64.846 65.9915 73.08007 67.225 73.8225
#>      max neval
#>  220.438   100
library(data.table)
dt <- data.table(d)
microbenchmark(d2 <- dt[seq(1,nrow(d), 4),])
#> Unit: microseconds
#>                            expr     min       lq     mean  median      uq
#>  d2 <- dt[seq(1, nrow(d), 4), ] 298.163 315.2025 324.8793 320.554 330.416
#>      max neval
#>  655.124   100
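One detail worth knowing when subsetting a one-column data frame as above: with a row index and a single column, base R's [ drops the result to a plain vector by default. A small sketch, using a made-up column name x, shows the difference that drop = FALSE makes.

```r
# Illustrative data; the column name x is an assumption for this sketch.
d <- data.frame(x = rnorm(1000))
idx <- seq(1, nrow(d), 4)

v <- d[idx, ]                 # single column: result is a numeric vector
s <- d[idx, , drop = FALSE]   # stays a data frame with one column

is.data.frame(v)
# [1] FALSE
is.data.frame(s)
# [1] TRUE
nrow(s)
# [1] 250
```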