
Repeating elements in a vector with a for loop

I want to make a vector from 3:50 in R, looking like

3 4 4 5 6 6 7 8 8 .. 50 50

I want to use a for loop inside a for loop, but it's not doing what I want.

f <- c()
for (i in 3:50) {
  for(j in 1:2) {
    f = c(f, i)
  }
}

What is wrong with it?

Another option is to use an embedded rep:

rep(3:50, rep(1:2, 24))

which gives:

 [1] 3 4 4 5 6 6 7 8 8 9 10 10 11 12 12 13 14 14 15 16 16 17 18 18 19 20 20
[28] 21 22 22 23 24 24 25 26 26 27 28 28 29 30 30 31 32 32 33 34 34 35 36 36 37 38 38
[55] 39 40 40 41 42 42 43 44 44 45 46 46 47 48 48 49 50 50

This utilizes the fact that the times argument of rep can also be an integer vector whose length equals the length of the x argument. Each element of x is then repeated as many times as the corresponding element of times.

You can generalize this to:
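A minimal illustration of that behaviour, using a hypothetical three-element vector (x and times below are made-up names and values, not from the original post):

```r
# Hypothetical small example: times has the same length as x
x <- c(10, 20, 30)
times <- c(1, 2, 3)
small <- rep(x, times)   # 10 once, 20 twice, 30 three times
small

# The same idea applied to the question: alternate 1 and 2 copies
res <- rep(3:50, rep(1:2, 24))
length(res)
```

The inner rep(1:2, 24) simply builds the 48 repeat counts 1 2 1 2 ..., one per element of 3:50.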

s <- 3
e <- 50
v <- 1:2

rep(s:e, rep(v, (e-s+1)/2))

Yet another option, using a mix of rep and rep_len:

v <- 3:50
rep(v, rep_len(1:2, length(v)))

A solution based on sapply:

as.vector(sapply(0:23 * 2 + 2, function(x)  x + c(1, 2, 2)))

# [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16 17 18 18 19 20 20 21 22 22 23 24 24 25 26 26
# [37] 27 28 28 29 30 30 31 32 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42 43 44 44 45 46 46 47 48 48 49 50 50

Benchmarking

Here is a performance comparison of all the current answers. The result shows that cumsum(rep(c(1, 1, 0), 24)) + 2L (m8) is the fastest, while rep(3:50, rep(1:2, 24)) (m1) is almost as fast as m8.

library(microbenchmark)
library(ggplot2)

perf <- microbenchmark(
  m1 = {rep(3:50, rep(1:2, 24))},
  m2 = {rep(3:50, each = 2)[c(TRUE, FALSE, TRUE, TRUE)]},
  m3 = {v <- 3:50; sort(c(v,v[v %% 2 == 0]))},
  m4 = {as.vector(t(cbind(seq(3,49,2),seq(4,50,2),seq(4,50,2))))},
  m5 = {as.vector(sapply(0:23 * 2 + 2, function(x)  x + c(1, 2, 2)))},
  m6 = {sort(c(3:50, seq(4, 50, 2)))},
  m7 = {rep(seq(3, 50, 2), each=3) + c(0, 1, 1)},
  m8 = {cumsum(rep(c(1, 1, 0), 24)) + 2L},
  times = 10000L
)

perf
# Unit: nanoseconds
# expr   min    lq      mean median    uq     max neval
#   m1   514  1028  1344.980   1029  1542  190200 10000
#   m2  1542  2570  3083.716   3084  3085  191229 10000
#   m3 26217 30329 35593.596  31871 34442 5843267 10000
#   m4 43180 48321 56988.386  50891 55518 6626173 10000
#   m5 30843 35984 42077.543  37526 40611 6557289 10000
#   m6 40611 44209 50092.131  46779 50891  446714 10000
#   m7 13879 16449 19314.547  17478 19020 6309001 10000
#   m8     0  1028  1256.715   1028  1542   71454 10000

Use the rep function, together with recycled logical indexing ...[c(TRUE, FALSE, TRUE, TRUE)]:

rep(3:50, each = 2)[c(TRUE, FALSE, TRUE, TRUE)]

 ## [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16 17 18 18 19
## [26] 20 20 21 22 22 23 24 24 25 26 26 27 28 28 29 30 30 31 32 32 33 34 34 35 36
## [51] 36 37 38 38 39 40 40 41 42 42 43 44 44 45 46 46 47 48 48 49 50 50

If you use a logical vector (TRUE / FALSE) as an index (inside [ ]), a TRUE selects the corresponding element and a FALSE omits it. If the logical index vector (c(TRUE, FALSE, TRUE, TRUE)) is shorter than the indexed vector (rep(3:50, each = 2) in your case), the index vector is recycled.

Also a side note: Whenever you use R code like
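A small sketch of that recycling, on a hypothetical vector 1:8 (the names x, mask, kept and kept_explicit are only for illustration):

```r
# Hypothetical example: a length-4 logical mask applied to a length-8 vector
x <- 1:8
mask <- c(TRUE, FALSE, TRUE, TRUE)
kept <- x[mask]                     # the mask is reused for elements 5 to 8
kept_explicit <- x[rep(mask, 2)]    # the same mask written out in full
identical(kept, kept_explicit)
```

Every second element of each block of four is dropped, which is exactly what removes the duplicated odd numbers from rep(3:50, each = 2).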

 x = c(x, something)

or

 x = rbind(x, something)

or similar, you are adopting a C-like programming style in R. This makes your code unnecessarily complex and might lead to low performance and out-of-memory issues if you work with large (say, 200MB+) data sets. R is designed to spare you that low-level tinkering with data structures.

For more information about the gluttons and their punishment, read the R Inferno, Circle 2: Growing Objects.

The easiest way I could find is to create another vector containing only the even values (based on OP's intention) and then simply join the two vectors. The example could be:
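As a rough sketch of the difference (the size n and the squaring operation are arbitrary choices for illustration, not from the original answer), compare a vector grown one element at a time with its vectorized equivalent:

```r
n <- 1000

# Growing: every c() call copies the vector built so far
grown <- c()
for (i in seq_len(n)) {
  grown <- c(grown, i^2)
}

# Vectorized: one call, no intermediate copies
vectorized <- seq_len(n)^2

all.equal(grown, vectorized)
```

Both give the same values; the growing version just does far more copying along the way.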

v <- 3:50
sort(c(v,v[v %% 2 == 0]))

# [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16
#      17 18 18 19 20 20 21 22 22 23 24 24 25 26 26 27 28 28
#[40] 29 30 30 31 32 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42
#     43 44 44 45 46 46 47 48 48 49 50 50

Here is a loop-free one-line solution:

> as.vector(t(cbind(seq(3,49,2),seq(4,50,2),seq(4,50,2))))
 [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16 17
[23] 18 18 19 20 20 21 22 22 23 24 24 25 26 26 27 28 28 29 30 30 31 32
[45] 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42 43 44 44 45 46 46
[67] 47 48 48 49 50 50

It forms a matrix whose first column is the odd numbers in the range 3:50 and whose second and third columns are the even numbers in that range and then (by taking the transpose) reads it off row by row.

The problem with your nested loop approach is that the fundamental pattern is one of length 3, repeated 24 times, rather than each of the 48 values repeated twice, which is what your loops produce. If you wanted to use a nested loop, the outer loop could iterate 24 times and the inner loop 3 times. The first pass through the outer loop could construct 3,4,4. The second pass could construct 5,6,6. Etc. Since there are 24*3 = 72 elements, you can pre-allocate the vector (by using f <- vector("numeric",72) ) so that you aren't growing it 1 element at a time. The idiom f <- c(f,i) that you are using at each stage copies all of the old elements just to create a new vector which is only 1 element longer. Here there are too few elements for it to really make a difference, but if you try to create large vectors that way the performance can be shockingly bad.

Here is a method that combines portions of a couple of the other answers.
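To see the matrix step on a smaller scale, here is a sketch on the hypothetical range 3:8 (the variable names are illustrative only):

```r
# Hypothetical small range 3:8 instead of 3:50
odd  <- seq(3, 7, 2)            # 3 5 7
even <- seq(4, 8, 2)            # 4 6 8
m <- cbind(odd, even, even)     # 3 rows, 3 columns: each row is (odd, even, even)
# as.vector() reads a matrix column by column, so transpose first
small <- as.vector(t(m))
small
```

Without the t(), as.vector(m) would return all the odd numbers first, followed by the even numbers twice.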
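One way the nested loop described above might be written, as a sketch only (n_blocks, odd and the indexing scheme are my own choices, with 24 * 3 = 72 pre-allocated slots):

```r
n_blocks <- 24                          # 24 blocks of the form (odd, even, even)
f <- vector("numeric", n_blocks * 3)    # pre-allocate all 72 slots
for (i in seq_len(n_blocks)) {
  odd <- 2 * i + 1                      # 3, 5, 7, ..., 49
  for (j in 1:3) {
    # j == 1 stores the odd number; j == 2 and j == 3 store the next even one
    f[3 * (i - 1) + j] <- if (j == 1) odd else odd + 1
  }
}
f
```

Because f already has its final length, each assignment writes into an existing slot instead of copying the whole vector.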

rep(seq(3, 50, 2), each=3) + c(0, 1, 1)
 [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16
[21] 16 17 18 18 19 20 20 21 22 22 23 24 24 25 26 26 27 28 28 29
[41] 30 30 31 32 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42
[61] 43 44 44 45 46 46 47 48 48 49 50 50
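A short sketch of why this works, splitting the one-liner into its two parts (base and offset are names introduced here for illustration):

```r
base <- rep(seq(3, 50, 2), each = 3)   # 3 3 3 5 5 5 ... 49 49 49
offset <- c(0, 1, 1)                   # recycled 24 times to length 72
res <- base + offset
head(base, 6)
head(res, 6)
```

Each odd number appears three times, and the recycled offset turns the second and third copies into the following even number.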

Here is a second method using cumsum:

cumsum(rep(c(1, 1, 0), 24)) + 2L

This should be very quick.

This should also do it.

sort(c(3:50, seq(4, 50, 2)))

Another idea, though not competing in speed with the fastest solutions:
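To make the cumsum idea explicit, here is a sketch that separates the increments from the running total (steps and res are illustrative names):

```r
steps <- rep(c(1, 1, 0), 24)    # increments: up, up, stay, repeated 24 times
head(steps, 6)
res <- cumsum(steps) + 2L       # running total 1 2 2 3 4 4 ..., shifted by 2
head(res, 6)
```

A 0 in steps repeats the previous value, so every third element duplicates the even number before it.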

mat <- matrix(3:50,nrow=2)
c(rbind(mat,mat[2,]))
# [1]  3  4  4  5  6  6  7  8  8  9 10 10 11 12 12 13 14 14 15 16 16 17 18 18 19 20 20 21 22 22
# [31] 23 24 24 25 26 26 27 28 28 29 30 30 31 32 32 33 34 34 35 36 36 37 38 38 39 40 40 41 42 42
# [61] 43 44 44 45 46 46 47 48 48 49 50 50
