
Partial vector addition in R

I have a vector made up of runs of 1 to 5 repeated values, where each run's value is usually, but not always, one greater than the previous run's. For example:

c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)

I would like to add an increment of 0.2 for each successive repeat of a value, leaving the first occurrence unchanged, which gives:

c(1,1.2,1.4,1.6,1.8, 2,2.2,2.4,2.6, 3,3.2, 4,4.2,4.4,4.6,4.8)

I can do this very easily with a for loop, but my initial vector has over 1 million entries and the loop takes quite a long time. I have been trying to come up with a list-based way of doing it, without luck. Any suggestions would be appreciated.
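The question does not show the loop itself. As a point of reference only, a baseline along these lines is one plausible reading of it; the names `out` and `k` are hypothetical and not taken from the original post.

```r
x <- c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)

out <- x   # result vector, same length as x
k <- 0     # number of repeats seen so far in the current run

for (i in seq_along(x)[-1]) {
  # Continue the run if the value matches its predecessor, otherwise reset
  if (x[i] == x[i - 1]) k <- k + 1 else k <- 0
  out[i] <- x[i] + 0.2 * k
}
```

Each iteration does very little work, but a million interpreted iterations add up, which is why the answers below avoid the explicit loop.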

Here is an approach that uses rle and sequence to create the offsets 0, 0.2, 0.4, ... within each run, which are then added to the original vector.

x <- c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)    
x + (sequence(rle(x)$lengths)-1)*0.2
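To make the one-liner easier to follow, here is the same computation split into its intermediate steps. The names `runs`, `pos` and `result` are introduced here purely for illustration.

```r
x <- c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)

# Lengths of the runs of consecutive equal values
runs <- rle(x)$lengths
runs
# [1] 5 4 2 5

# Position of each element within its own run, counting from 1
pos <- sequence(runs)
pos
# [1] 1 2 3 4 5 1 2 3 4 1 2 1 2 3 4 5

# Shift positions to start at 0, scale by 0.2, and add to the original
result <- x + (pos - 1) * 0.2
result
# [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 3.0 3.2 4.0 4.2 4.4 4.6 4.8
```

Because rle works on consecutive runs rather than on distinct values, this version does not depend on each number appearing in only one run.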

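Another possibility, using ave:

dat <- c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)
ave(
  dat,
  # group by run: the index increases by 1 each time the value changes
  c(0,cumsum(diff(dat)!=0)),
  # within each run, add 0, 0.2, 0.4, ...
  FUN=function(x) x + seq(0,(length(x)-1)*0.2,0.2)
)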
#[1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 3.0 3.2 4.0 4.2 4.4 4.6 4.8
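The grouping argument in the ave call is the part that is easiest to misread, so here it is on its own. The names `changed`, `run_id` and `dat2` are illustrative only.

```r
dat <- c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)

# TRUE wherever an element differs from the one before it
changed <- diff(dat) != 0

# Running count of changes; the leading 0 labels the first run
run_id <- c(0, cumsum(changed))
run_id
# [1] 0 0 0 0 0 1 1 1 1 2 2 3 3 3 3 3

# A value that comes back later gets a new run label
dat2 <- c(1, 1, 2, 1, 1)
run_id2 <- c(0, cumsum(diff(dat2) != 0))
run_id2
# [1] 0 0 1 2 2
```

Since the label identifies the run rather than the value, a number that reappears later in the vector starts again from its unincremented value.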

Here is one possibility (given the condition that there will never be more than one run of each number, and each number has at most 5 repetitions):

myvec <- c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)
myvec + seq(0, .8, .2)[ave(myvec, myvec, FUN = seq_along)]
# [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 3.0 3.2 4.0 4.2 4.4 4.6 4.8

For better alternatives when dealing with repeated numbers in your vector, see @mnel's and @thelatemail's answers.
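A small sketch of why the stated condition matters, using a made-up vector `v` in which the value 1 occurs in two separate runs. Grouping by value keeps counting across both runs, whereas grouping by run starts over.

```r
v <- c(1, 1, 2, 1, 1)

# Grouping by value: the four 1s are numbered 1..4 regardless of position
by_value <- v + seq(0, .8, .2)[ave(v, v, FUN = seq_along)]
by_value
# [1] 1.0 1.2 2.0 1.4 1.6

# Grouping by run (the rle/sequence approach): the second run restarts
by_run <- v + (sequence(rle(v)$lengths) - 1) * 0.2
by_run
# [1] 1.0 1.2 2.0 1.0 1.2
```

The lookup vector seq(0, .8, .2) also has only five entries, so a value occurring more than five times in total would index past its end and produce NA.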

This data.table approach will probably be very quick on very large vectors as well.

Edit - c_prev is now populated using head, not tail. Thanks to @Ricardosaporta for pointing it out.

library(data.table)

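test <- data.table(
  c1 = c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)
)

# previous value of c1 (NA for the first row)
test[, c_prev := c(NA, head(c1, -1))]

# 0.2 wherever a value repeats its predecessor, 0 otherwise
test[, increment := 0.0]
test[c1 == c_prev, increment := 0.2]

# running total of increments within each value of c1
# (grouping by c1 assumes each value forms a single run)
test[, cumincrement := cumsum(increment), by = c1]

# add the running total to the original value
test[, revised_c := c1]
test[!is.na(cumincrement), revised_c := revised_c + cumincrement]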

test
#    c1 c_prev increment cumincrement revised_c
# 1:  1     NA       0.0          0.0       1.0
# 2:  1      1       0.2          0.2       1.2
# 3:  1      1       0.2          0.4       1.4
# 4:  1      1       0.2          0.6       1.6
# 5:  1      1       0.2          0.8       1.8
# 6:  2      1       0.0          0.0       2.0
# 7:  2      2       0.2          0.2       2.2
# 8:  2      2       0.2          0.4       2.4
# 9:  2      2       0.2          0.6       2.6
#10:  3      2       0.0          0.0       3.0
#11:  3      3       0.2          0.2       3.2
#12:  4      3       0.0          0.0       4.0
#13:  4      4       0.2          0.2       4.2
#14:  4      4       0.2          0.4       4.4
#15:  4      4       0.2          0.6       4.6
#16:  4      4       0.2          0.8       4.8
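The steps above group with by = c1, so they rely on each value forming a single run. As a hedged variant that is not part of the original answer, data.table's rleid function can group by run directly and compute the result in one step; this assumes a data.table version that provides rleid.

```r
library(data.table)

test <- data.table(
  c1 = c(1,1,1,1,1, 2,2,2,2, 3,3, 4,4,4,4,4)
)

# rleid(c1) gives one id per run of consecutive equal values;
# within each run, add 0, 0.2, 0.4, ... to the run's value
test[, revised_c := c1 + 0.2 * (seq_len(.N) - 1), by = rleid(c1)]

test$revised_c
# [1] 1.0 1.2 1.4 1.6 1.8 2.0 2.2 2.4 2.6 3.0 3.2 4.0 4.2 4.4 4.6 4.8
```

This skips the helper columns c_prev, increment and cumincrement, at the cost of no longer showing the intermediate values.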
