C++ or Rcpp: comparison of two vectors without loop

I am a novice in C++ and Rcpp, and I am wondering how to compare each element of two different vectors at once, without a loop.

My goal is to change the elements of v1 by referencing other vectors.

My current code is:

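v1 <- c(6, 7, 8, 9, 10)
v2 <- c(2, 4, 6, 8, 10)
v3 <- c("a", "b", "a", "b", "c")
v4 <- c(0, 0, 0, 0, 0)
v5 <- c("a", "b", "c")
v6 <- c(1, 2, 3)

for (i in 1:5) {
  if (v1[i] > v2[i]) {
    for (j in 1:3) {
      # find the entry of v5 whose label matches v3[i]
      if (v5[j] == v3[i]) {
        v4[i] <- v2[i] + v6[j]
        # replace v1[i] only if the new value is smaller
        if (v1[i] > v4[i]) {
          v1[i] <- v4[i]
        }
      }
    }
  }
}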

The result should be:

v1 = {3,6,7,9,10}

In fact, v1, v2, v3, v4 and v5, v6 are columns of two different data frames in R. Each element of v1 is compared to the corresponding element of v2. If element i of v1 is larger than element i of v2, a new value v4[i] is computed as the sum of element i of v2 and the element of v6 whose label in v5 matches v3[i]. This newly computed value v4[i] is then compared to v1[i], and v1[i] is replaced by v4[i] if v4[i] is smaller.
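Written as a formula, the rule implemented by the loop is, for each index i, with j(i) denoting the position in v5 whose label equals v3[i] (assuming exactly one such position exists):

```latex
v_{4,i} = v_{2,i} + v_{6,j(i)}, \qquad \text{where } v_{5,j(i)} = v_{3,i}

v_{1,i} \leftarrow
\begin{cases}
  v_{4,i} & \text{if } v_{1,i} > v_{2,i} \text{ and } v_{1,i} > v_{4,i} \\
  v_{1,i} & \text{otherwise}
\end{cases}
```

Only the labels of v3 and v5 have to line up, not their positions, which is why a join (or a lookup by label) can replace the inner loop.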

I have a large number of cases in both data frames (v1~v4 and v5~v6), so using a loop takes a long time. Is it possible to compare the different vectors without a loop, or otherwise to compute a value by referencing an element of another vector?

I do not see the need to use Rcpp or C++ here. The way I understand your requirements, you are trying to manipulate two sets of equal-length vectors. For a set of equal-length vectors one normally uses a data.frame or one of its extensions. Here I am using base R, data.table, and dplyr with tibble; see for yourself which syntax you prefer. Generally speaking, data.table will most likely be faster for large data sets.

Setup data:

v1 <- c(6,7,8,9,10)
v2 <- c(2,4,6,8,10)
v3 <- c("a","b","a","b","c")
v5 <- c("a","b","c")
v6 <- c(1,2,3)

Base R:

df1 <- data.frame(v1, v2, v3)
df2 <- data.frame(v5, v6)

df1 <- merge(df1, df2, by.x = "v3", by.y = "v5")
df1$v4 <- df1$v2 + df1$v6
df1$v1 <- ifelse(df1$v1 > df1$v2 & df1$v1 > df1$v4, df1$v4, df1$v1)
df1
#>   v3 v1 v2 v6 v4
#> 1  a  3  2  1  3
#> 2  a  7  6  1  7
#> 3  b  6  4  2  6
#> 4  b  9  8  2 10
#> 5  c 10 10  3 13
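As the output above shows, merge() returns the rows sorted by the key column v3 rather than in their original order. A minimal sketch, assuming the original order matters: add a helper row-id column (the name `row_id` is chosen only for illustration) before merging and sort by it afterwards.

```r
v1 <- c(6, 7, 8, 9, 10)
v2 <- c(2, 4, 6, 8, 10)
v3 <- c("a", "b", "a", "b", "c")
v5 <- c("a", "b", "c")
v6 <- c(1, 2, 3)

# row_id is a hypothetical helper column that records the original row order
df1 <- data.frame(row_id = seq_along(v1), v1, v2, v3)
df2 <- data.frame(v5, v6)

# merge() joins on the labels and sorts the result by v3
res <- merge(df1, df2, by.x = "v3", by.y = "v5")
res$v4 <- res$v2 + res$v6
res$v1 <- ifelse(res$v1 > res$v2 & res$v1 > res$v4, res$v4, res$v1)

# restore the original row order and drop the helper column
res <- res[order(res$row_id), ]
res$row_id <- NULL
rownames(res) <- NULL
res$v1
#> [1]  3  6  7  9 10
```

Note that merge() performs an inner join by default, so rows of df1 whose label does not appear in df2 are dropped rather than left unchanged.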

data.table:

library(data.table)
dt1 <- data.table(v1, v2, v3, key = "v3")
dt2 <- data.table(v5, v6, key = "v5")

dt1[dt2, v4 := v2 + v6]
dt1[v1 > v2 & v1 > v4, v1 := v4]
dt1
#>    v1 v2 v3 v4
#> 1:  3  2  a  3
#> 2:  7  6  a  7
#> 3:  6  4  b  6
#> 4:  9  8  b 10
#> 5: 10 10  c 13

dplyr:

suppressPackageStartupMessages(library(dplyr))
t1 <- tibble(v1, v2, v3)
t2 <- tibble(v5, v6)
t1 %>% 
  inner_join(t2, by = c("v3" = "v5")) %>%
  mutate(v4 = v2 + v6) %>%
  mutate(v1 = case_when(
    v1 > v2 & v1 > v4 ~ v4,
    TRUE ~ v1
  ))
#> # A tibble: 5 x 5
#>      v1    v2 v3       v6    v4
#>   <dbl> <dbl> <chr> <dbl> <dbl>
#> 1     3     2 a         1     3
#> 2     6     4 b         2     6
#> 3     7     6 a         1     7
#> 4     9     8 b         2    10
#> 5    10    10 c         3    13

Created on 2019-04-19 by the reprex package (v0.2.1)

The general idea is always the same:

  • join the two tables on the character column
  • create a new column v4 as the sum of v2 and v6
  • update v1 to the value of v4 where v1 > v2 and v1 > v4
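The same three steps can also be sketched without a join by looking up each label with base R's match(), which returns, for every element of v3, the position of its first match in v5. This lookup is a swapped-in technique, not one of the three approaches shown above; the sketch assumes the labels in v5 are unique (with duplicates, match() uses only the first occurrence). The function name `update_v1` is hypothetical. Because no rows are reordered, the result keeps the original order of v1, and labels missing from v5 leave v1 unchanged, as in the original loop.

```r
# Hypothetical helper: vectorised version of the loop, using match() as a lookup
update_v1 <- function(v1, v2, v3, v5, v6) {
  j  <- match(v3, v5)   # position in v5/v6 for each label in v3 (NA if absent)
  v4 <- v2 + v6[j]      # candidate value v2[i] + v6[j]
  # replace v1[i] only where v1 > v2 and v1 > v4; an NA v4 yields FALSE here
  cond <- (v1 > v2) & !is.na(v4) & (v1 > v4)
  v1[cond] <- v4[cond]
  v1
}

v1 <- c(6, 7, 8, 9, 10)
v2 <- c(2, 4, 6, 8, 10)
v3 <- c("a", "b", "a", "b", "c")
v5 <- c("a", "b", "c")
v6 <- c(1, 2, 3)

update_v1(v1, v2, v3, v5, v6)
#> [1]  3  6  7  9 10
```

Every operation here (match(), arithmetic, comparison, logical subsetting) works on whole vectors at once, so the per-element loop in R code disappears.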

Note that the base R and data.table versions do not preserve the original row order (the result is sorted by v3), so it would make more sense to put the output into an additional column.
