Reduce computation time of bivariate ecdf with large vectors in R

I would like to calculate the bivariate empirical cumulative distribution function for two very large vectors (over 250 million elements). For each pair of values (vec_a[i], vec_b[i]), i = 1, ..., n, I compute the proportion of observations that are less than or equal to both values, using a for-loop, and store it in a result vector. Given the length of the two vectors, the calculation time would obviously be extremely long, so I would like to translate my for-loop into Rcpp.

# minimal working example

vec_a <- runif(1e+4)
vec_b <- rnorm(1e+4)
total <- length(vec_b)
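store <- numeric(total)  # preallocate rather than growing the vector inside the loop

# proportion of points with both coordinates <= those of point i
for (i in seq_len(total)) {
  store[i] <- sum(vec_a <= vec_a[i] & vec_b <= vec_b[i]) / total
}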

I tried to translate my loop, but since I have only just started working with Rcpp, some things are not quite clear to me. I would be happy if someone could tell me a.) why the results are not identical and b.) whether it is possible to speed up the Rcpp code.

# Rcpp prototype
library(Rcpp)
cppFunction(
  "NumericVector FasterLoop(NumericVector x, NumericVector y) {
  const int n = x.length();
  NumericVector z(n);
  for (int i=0; i < n; ++i) {
   z[i] = sum(x <= x[i] & y <= y[i])/n;
  }
  return z;
}")

proto <- FasterLoop(vec_a, vec_b)

The problem is that sum(x <= x[i] & y <= y[i]) returns an integer, so sum(x <= x[i] & y <= y[i])/n performs an integer division, which discards the fractional part. You have to cast sum(x <= x[i] & y <= y[i]) to a double. This happens automatically if you first assign z[i] = sum(x <= x[i] & y <= y[i]), since z is a NumericVector, and then divide z by n.
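The truncation can be reproduced outside Rcpp. The following is a minimal sketch in plain C++; the function names are hypothetical and not part of Rcpp, and the counting loop only stands in for the sugar expression sum(x <= x[i] & y <= y[i]).

```cpp
#include <cstddef>
#include <vector>

// Count the points j with x[j] <= x[i] and y[j] <= y[i].
// The result is an integer, like the sum over a logical expression.
int count_dominated(const std::vector<double>& x,
                    const std::vector<double>& y,
                    std::size_t i) {
    int count = 0;
    for (std::size_t j = 0; j < x.size(); ++j) {
        if (x[j] <= x[i] && y[j] <= y[i]) ++count;
    }
    return count;
}

// int / int: the division happens in integer arithmetic, so the
// fractional part is discarded before the value becomes a double.
double fraction_truncated(int count, int n) {
    return count / n;
}

// Casting one operand to double forces floating-point division.
double fraction_exact(int count, int n) {
    return static_cast<double>(count) / n;
}
```

With count = 2 and n = 3, the first version yields 0 and the second yields roughly 0.667, which matches the mismatch between the R loop and the prototype.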

library(Rcpp)
cppFunction(
  "NumericVector FasterLoop(NumericVector x, NumericVector y) {
  const int n = x.length();
  NumericVector z(n);
  for (int i=0; i < n; ++i) {
   z[i] = sum(x <= x[i] & y <= y[i]);
  }
  return z/n;
}")

FasterLoop(c(1,2,3), c(1,2,3))
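On part b.) of the question: the loop above still compares every point with every other point, so its cost grows quadratically with the vector length. One possible alternative, not taken from the answer above, is to sort the points by x and count the y values with a Fenwick (binary indexed) tree, which needs O(n log n) operations. The sketch below is plain C++ under stated assumptions: the name bivariate_ecdf is hypothetical, the inputs contain no NaN values, and it has not been benchmarked at 250 million elements, where memory use would also need attention. Wrapping it for R (for example by converting to and from NumericVector) is left out.

```cpp
#include <algorithm>
#include <cstddef>
#include <numeric>
#include <vector>

// result[i] = #{ j : x[j] <= x[i] and y[j] <= y[i] } / n
// Hypothetical helper; assumes x and y have equal length and no NaN.
std::vector<double> bivariate_ecdf(const std::vector<double>& x,
                                   const std::vector<double>& y) {
    const std::size_t n = x.size();
    std::vector<double> result(n);
    if (n == 0) return result;

    // 1-based rank of each y value among the sorted distinct y values.
    std::vector<double> ys(y);
    std::sort(ys.begin(), ys.end());
    ys.erase(std::unique(ys.begin(), ys.end()), ys.end());
    std::vector<std::size_t> rank(n);
    for (std::size_t i = 0; i < n; ++i) {
        rank[i] = static_cast<std::size_t>(
            std::lower_bound(ys.begin(), ys.end(), y[i]) - ys.begin()) + 1;
    }

    // Indices ordered by increasing x.
    std::vector<std::size_t> order(n);
    std::iota(order.begin(), order.end(), std::size_t{0});
    std::sort(order.begin(), order.end(),
              [&x](std::size_t a, std::size_t b) { return x[a] < x[b]; });

    // Fenwick tree holding counts of the y ranks inserted so far.
    std::vector<std::size_t> tree(ys.size() + 1, 0);
    auto insert = [&tree](std::size_t r) {
        for (; r < tree.size(); r += r & (~r + 1)) ++tree[r];
    };
    auto prefix_count = [&tree](std::size_t r) {
        std::size_t total = 0;
        for (; r > 0; r -= r & (~r + 1)) total += tree[r];
        return total;
    };

    // Handle ties in x as one group: insert the whole group first,
    // then query, so that points with equal x count each other.
    std::size_t start = 0;
    while (start < n) {
        std::size_t end = start;
        while (end < n && x[order[end]] == x[order[start]]) ++end;
        for (std::size_t k = start; k < end; ++k) insert(rank[order[k]]);
        for (std::size_t k = start; k < end; ++k) {
            const std::size_t i = order[k];
            result[i] = static_cast<double>(prefix_count(rank[i])) /
                        static_cast<double>(n);
        }
        start = end;
    }
    return result;
}
```

The count is converted to double before the division, so the integer-division problem described above does not reappear here.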
