
Element-wise operation with two vectors of a data frame in R

My first question here: how can I apply an efficient routine that iterates pairwise over the values of two vectors of a given data frame?

To be more specific, consider the following example data frame:

df0 <- data.frame(matrix(c(1,2,2,3,1,3,0.4,0.2,0.2,0.1,0.4,0.1),nrow=6,ncol=2))
colnames(df0) <- c("value","frequency")

The first column is a real value and the second column is a frequency (or weight). NOTICE: the weights have to be strictly positive, they may be repeated, and they do not necessarily add up to one (because of repetition).

I am running the following loop to calculate my function P, which is supposed to be a number between 0 and 1.

# Define two parameters
K = 1/2
alpha = 0

# LOOP
mattemp <- matrix(,nrow=length(df0$value), ncol=length(df0$value))

for(i in 1:length(df0$value)) {
  for(j in 1:length(df0$value)) {

    mattemp[i,j] <- df0$frequency[i]^(1+alpha) * df0$frequency[j] * abs(df0$value[i]-df0$value[j])

    P <- K * sum(mattemp)
  }
}

Basically, my function P is calculating:

P = K * (0.4^(1+alpha) * 0.2 * |1-2| + 0.4^(1+alpha) * 0.1 * |1-3| + ...)

This code works perfectly well as long as the matrix is small.

However, I am trying to run this routine on a big matrix (5400 x 5400), and the loop never seems to finish.

I already tried to parallelize it with foreach (using %dopar%), but that does not work either.

Is there a smart and concise routine that R can handle? It does not need to follow the above structure, as long as it is efficient.

Thank you very much

Try:

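# Standardize the values (subtract the mean, divide by the standard deviation)
df0$nval <- (df0$value - mean(df0$value)) / sd(df0$value)
# All index pairs (i, j) with i < j, one pair per column
ij <- combn(nrow(df0), 2)
# Sum over those pairs; doubling it below assumes alpha = 0 (symmetric weights)
foo <- sum(df0$frequency[ij[1, ]]^(1 + alpha) * df0$frequency[ij[2, ]] * abs(df0$nval[ij[1, ]] - df0$nval[ij[2, ]]))
P <- K * 2 * foo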

Reasoning: The double loop evaluates every ordered pair (i, j) of rows. combn(nrow(df0), 2) generates only the unordered pairs with i < j, which is half of the off-diagonal terms, and the whole expression is then evaluated on those index vectors in one vectorized step. The diagonal terms can be skipped because abs(df0$value[i] - df0$value[i]) is equal to 0. The only terms still missing are the mirrored pairs (j, i). With alpha = 0 each of them equals its (i, j) counterpart, so multiplying the sum by 2 recovers them. For alpha != 0 the weight frequency[i]^(1 + alpha) * frequency[j] is no longer symmetric in i and j, so the factor of 2 is not exact and the mirrored terms would have to be added explicitly. We then multiply by K and get P.
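As a cross-check of the reasoning above, here is a minimal sketch that computes the full pairwise sum with outer() and compares it with a double loop on the example data. It is not part of the original answer: the helper name pairwise_P is hypothetical, it works on the raw (unnormalized) values exactly as the loop in the question does, and it sums all ordered pairs, so it does not rely on alpha = 0.

```r
# Example data and parameters from the question
df0 <- data.frame(value     = c(1, 2, 2, 3, 1, 3),
                  frequency = c(0.4, 0.2, 0.2, 0.1, 0.4, 0.1))
K <- 1/2
alpha <- 0

# Hypothetical helper (an assumption, not from the original post):
# sums f[i]^(1 + alpha) * f[j] * |x[i] - x[j]| over all ordered pairs (i, j)
pairwise_P <- function(x, f, K, alpha) {
  w <- outer(f^(1 + alpha), f)   # w[i, j] = f[i]^(1 + alpha) * f[j]
  d <- abs(outer(x, x, "-"))     # d[i, j] = |x[i] - x[j]|
  K * sum(w * d)
}

P_outer <- pairwise_P(df0$value, df0$frequency, K, alpha)

# Reference: the double loop, with the final sum taken once after the loop
n <- nrow(df0)
mattemp <- matrix(0, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    mattemp[i, j] <- df0$frequency[i]^(1 + alpha) * df0$frequency[j] *
      abs(df0$value[i] - df0$value[j])
  }
}
P_loop <- K * sum(mattemp)

# Both routes should agree
stopifnot(isTRUE(all.equal(P_outer, P_loop)))
```

Two remarks on cost. In the question's loop, P <- K * sum(mattemp) sits inside the inner loop, so the whole matrix is summed once per cell, which probably accounts for much of the running time at 5400 rows. The outer() version avoids that but builds several full n x n matrices; at n = 5400 each one holds about 29 million doubles (roughly 230 MB), so it trades memory for speed.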

It's not clear how you want to normalize, so I just subtracted the mean and divided by the standard deviation. If you meant something else, you can change it accordingly.
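To see what this particular normalization does to the result, the sketch below (an illustration, not part of the original answer) computes P with the combn approach on both the raw and the standardized values, assuming alpha = 0 as in the question. Subtracting the mean cancels in every difference and dividing by the standard deviation rescales every difference, so the standardized P should simply be the raw P divided by sd(value).

```r
# Example data and parameters from the question
df0 <- data.frame(value     = c(1, 2, 2, 3, 1, 3),
                  frequency = c(0.4, 0.2, 0.2, 0.1, 0.4, 0.1))
K <- 1/2
alpha <- 0   # the factor of 2 below relies on alpha = 0

# Standardized values, as in the answer
nval <- (df0$value - mean(df0$value)) / sd(df0$value)

# Index pairs with i < j
ij <- combn(nrow(df0), 2)
i <- ij[1, ]
j <- ij[2, ]

# Pair weights f[i]^(1 + alpha) * f[j]
w <- df0$frequency[i]^(1 + alpha) * df0$frequency[j]

P_raw  <- K * 2 * sum(w * abs(df0$value[i] - df0$value[j]))
P_norm <- K * 2 * sum(w * abs(nval[i] - nval[j]))

# Standardizing only rescales P by 1 / sd(value)
stopifnot(isTRUE(all.equal(P_norm, P_raw / sd(df0$value))))
```

If the goal is to reproduce the loop from the question exactly, P_raw is the quantity to compare against, since that loop uses the unnormalized values.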

Edit1: Big thanks to @alexis_laz for finding a mistake and suggesting improvements that almost double the speed!

Edit2: Adjusted the script to fit the changed requirements.
