
Calculating the Euclidean distance between each row of a dataframe and all rows in another dataframe

I need to generate a dataframe containing, for each row of one dataframe, the minimum Euclidean distance between that row and all rows of another dataframe. Both dataframes are large (approx. 40,000 rows). This is what I have worked out so far:

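# Toy example: two 5 x 7 matrices (the 5 values are recycled row-wise)
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

# Distance between row i of y and row i of x only (paired rows)
sed.dist <- numeric(5)
for (i in seq_along(sed.dist)) {
  sed.dist[i] <- sqrt(sum((y[i, 1:7] - x[i, 1:7])^2))
}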

But this only works when i = j, i.e. it compares row i of y with row i of x only. What I actually need is to loop over every row of the "y" dataframe one by one (y[1, 1:7], then y[2, 1:7], and so on up to i = 5), compute its Euclidean distance to every row of the "x" dataframe, keep the minimum of those distances, and store these minima in another dataframe.
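For reference, the quantity being asked for can be written as follows. The symbols are introduced here for illustration: $y_i$ is row $i$ of y, $x_j$ is row $j$ of x, $n_x$ and $n_y$ are their row counts and $p$ is the number of columns (7 in the toy example). Because the square root is increasing, the minimum can be taken over squared distances and the root applied once at the end, which is what the faster approaches below exploit.

$$
d_i \;=\; \min_{1 \le j \le n_x} \sqrt{\sum_{k=1}^{p} \left(y_{ik} - x_{jk}\right)^2}
\;=\; \sqrt{\min_{1 \le j \le n_x} \sum_{k=1}^{p} \left(y_{ik} - x_{jk}\right)^2},
\qquad i = 1, \dots, n_y
$$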

Try this:

apply(y, 1, function(yi) min(apply(x, 1, function(xj) dist(rbind(xj, yi)))))
# [1] 5.196152 5.385165 4.898979 4.898979 5.385165

Working from the inside out, we bind a row of x to a row of y and calculate the distance between them using the dist(...) function (written in C). We do this for a given row of y with each row of x in turn, using the inner apply(...), and then take the minimum of the result. Finally, the outer call to apply(...) repeats this for each row of y.
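To see the inner steps on their own, here is a small check on the question's toy matrices. It compares dist() on a two-row matrix with the formula written out, then does the middle step for row 1 of y only. The names d11, d11_manual, d_row1 and min_row1 are purely illustrative.

```r
# Toy matrices rebuilt exactly as in the question (5 x 7, values recycled by row)
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

# Innermost step: dist() on a 2-row matrix returns the single distance between the rows
d11 <- as.numeric(dist(rbind(x[1, ], y[1, ])))

# The same value written out from the definition of Euclidean distance
d11_manual <- sqrt(sum((x[1, ] - y[1, ])^2))

# Middle step: distances from row 1 of y to every row of x, then their minimum
d_row1 <- apply(x, 1, function(xj) as.numeric(dist(rbind(xj, y[1, ]))))
min_row1 <- min(d_row1)
```

min_row1 should match the first element of the output shown above.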

Expanding on my comment on the question, a considerably faster approach is the following, although with 40,000 rows you will probably still have to wait a bit:

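# y[i, ] is recycled down each column of t(x), so colSums() gives the squared distance to every row of x
unlist(lapply(seq_len(nrow(y)), function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))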
#[1] 5.196152 5.385165 4.898979 4.898979 5.385165
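For the full 40,000-row case, one further option (a different technique from both answers here, offered as a sketch only) is the expansion ‖y_i − x_j‖² = ‖y_i‖² + ‖x_j‖² − 2 y_i·x_j, which turns the pairwise work into one matrix product per block of rows of y. A full 40,000 × 40,000 matrix of doubles would need about 12.8 GB, so the sketch processes y in blocks; with chunk = 500 and 40,000 rows in x, each temporary matrix is about 160 MB. It assumes both inputs are numeric matrices (or all-numeric data frames) with the same columns; min_dist_chunked and its chunk parameter are hypothetical names. One caveat of this expansion is floating-point cancellation when distances are tiny relative to the row norms, hence the clamp at zero.

```r
# Hypothetical helper: for each row of y, the minimum Euclidean distance to any row of x.
# Uses ||a - b||^2 = ||a||^2 + ||b||^2 - 2 * a.b and processes y in blocks of
# `chunk` rows, so only a chunk x nrow(x) matrix is held in memory at a time.
min_dist_chunked <- function(y, x, chunk = 500L) {
  stopifnot(chunk >= 1)
  x <- as.matrix(x)
  y <- as.matrix(y)
  if (nrow(y) == 0) return(numeric(0))
  x_sq <- rowSums(x^2)                     # squared norm of every row of x
  out <- numeric(nrow(y))
  for (s in seq(1L, nrow(y), by = chunk)) {
    idx <- s:min(s + chunk - 1L, nrow(y))  # rows of y in this block
    yb <- y[idx, , drop = FALSE]
    # length(idx) x nrow(x) matrix of squared distances
    d2 <- outer(rowSums(yb^2), x_sq, "+") - 2 * tcrossprod(yb, x)
    # rounding can leave tiny negative values; clamp at 0 before sqrt
    out[idx] <- sqrt(pmax(apply(d2, 1, min), 0))
  }
  out
}
```

Wrapping the result as data.frame(min_dist = min_dist_chunked(y, x)) gives the dataframe form the question asks for. The block size trades memory for the number of R-level iterations; it does not change the result.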

And a benchmark comparing the two:

x = matrix(runif(1e2*5), 1e2)
y = matrix(runif(1e2*5), 1e2)
library(microbenchmark)
alex = function() unlist(lapply(seq_len(nrow(y)), 
                           function(i) min(sqrt(colSums((y[i, ] - t(x))^2)))))
jlhoward = function() apply(y,1,function(y)
                                  min(apply(x,1,function(x,y)dist(rbind(x,y)),y)))
all.equal(alex(), jlhoward())
#[1] TRUE
microbenchmark(alex(), jlhoward(), times = 20)
#Unit: milliseconds
#       expr        min         lq     median         uq        max neval
#     alex()   3.369188   3.479011   3.600354   4.513114   4.789592    20
# jlhoward() 422.198621 431.565643 436.561057 442.643181 602.929742    20
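Since the question asks for the result as a dataframe, here is a minimal sketch that reuses the colSums() idea from alex() above and collects the output into a data.frame. The nearest_row column (which row of x attains the minimum, via which.min()) is an extra not requested in the question, and the column names are illustrative.

```r
# Toy matrices from the question
x <- matrix(c(3, 6, 3, 4, 8), nrow = 5, ncol = 7, byrow = TRUE)
y <- matrix(c(1, 4, 4, 1, 9), nrow = 5, ncol = 7, byrow = TRUE)

nearest_row <- integer(nrow(y))
min_dist <- numeric(nrow(y))
for (i in seq_len(nrow(y))) {
  # squared distances from row i of y to every row of x
  d2 <- colSums((y[i, ] - t(x))^2)
  nearest_row[i] <- which.min(d2)  # first row of x attaining the minimum
  min_dist[i] <- sqrt(min(d2))     # take the root once, after the minimum
}
result <- data.frame(y_row = seq_len(nrow(y)), nearest_row = nearest_row, min_dist = min_dist)
```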


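Notice: the technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.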

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM