Shortest Euclidean distance between two groups in subset

I have a largish data frame (50000 points) representing points in 2D collected from biological images. Points are categorised as either red or green and are associated with each other in groups (in the example: cells A-D). A small test data set (MSR_test.csv) can be found here.

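library(ggplot2)
cells <- read.csv("MSR_test.csv")
# One point per row: colour shows the channel, shape shows the cell group;
# the manual colours are matched to the channel values in sorted order (Green, Red)
ggplot(cells, aes(X, Y, colour = channel, shape = cell)) +
   geom_point() +
   scale_colour_manual(values = c("green", "red"), name = "channel")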

I am trying to find a reasonably straightforward way (perhaps involving plyr?) to find the Euclidean distance between each green point and its nearest red point within the same 'cell group'. Whilst I think I have worked out how to do this for an individual grouping (using rdist from package fields), I can't seem to work out how to apply the method to every group in my data frame.
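Since the single-group step already works, the missing piece is running it on every cell and stacking the results, which base R can do with `split()`, `lapply()` and `do.call(rbind, ...)`, no plyr required. The sketch below is an illustration under stated assumptions, not the poster's code: it builds a small hypothetical toy data frame with the same columns as MSR_test.csv (X, Y, channel, cell, with channel values "Green" and "Red"), and it swaps `fields::rdist()` for a base-R `outer()` call that produces the same green-by-red distance matrix. `nearest_red_dist` is a hypothetical helper name.

```r
# Hypothetical toy data with the same columns as MSR_test.csv (X, Y, channel, cell)
cells <- data.frame(
  X       = c(0, 3, 1, 10, 10, 14),
  Y       = c(0, 4, 1, 10, 13, 10),
  channel = c("Green", "Red", "Red", "Green", "Red", "Red"),
  cell    = c("A", "A", "A", "B", "B", "B")
)

# Hypothetical helper: for the points of ONE cell, add to each green point
# the Euclidean distance to its nearest red point
nearest_red_dist <- function(df) {
  g <- df[df$channel == "Green", ]
  r <- df[df$channel == "Red", ]
  # Green-by-red distance matrix (the same matrix fields::rdist would return)
  d <- sqrt(outer(g$X, r$X, "-")^2 + outer(g$Y, r$Y, "-")^2)
  # Row-wise minimum = distance to the nearest red point
  g$dist <- apply(d, 1, min)
  g
}

# Run the single-cell computation on every cell, then stack the pieces
result <- do.call(rbind, lapply(split(cells, cells$cell), nearest_red_dist))
print(result)
```

Each cell's distance matrix has (green points × red points) entries, which is fine for moderately sized cells; memory grows with that product, so very large cells may be better served by looping over the green points instead.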

I don't see any reason to use plyr, but maybe I'm wrong. The following code works on your example. I did not use any heavyweight distance function; the Euclidean distance is computed directly from the squared coordinate differences, mainly because you may have to compute it for a lot of points.

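# Split the points by channel
green <- subset(cells, channel == "Green")
red   <- subset(cells, channel == "Red")
# Squared Euclidean distance from point a = c(x, y) to every row of matrix M
fun_dist <- function(a, M) rowSums((M - matrix(1, nrow(M), 1) %*% as.numeric(a))^2)
# Nearest red point in the same cell as one green point, plus the distance to it
foo <- function(greenrow, matred) {
  subred <- as.matrix(subset(matred, cell == greenrow$cell, select = c("X", "Y")))
  d2 <- fun_dist(c(greenrow$X, greenrow$Y), subred)
  j <- which.min(d2)
  data.frame(redX = subred[j, "X"], redY = subred[j, "Y"], dist = sqrt(d2[j]))
}
# Loop over row indices: apply() would coerce each row to a character vector
nearest <- do.call("rbind", lapply(seq_len(nrow(green)), function(i) foo(green[i, ], red)))
result <- cbind(green, nearest)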
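`fun_dist` returns squared distances (`rowSums` of squared coordinate differences, with no square root). That is enough to pick the nearest red point: the square root is monotonically increasing, so the same red point minimises both quantities, and a single square root at the end gives the actual distance while saving one square root per candidate. For a green point $g = (x_g, y_g)$ and the red points $r_1, \dots, r_n$ in the same cell:

$$
d(g) = \min_{j}\sqrt{(x_g - x_{r_j})^2 + (y_g - y_{r_j})^2} = \sqrt{\min_{j}\left[(x_g - x_{r_j})^2 + (y_g - y_{r_j})^2\right]}
$$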
