Euclidean distance in R using two variables in a matrix

I am quite new to R and I am trying to compute two quantities from two variables in my matrix: the gross distance (the sum of the Euclidean distances between consecutive data points) and the net distance (the Euclidean distance between the first and last points of my data). Some background on my data: it is normally a csv file comprising 5 variables: the cell track (called A), the time interval, the X and Y position of each cell, and V = velocity. There are around 90 tracks per data set, and each track should be treated independently of the others.

dput(head(t1))
structure(list(A = c(0L, 0L, 0L, 0L, 0L, 0L), T = 0:5, X = c(668L, 
668L, 668L, 668L, 668L, 668L), Y = c(259L, 259L, 259L, 259L, 
259L, 259L), V = c(NA, 0, 0, 0, 0, 0)), .Names = c("A", "T", 
"X", "Y", "V"), row.names = c(NA, 6L), class = "data.frame")

I was not aware of the dist() function before, so I wrote my own function:

GD.data <- function (trackdata)
{A= trackdata(, 1); V=trackdata(, 5);
 for (i in min(A):max(A))
   while (A<=i) {GD(i) = (sum (V)*(1/25))
                 return (GD(i))} 

This did not work. I used A as the identifier of the track. Since gross distance can also be computed as distance = velocity × (t1 - t0), I simply summed all the velocities and multiplied by my time interval (which is constant at 1/25 s).

How do I use the dist() function with A as the identifier? I need this because each track has to be computed separately. Thanks!

Since you have velocity measured at constant time intervals, you can sum it within each track to get the total distance moved (multiply the sum by the time step, 1/25 s in your case, if V is in distance per second). The base R function aggregate can sum the V data by track identifier A, which is what the command below does:

aggregate( V ~ A , data = t1 , sum , na.rm = TRUE )

Basically this says: aggregate V for each value of A. The aggregation function is sum (you could just as easily get the mean velocity for each track by using mean instead of sum). We pass an additional argument to sum, na.rm, telling it to ignore NAs in the data (which I assume occur at t = 0 for each track).
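To make the unit conversion concrete, here is a minimal sketch on made-up data. The data frame toy, the variable dt and the column GD are hypothetical names, and the sketch assumes V is expressed in distance units per second, so the summed velocity has to be multiplied by the 1/25 s frame interval mentioned in the question:

```r
# Hypothetical toy data with two tracks (not the asker's data).
# V is assumed to be in distance units per second; V is NA at the first frame.
toy <- data.frame(
  A = c(0, 0, 0, 1, 1, 1),
  T = c(0, 1, 2, 0, 1, 2),
  X = c(0, 3, 3, 10, 10, 10),
  Y = c(0, 4, 4, 5, 6, 8),
  V = c(NA, 125, 0, NA, 25, 50)
)
dt <- 1 / 25  # frame interval in seconds, as stated in the question

# Sum V within each track, then convert the sum to a distance
gross <- aggregate(V ~ A, data = toy, FUN = sum, na.rm = TRUE)
gross$GD <- gross$V * dt
gross
```

If V is already stored as distance per frame, the multiplication by dt would not be needed.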

Calculating the 'as the crow flies' distance between the first and last position of each track:

For this we can split the data frame into sub-data-frames by the track identifier A and then operate on each subset, using lapply to apply a simple hypotenuse calculation to the first and last rows of each sub-data-frame.

## Split the data
dfs <- split(t1,t1$A)

## Find hypotenuse between first and last rows for each A
lapply( dfs , function(x){
  j <- nrow(x)
  str <- x[1,c("X","Y")]
  end <- x[j,c("X","Y")]
  dist <- sqrt( sum( (end - str)^2 ) )
  return( dist )
} )
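
Since the question asks about dist(), here is a rough sketch that computes both distances per track directly from the coordinates. It is only one possible approach: toy and track_distances are hypothetical names, and it assumes the rows of each track are already in time order. The gross distance is the sum of the step lengths obtained with diff(), and the net distance uses dist() on the first and last rows only.

```r
# Hypothetical toy data: two tracks with X/Y positions (not the asker's data)
toy <- data.frame(
  A = c(0, 0, 0, 1, 1, 1),
  X = c(0, 3, 0, 10, 10, 10),
  Y = c(0, 4, 0, 5, 6, 8)
)

# Distances for one track, assuming its rows are in time order
track_distances <- function(x) {
  n <- nrow(x)
  steps <- sqrt(diff(x$X)^2 + diff(x$Y)^2)           # length of each step
  net <- as.numeric(dist(x[c(1, n), c("X", "Y")]))   # first vs. last point
  c(gross = sum(steps), net = net)
}

# One row per track, with columns "gross" and "net"
res <- t(sapply(split(toy, toy$A), track_distances))
res
```

Returning a named vector from the helper lets sapply build a matrix, so the result can be turned into a data frame or merged back onto the track identifiers if needed.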
