
Mahalanobis distance in R, error: system is computationally singular

I'd like to calculate multivariate distance from a set of points to the centroid of those points. Mahalanobis distance seems to be suited for this. However, I get an error (see below).

Can anyone tell me why I am getting this error, and if there is a way to work around it?

If you download the coordinate data and the associated environmental data, you can run the following code.

require(maptools)
occ <- readShapeSpatial('occurrences.shp')
load('envDat.Rdata')

#standardize the data to scale the variables
dat <- as.matrix(scale(dat))
centroid <- dat[1547,]  #let's assume this is the centroid in this case

#Calculate multivariate distance from all points to centroid
mahalanobis(dat,center=centroid,cov=cov(dat))

Error in solve.default(cov, ...) : 
  system is computationally singular: reciprocal condition number = 9.50116e-19

The Mahalanobis distance requires the inverse of the covariance matrix. The function mahalanobis internally uses solve, which computes that inverse numerically. Unfortunately, if the matrix is so ill-conditioned that its estimated reciprocal condition number falls below a tolerance, solve treats the matrix as singular and stops. This is why the error says the system is computationally singular: given a different tolerance, the matrix might not be considered singular.
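To see what solve is reacting to, it can help to inspect the conditioning of the covariance matrix directly. The sketch below uses a small synthetic matrix, toy, as a stand-in for dat (an assumption, since the original data is not reproduced here); its third column is an exact sum of the first two.

```r
# Synthetic stand-in for `dat` (an assumption, not the poster's data):
# the third column is an exact sum of the first two.
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
toy <- cbind(x1 = x1, x2 = x2, x3 = x1 + x2)

S <- cov(toy)

# Estimated reciprocal condition number, the quantity quoted in the error.
rc <- rcond(S)
print(rc)

# solve() compares this estimate with its `tol` argument,
# which defaults to .Machine$double.eps.
print(.Machine$double.eps)

# Eigenvalues close to zero point to (near-)linear dependence among columns.
ev <- eigen(S, symmetric = TRUE, only.values = TRUE)$values
print(ev)
```

A reciprocal condition number many orders of magnitude below 1, together with an eigenvalue that is essentially zero, suggests the problem lies in the data rather than in the tolerance.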

The solution is to lower the tolerance below which the matrix is treated as singular. Fortunately, mahalanobis allows you to pass this parameter (tol) through to solve:

mahalanobis(dat,center=centroid,cov=cov(dat),tol=1e-20)
# [1] 24.215494 28.394913  6.984101 28.004975 11.095357 14.401967 ...

mahalanobis uses the covariance matrix, cov (more precisely, its inverse), to transform the coordinate system, then computes the Euclidean distance in the new coordinates. A standard reference is Duda & Hart, "Pattern Classification and Scene Analysis".
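The "transform, then measure Euclidean distance" view can be checked numerically. This is a minimal sketch on synthetic data (X, S and mu are illustrative names, not from the question); it whitens the data with a Cholesky factor of the covariance matrix and compares the result with mahalanobis. Note that R's mahalanobis returns the squared distance.

```r
# Synthetic correlated data (illustrative only).
set.seed(42)
mix <- matrix(c(2, 0.5, 0,
                0, 1,   0.3,
                0, 0,   0.7), nrow = 3)
X <- matrix(rnorm(200 * 3), ncol = 3) %*% mix

S  <- cov(X)
mu <- colMeans(X)

# Built-in: squared Mahalanobis distance of every row to mu.
d2_builtin <- mahalanobis(X, center = mu, cov = S)

# Manual: S = t(R) %*% R with R upper triangular, so
# (x - mu)' S^{-1} (x - mu) = || (x - mu)' R^{-1} ||^2.
R <- chol(S)
Z <- sweep(X, 2, mu) %*% solve(R)   # whitened coordinates
d2_whitened <- rowSums(Z^2)         # squared Euclidean length

print(all.equal(d2_builtin, d2_whitened))
```

The Cholesky step requires a positive-definite covariance matrix, so it fails on a singular one for the same underlying reason solve does.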

Looks like your cov matrix is singular. Perhaps there are linearly-dependent columns in "dat" that are unnecessary? Setting the tolerance to zero won't help if the covariance matrix is truly singular. The first thing to do, instead, is look for columns that might be a rescaling of some other column, or might be just a sum of 2 or more other columns, and remove them. Such columns are redundant for the Mahalanobis distance.
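One possible way to hunt for such redundant columns is a pivoted QR decomposition of the centered data. This is a sketch on a synthetic matrix, toy, standing in for dat (an assumption); which of several mutually dependent columns is kept depends on the pivoting, so the choice should be reviewed rather than trusted blindly.

```r
# Synthetic stand-in for `dat` with two deliberately redundant columns.
set.seed(1)
x1 <- rnorm(100)
x2 <- rnorm(100)
toy <- cbind(x1 = x1,
             x2 = x2,
             x3 = 2 * x1,     # a rescaling of x1
             x4 = x1 + x2)    # a sum of two other columns

# Center first: the covariance matrix ignores column means.
centered <- scale(toy, center = TRUE, scale = FALSE)

q <- qr(centered)
print(q$rank)                 # numerical rank; below ncol(toy) means redundancy

# Keep one set of linearly independent columns.
keep    <- sort(q$pivot[seq_len(q$rank)])
reduced <- toy[, keep, drop = FALSE]
print(colnames(reduced))

# The covariance of the reduced data can now be inverted.
d2 <- mahalanobis(reduced, center = colMeans(reduced), cov = cov(reduced))
print(head(d2))
```

A covariance matrix is also singular whenever there are fewer observations than variables, which this column check would reveal as a rank below the number of columns.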

By the way, since the Mahalanobis distance is effectively a rescaling and rotation, calling the scaling function looks superfluous. Any reason why you want that?
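The point about scaling can be illustrated on synthetic data (raw and the reference row i are hypothetical, chosen only for the demonstration): the squared Mahalanobis distances computed before and after scale() agree up to floating-point error.

```r
# Synthetic data with very different column scales (illustrative only).
set.seed(7)
raw <- cbind(a = rnorm(50, mean = 10,  sd = 5),
             b = rnorm(50, mean = -3,  sd = 0.2),
             c = rnorm(50, mean = 100, sd = 40))
scaled <- scale(raw)

i <- 10  # hypothetical reference row, playing the role of row 1547

d2_raw    <- mahalanobis(raw,    center = raw[i, ],    cov = cov(raw))
d2_scaled <- mahalanobis(scaled, center = scaled[i, ], cov = cov(scaled))

print(all.equal(d2_raw, d2_scaled))
```

In exact arithmetic the two results are identical. Standardizing can still improve numerical conditioning when variables are on wildly different scales, but it cannot repair a covariance matrix that is singular because of linearly dependent columns.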

Thank you, this was very useful to me.


 