Implementing Mahalanobis Distance from scratch in Python
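I am implementing the Mahalanobis distance from scratch, but I am getting an error. The formula for the squared Mahalanobis distance is D^2 = (x - m)^T C^(-1) (x - m), where m is the mean vector and C is the covariance matrix. My code, which raises the error, is below.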
from math import *
from decimal import Decimal
import numpy as np

def mahalanobis(x, y, cov=None):
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    y_minus_mn = y - y_mean
    x_minus_mn_with_transpose = np.transpose(x - x_mean)
    Covariance = covar(x, y)
    inv_covmat = np.linalg.inv(Covariance)
    x_minus_mn = x - x_mean
    D_square = np.dot(x_minus_mn_with_transpose, inv_covmat, x_minus_mn)
    return D_square

def covar(x, y):
    x_mean = np.mean(x)
    y_mean = np.mean(y)
    Cov_numerator = sum(((a - x_mean)*(b - y_mean)) for a, b in zip(x, y))
    Cov_denomerator = len(x) - 1
    Covariance = (Cov_numerator / Cov_denomerator)
    return Covariance

import pandas as pd
filepath = 'https://raw.githubusercontent.com/selva86/datasets/master/diamonds.csv'
df = pd.read_csv(filepath).iloc[:, [0,4,6]]
df.head()
X = df[['carat', 'depth', 'price']].head(500).values.tolist
Y = df[['carat', 'depth', 'price']].values.tolist
mahalanobis(X, Y)
Please help. Could someone check and correct my code?
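X = df[['carat', 'depth', 'price']].head(500).values.tolist
Y = df[['carat', 'depth', 'price']].values.tolist

.tolist is a method. Without the parentheses it is never called, so X and Y end up as bound method objects rather than lists. I think you need:

.tolist()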
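To make the difference concrete, here is a minimal sketch using a small made-up array (the values are illustrative stand-ins, not read from the CSV). It shows that the attribute without parentheses is only a reference to the method, while calling it produces a nested list.

```python
import numpy as np

# Small illustrative array standing in for df[[...]].values (made-up values).
values = np.array([[0.23, 61.5, 326.0],
                   [0.21, 59.8, 326.0]])

not_called = values.tolist   # bound method object; nothing is converted
called = values.tolist()     # nested Python list of floats

print(callable(not_called))       # the method itself is callable
print(isinstance(called, list))   # the call result is a list
```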
I would like to point out that there are several errors in your code.

Use np.cov to compute the covariance when working with NumPy arrays; do not reimplement everything.

The third argument of np.dot is the output array (out), not a third operand. So instead of
D_square = np.dot( x_minus_mn_with_transpose, inv_covmat, x_minus_mn)
you should write
D_square = np.dot(np.dot(x_minus_mn, inv_covmat), np.transpose(x_minus_mn))

Instead of
X = df[['carat', 'depth', 'price']].head(500).values.tolist
use
X = np.asarray(df[['carat', 'depth', 'price']].head(500).values)
If you are using NumPy, work with NumPy arrays throughout rather than lists.
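As a rough check of the first two points, the sketch below uses random data (an assumption; any 2-D array with observations in rows would do). It compares np.cov against the textbook formula and evaluates the quadratic form for one row, once with nested np.dot calls and once with the @ operator.

```python
import numpy as np

rng = np.random.default_rng(0)
data = rng.normal(size=(200, 3))   # 200 observations, 3 variables (synthetic)

# Covariance by hand: center each column, then divide by n - 1.
centered = data - data.mean(axis=0)
manual_cov = centered.T @ centered / (len(data) - 1)

# np.cov treats rows as variables by default, so pass rowvar=False
# (equivalent to np.cov(np.transpose(data))).
numpy_cov = np.cov(data, rowvar=False)
print(np.allclose(manual_cov, numpy_cov))

# Quadratic form for a single centered row: two operands per np.dot call.
inv_cov = np.linalg.inv(numpy_cov)
d = centered[0]
nested = np.dot(np.dot(d, inv_cov), d)
chained = d @ inv_cov @ d
print(np.isclose(nested, chained))
```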
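Here is a modified version of the code you provided:

import numpy as np

def mahalanobis(x, y, cov=None):
    # Center on the per-column mean of the reference data y, the same data
    # the covariance is estimated from. Without axis=0, np.mean would
    # collapse all columns into a single scalar.
    y_mean = np.mean(y, axis=0)
    # np.cov expects variables in rows, hence the transpose.
    Covariance = np.cov(np.transpose(y))
    inv_covmat = np.linalg.inv(Covariance)
    x_minus_mn = x - y_mean
    # Entry [i, i] of the result is the squared distance of row i of x.
    D_square = np.dot(np.dot(x_minus_mn, inv_covmat), np.transpose(x_minus_mn))
    return D_square

import pandas as pd
filepath = 'https://raw.githubusercontent.com/selva86/datasets/master/diamonds.csv'
df = pd.read_csv(filepath).iloc[:, [0,4,6]]
df.head()
X = np.asarray(df[['carat', 'depth', 'price']].head(500).values)
Y = np.asarray(df[['carat', 'depth', 'price']].values)
mahalanobis(X, Y)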
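If only the per-row distances are needed, building the full n-by-n product is wasteful, since just its diagonal carries the squared distances. One possible variant, sketched here on synthetic data so it runs without downloading the CSV, computes that diagonal directly with np.einsum. The function name mahalanobis_rows is hypothetical, and the sketch assumes the mean and covariance are both estimated from the reference data y.

```python
import numpy as np

def mahalanobis_rows(x, y):
    """Squared Mahalanobis distance of each row of x from the distribution of y.

    Hypothetical helper: mean and covariance are both estimated from y.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    diff = x - y.mean(axis=0)
    inv_cov = np.linalg.inv(np.cov(y, rowvar=False))
    # einsum evaluates diff[i] @ inv_cov @ diff[i] for every row i at once,
    # without forming the n-by-n matrix.
    return np.einsum("ij,jk,ik->i", diff, inv_cov, diff)

rng = np.random.default_rng(42)
Y = rng.normal(size=(1000, 3))   # synthetic stand-in for the full data set
X = Y[:50]                       # first rows, similar to .head(500) above

d_square = mahalanobis_rows(X, Y)
print(d_square.shape)
```

Taking np.sqrt(d_square) gives the distances themselves rather than their squares.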