Calculate Euclidean distance for PCA in Python

I have PCA output as a numpy array of 3D points (shape n×3):

pcar =[[xa ya za]
       [xb yb zb]
       [xc yc zc]
       .
       .
       [xn yn zn]]

where each row is a point. I have selected two random rows from the PCA array above as cluster centres:

out_list=pcar[numpy.random.randint(0,pcar.shape[0],2)]

which gives a numpy array with 2 rows.
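Note that `randint` samples with replacement, so both picks can land on the same row. A minimal sketch, assuming NumPy 1.17 or later for `np.random.default_rng`, that draws two distinct rows instead (the random `pcar` here is only stand-in data):

```python
import numpy as np

# Stand-in data: 10 points in 3D (the real pcar would come from your PCA)
rng = np.random.default_rng(0)
pcar = rng.random((10, 3))

# Draw 2 distinct row indices; replace=False guarantees no duplicates
idx = rng.choice(pcar.shape[0], size=2, replace=False)
out_list = pcar[idx]
```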

I need to find the Euclidean distance from each row of out_list to each row (point) in pcar, and assign each pcar point to the cluster of its nearest out_list point.
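Stated formally, with p_i the i-th row of pcar and c_k the k-th row of out_list, each point gets the label of its nearest centre (this is the assignment step of k-means):

```latex
d(\mathbf{p}_i, \mathbf{c}_k) = \sqrt{\sum_{j=1}^{3} \left(p_{ij} - c_{kj}\right)^2},
\qquad
\operatorname{label}(i) = \arg\min_{k \in \{1, 2\}} d(\mathbf{p}_i, \mathbf{c}_k)
```

Because the square root is monotonic, comparing the squared sums gives the same labels, so the square root can be skipped when only the assignment is needed.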

Edit: OK, I downloaded, installed and taught myself numpy. Here is a numpy version.

Old answer

I realise you want a numpy answer. My numpy is rusty, but since there are no other answers, I thought I'd give you one in Matlab. It should be straightforward to convert. I'm assuming the issue is the concept, not the code.

Note there are many ways to skin this cat; I'm just giving one.

Working Numpy version

import numpy as np

pcar = np.random.rand(10,3)

out_list=pcar[np.random.randint(0,pcar.shape[0],2)]

ol_1 = out_list[0,:]
ol_2 = out_list[1,:]

## Get the individual distances
## In Matlab the 1x3 ol vector has to be replicated into a 10x3 array
## (by pre-multiplying it with a 10x1 column of ones) before subtracting.
## NumPy broadcasting does that replication automatically.
d1 = pcar - ol_1
d2 = pcar - ol_2

## Square them using an element-wise square
d1s = np.square(d1)
d2s = np.square(d2)

## Sum across the rows, not down columns
d1ss = np.sum(d1s, axis=1)
d2ss = np.sum(d2s, axis=1)

## Square root using an element-wise square-root
e1 = np.sqrt(d1ss)
e2 = np.sqrt(d2ss)

## Assign to class one or class two
## Start by assigning one to everything, then select all those where ol_2
## is closer and assign them the number 2
assign = np.ones(e1.shape[0], dtype=int)
assign[e2<e1] = 2

##% Separate
pcar1 = pcar[ assign==1, :]
pcar2 = pcar[ assign==2, :]
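For comparison, a more compact sketch (not part of the original answer) that handles any number of centres at once with broadcasting and `argmin`. It assumes NumPy 1.17 or later for `default_rng`, and its labels are 0-based, so label 0 corresponds to `assign == 1` above:

```python
import numpy as np

rng = np.random.default_rng(0)
pcar = rng.random((10, 3))  # 10 points in 3D (stand-in data)
out_list = pcar[rng.choice(pcar.shape[0], size=2, replace=False)]  # 2 distinct centres

# pcar[:, None, :] has shape (10, 1, 3) and out_list[None, :, :] has shape (1, 2, 3),
# so broadcasting the subtraction gives (10, 2, 3): every point minus every centre
diff = pcar[:, None, :] - out_list[None, :, :]

# Euclidean norm over the coordinate axis -> (10, 2) point-to-centre distances
dist = np.linalg.norm(diff, axis=2)

# Index of the nearest centre for each point (ties go to the lower index)
labels = dist.argmin(axis=1)

# One array of points per centre
clusters = [pcar[labels == k] for k in range(out_list.shape[0])]
```

The same code works unchanged for more than two centres, which avoids hard-coding `ol_1`, `ol_2`, `e1`, `e2`.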

Working Matlab version

close all
clear all

% Create 10 records each with 3 attributes
pcar = rand(10, 3)

% Pick two (normally at random of course)
out_list = pcar(1:2, :)

% Hard-coding this separately, though this can be done iteratively
ol_1 = out_list(1,:)
ol_2 = out_list(2,:)

% Get the individual distances
% The trick here is to pre-multiply the 1x3 ol vector with a column of
% ones of size 10x1 to get a 10x3 array with ol replicated, so that it
% can simply be subtracted
d1 = pcar - ones( size(pcar,1), 1)*ol_1
d2 = pcar - ones( size(pcar,1), 1)*ol_2

% Square them using an element-wise square
d1s = d1.^2
d2s = d2.^2

% Sum across the rows, not down columns
d1ss = sum(d1s, 2)
d2ss = sum(d2s, 2)

% Square root using an element-wise square-root
e1 = sqrt(d1ss)
e2 = sqrt(d2ss)

% Assign to class one or class two
% Start by assigning one to everything, then select all those where ol_2
% is closer and assign them the number 2
assign = ones(length(e1),1);
assign(e2<e1)=2

% Separate
pcar1 = pcar( assign==1, :)
pcar2 = pcar( assign==2, :)

% Plot
plot3(pcar1(:,1), pcar1(:,2), pcar1(:,3), 'g+')
hold on
plot3(pcar2(:,1), pcar2(:,2), pcar2(:,3), 'r+')
plot3(ol_1(1), ol_1(2), ol_1(3), 'go')
plot3(ol_2(1), ol_2(2), ol_2(3), 'ro')

There is a really fast implementation in SciPy:

from scipy.spatial.distance import cdist, pdist

cdist takes two arrays of points (like your pcar and out_list) and calculates the distance between every pair of points drawn from the two. pdist takes a single array and gives you only the upper triangle of its pairwise distance matrix, flattened into a 1D "condensed" array.
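Applied to this question, a minimal sketch might look like the following (random stand-in data; `default_rng` assumes NumPy 1.17 or later). `cdist` defaults to the Euclidean metric, and `squareform` converts `pdist`'s condensed output back to a full matrix:

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist, squareform

rng = np.random.default_rng(0)
pcar = rng.random((10, 3))  # 10 points in 3D (stand-in data)
out_list = pcar[rng.choice(pcar.shape[0], size=2, replace=False)]  # 2 centres

# cdist(XA, XB) returns a (len(XA), len(XB)) array, here (10, 2);
# entry [i, k] is the Euclidean distance from pcar[i] to out_list[k]
dist = cdist(pcar, out_list)
labels = dist.argmin(axis=1)  # nearest centre per point (0-based)

# pdist(X) returns the condensed form: the n*(n-1)/2 upper-triangle entries
condensed = pdist(pcar)        # shape (45,) for 10 points
full = squareform(condensed)   # back to the symmetric (10, 10) matrix
```

For the clustering in the question only `cdist` is needed; `pdist` is useful when you want distances among the points of a single array.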

As they are implemented in C or Fortran behind the scenes, they are very performant.

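Disclaimer: the technical posts on this site follow the CC BY-SA 4.0 license. If you need to republish, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.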

© 2020-2024 STACKOOM.COM