
Euclidean distance using numpy

I am trying to calculate the Euclidean distance between two binary images using numpy, but I am getting nan in the result.

def eculideanDistance(features, predict, dist):
    dist += (float(features[0]) - float(predict[0]))
    return math.sqrt(dist)


I am using this binary data:

train_set = {
    0: [
        ["0000000000000111100000000000000000000000000011111110000000000000000000000011111111110000000000000000000111111111111110000000000000000001111111011111100000000000000000111111100000111100000000000000001111111000000011100000000000000011111110000000111100000000000000111111100000000111000000000000001111111000000001110000000000000011111100000000011110000000000000111111000000000011100000000000001111110000000000111000000000000001111110000000000111000000000000011111100000000001110000000000000111111000000000011100000000000001111110000000000111000000000000111111100000000011110000000000001111011000000000111100000000000011110000000000011110000000000000011110000000000011110000000000000111100000000001111100000000000001111000000000111110000000000000011110000000011111000000000000000011100000011111100000000000000000111100011111110000000000000000001111111111111100000000000000000001111111111111000000000000000000011111111111100000000000000000000011111111100000000000000000000000011111000000000000000000000000000011000000000000000000"],
        ["0000000000011111000000000000000000000000001111111000000000000000000000000111111111000000000000000000000011111111111000000000000000000001111111111111000000000000000000111111101111111000000000000000001111110001111111000000000000000011111100001111110000000000000001111111000001111110000000000000011111110000001111100000000000000011111100000001111110000000000001111111000000001111110000000000011111100000000001111100000000000111111000000000011111100000000001111110000000000111111000000000011111100000000000111110000000000111111000000000001111100000000001111110000000000011111000000000011111100000000000111110000000000111111000000000001111100000000000111110000000000011111000000000001111110000000001111110000000000011111100000000111111000000000000011111100000001111111000000000000011111000000111111100000000000000111110000011111110000000000000001111110001111111000000000000000011111111111111100000000000000000111111111111110000000000000000000111111111111000000000000000000000111111111100000000000000000000000111111110000000000000"]
    ],
    1: [
        ["0000000000000000111100000000000000000000000000011111111000000000000000000000000111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000111111111110000000000000000000011111111111110000000000000000111111111111111100000000000000011111111111111110000000000000001111111111111111100000000000000011111111111111111000000000000000001111111111111110000000000000000011111110111111100000000000000000011110001111111000000000000000000000000011111110000000000000000000000000111111000000000000000000000000011111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000001111111100000000000000000000000011111111000000000000000000000000111111110000000000000000000000000011111111000000000000000000000000111111100000000"],
        ["0000000001111100000000000000000000000000001111100000000000000000000000000011111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000011111111000000000000000000000000111111110000000000000000000000001111111000000000000000000000000001111111000000000000000000000000111111110000000000000000000000001111111110000000000000000000000011111111100000000000000000000001111111110000000000000000000000011111111110000000000000000000001111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000000111111111100000000000000000000000111111111000000000000000000000000001111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000001111111000000000000000000001111111111111111111000000000000111111111111111111111000000000001111111111111111111111000000000011111111111111111111110000000000001111111111111111111100000000000001111111111111111111"],
    ]
}

test_set = ["0000000000000000011000000000000000000000000000011111111000000000000000000000011111111111000000000000000000000011111111111000000000000000000001111111111110000000000000000000011111111111100000000000000000011111111111110000000000000000000111111111111100000000000000000001111111111111000000000000000000111111111111110000000000000000111111111111111100000000000000001111111111111111000000000000000001111111111111111000000000000001111111111111111110000000000000111111111111111111100000000000001111111111111111111000000000000001111111111111111110000000000000000010000111111111100000000000000000000001111111110000000000000000000000011111111100000000000000000000000111111111000000000000000000000000111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000011111111111100000000000000000000000111111111100000000000000000000000111111000000000"]

The formula you use for Euclidean distance is not correct: each difference has to be squared before it is added to the sum. Without squaring, the accumulated value can become negative, and the square root of a negative number is not a real number (numpy.sqrt returns NaN for it, and math.sqrt raises a ValueError). I think you meant to do something like:

import math

def euclideanDistance(features, predict, dist):
    # Square the difference so every term added to dist is non-negative.
    diff = float(features[0]) - float(predict[0])
    dist += diff * diff
    return math.sqrt(dist)

(I'm not sure why you always use index 0, or why dist is a parameter rather than only a return value. I suspect there may be a problem with this as well, but I lack the context to judge.)
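A possible pure-Python version that compares the two strings character by character is sketched below. The name euclidean_distance and its two-argument signature are assumptions for illustration, not the asker's API. The sketch also sidesteps a second likely source of the NaN: when features[0] is a whole 1024-character string, as it is in train_set, float() overflows to inf, and inf - inf is nan.

```python
import math

def euclidean_distance(features, predict):
    """Euclidean distance between two equal-length strings of '0'/'1' pixels."""
    if len(features) != len(predict):
        raise ValueError("inputs must have the same length")
    # Sum the squared per-pixel differences; every term is non-negative,
    # so the square root is always defined.
    total = sum((int(a) - int(b)) ** 2 for a, b in zip(features, predict))
    return math.sqrt(total)

# Two toy 4-pixel strings that differ in 2 positions -> sqrt(2).
print(euclidean_distance("0011", "1001"))

# Why float() on a whole image string is a problem:
huge = float("1" * 1024)   # far beyond the float range, becomes inf
print(huge - huge)         # inf - inf is nan
```

Called on the real data, this would look like euclidean_distance(test_set[0], train_set[0][0][0]).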

However, if you encode your images as NumPy arrays instead of strings, NumPy offers a direct way to compute the Euclidean norm:

import numpy

a = numpy.array([0, 0, 1, 1])
b = numpy.array([1, 0, 0, 1])
euclidean_norm = numpy.linalg.norm(a - b)
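To make the connection to the formula explicit, here is a small sketch that computes the same value by hand and with numpy.linalg.norm. The variable names manual and library are only illustrative.

```python
import numpy as np

a = np.array([0, 0, 1, 1])
b = np.array([1, 0, 0, 1])

diff = a - b                          # default integer dtype is signed: [-1, 0, 1, 0]
manual = np.sqrt(np.sum(diff ** 2))   # sqrt((-1)^2 + 0^2 + 1^2 + 0^2)
library = np.linalg.norm(diff)        # 2-norm of the difference vector

print(manual, library)                # both equal sqrt(2)
```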

This is not binary data. It is a binary image stored as a string, where each pixel is represented by either 0 (black) or 1 (white).

To make things easier, let's convert your data to 32 x 32 numpy arrays and visualize them.

Converting train_set to numpy arrays

train_img = {label: [np.uint8([*sample[0]]).reshape(32, 32) 
    for sample in samples] 
        for label, samples in train_set.items()}


Converting test_set to a numpy array

test_img = np.uint8([*test_set[0]]).reshape(32, 32)

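As a text-only stand-in for plotting, the converted array can be printed row by row. The sketch below uses a toy 4 x 4 pattern rather than the real data, and the helpers to_image and render are hypothetical names introduced here.

```python
import numpy as np

def to_image(bits, shape):
    """Turn a string of '0'/'1' characters into a 2-D uint8 array."""
    return np.array([int(ch) for ch in bits], dtype=np.uint8).reshape(shape)

def render(img):
    """Return the image as text, one row per line ('#' = 1, '.' = 0)."""
    return "\n".join("".join("#" if px else "." for px in row) for row in img)

demo = to_image("0110100110010110", (4, 4))  # toy pattern, not real data
print(render(demo))
```

For the real data, the same call would be render(to_image(test_set[0], (32, 32))).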

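From this point, calculating the Euclidean distance is straightforward with numpy.linalg.norm. Because the images are stored as uint8, cast one operand to a signed type before subtracting. Otherwise 0 - 1 wraps around to 255 and inflates the result far beyond 32, which is the largest possible distance between two 32 x 32 binary images. For example (values rounded to four decimals):

In [5]: round(float(np.linalg.norm(test_img.astype(int) - train_img[0][0])), 4)
Out[5]: 18.6279

In [6]: round(float(np.linalg.norm(test_img.astype(int) - train_img[0][1])), 4)
Out[6]: 19.4165

In [7]: round(float(np.linalg.norm(test_img.astype(int) - train_img[1][0])), 4)
Out[7]: 11.7473

In [8]: round(float(np.linalg.norm(test_img.astype(int) - train_img[1][1])), 4)
Out[8]: 16.2173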
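Since the images are stored as uint8, the subtraction deserves a second look: unsigned arithmetic wraps around instead of going negative. A minimal sketch with toy arrays (not the real images) shows the effect and one way to avoid it.

```python
import numpy as np

a = np.array([0, 1, 1, 0], dtype=np.uint8)
b = np.array([1, 1, 0, 0], dtype=np.uint8)

wrapped = a - b              # result stays uint8: 0 - 1 wraps around to 255
signed = a.astype(int) - b   # signed integers: 0 - 1 is -1

print(wrapped.tolist())          # [255, 0, 1, 0]
print(signed.tolist())           # [-1, 0, 1, 0]
print(np.linalg.norm(wrapped))   # inflated by the 255 entry
print(np.linalg.norm(signed))    # sqrt(2), the true distance
```

A quick sanity check for binary images: two arrays of n pixels can differ in at most n positions, so their Euclidean distance cannot exceed sqrt(n), which is 32 for a 32 x 32 image.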

Full code for this answer

In [1]: import numpy as np

In [2]: train_set = {
   ...:     0: [
   ...:         ["0000000000000111100000000000000000000000000011111110000000000000000000000011111111110000000000000000000111111111111110000000000000000001111111011111100000000000000000111111100000111100000000000000001111111000000011100000000000000011111110000000111100000000000000111111100000000111000000000000001111111000000001110000000000000011111100000000011110000000000000111111000000000011100000000000001111110000000000111000000000000001111110000000000111000000000000011111100000000001110000000000000111111000000000011100000000000001111110000000000111000000000000111111100000000011110000000000001111011000000000111100000000000011110000000000011110000000000000011110000000000011110000000000000111100000000001111100000000000001111000000000111110000000000000011110000000011111000000000000000011100000011111100000000000000000111100011111110000000000000000001111111111111100000000000000000001111111111111000000000000000000011111111111100000000000000000000011111111100000000000000000000000011111000000000000000000000000000011000000000000000000"],
   ...:         ["0000000000011111000000000000000000000000001111111000000000000000000000000111111111000000000000000000000011111111111000000000000000000001111111111111000000000000000000111111101111111000000000000000001111110001111111000000000000000011111100001111110000000000000001111111000001111110000000000000011111110000001111100000000000000011111100000001111110000000000001111111000000001111110000000000011111100000000001111100000000000111111000000000011111100000000001111110000000000111111000000000011111100000000000111110000000000111111000000000001111100000000001111110000000000011111000000000011111100000000000111110000000000111111000000000001111100000000000111110000000000011111000000000001111110000000001111110000000000011111100000000111111000000000000011111100000001111111000000000000011111000000111111100000000000000111110000011111110000000000000001111110001111111000000000000000011111111111111100000000000000000111111111111110000000000000000000111111111111000000000000000000000111111111100000000000000000000000111111110000000000000"]
   ...:     ],
   ...:     1: [
   ...:         ["0000000000000000111100000000000000000000000000011111111000000000000000000000000111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000111111111110000000000000000000011111111111110000000000000000111111111111111100000000000000011111111111111110000000000000001111111111111111100000000000000011111111111111111000000000000000001111111111111110000000000000000011111110111111100000000000000000011110001111111000000000000000000000000011111110000000000000000000000000111111000000000000000000000000011111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000001111111100000000000000000000000011111111000000000000000000000000111111110000000000000000000000000011111111000000000000000000000000111111100000000"],
   ...:         ["0000000001111100000000000000000000000000001111100000000000000000000000000011111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000011111111000000000000000000000000111111110000000000000000000000001111111000000000000000000000000001111111000000000000000000000000111111110000000000000000000000001111111110000000000000000000000011111111100000000000000000000001111111110000000000000000000000011111111110000000000000000000001111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000000111111111100000000000000000000000111111111000000000000000000000000001111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000001111111000000000000000000001111111111111111111000000000000111111111111111111111000000000001111111111111111111111000000000011111111111111111111110000000000001111111111111111111100000000000001111111111111111111"],
   ...:     ]
   ...: }
   ...: 
   ...: test_set = ["0000000000000000011000000000000000000000000000011111111000000000000000000000011111111111000000000000000000000011111111111000000000000000000001111111111110000000000000000000011111111111100000000000000000011111111111110000000000000000000111111111111100000000000000000001111111111111000000000000000000111111111111110000000000000000111111111111111100000000000000001111111111111111000000000000000001111111111111111000000000000001111111111111111110000000000000111111111111111111100000000000001111111111111111111000000000000001111111111111111110000000000000000010000111111111100000000000000000000001111111110000000000000000000000011111111100000000000000000000000111111111000000000000000000000000111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000011111111111100000000000000000000000111111111100000000000000000000000111111000000000"]

In [3]: train_img = {label: [np.uint8([*sample[0]]).reshape(32, 32)
   ...:     for sample in samples]
   ...:         for label, samples in train_set.items()}

In [4]: test_img = np.uint8([*test_set[0]]).reshape(32, 32)

In [5]: round(float(np.linalg.norm(test_img.astype(int) - train_img[0][0])), 4)
Out[5]: 18.6279

In [6]: round(float(np.linalg.norm(test_img.astype(int) - train_img[0][1])), 4)
Out[6]: 19.4165

In [7]: round(float(np.linalg.norm(test_img.astype(int) - train_img[1][0])), 4)
Out[7]: 11.7473

In [8]: round(float(np.linalg.norm(test_img.astype(int) - train_img[1][1])), 4)
Out[8]: 16.2173
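The four separate calls can also be written as one pass over the whole training dictionary. The sketch below assumes the same layout as train_set (label mapped to a list of one-element lists) but uses toy 4-pixel strings; toy_train, toy_test and to_vector are hypothetical names.

```python
import numpy as np

def to_vector(bits):
    """Flatten a '0'/'1' string into a signed integer vector."""
    return np.array([int(ch) for ch in bits], dtype=np.int64)

# Toy data in the same layout as train_set / test_set, not the real images.
toy_train = {0: [["0000"], ["0001"]], 1: [["1111"], ["1110"]]}
toy_test = ["1101"]

query = to_vector(toy_test[0])
distances = {
    label: [float(np.linalg.norm(query - to_vector(sample[0]))) for sample in samples]
    for label, samples in toy_train.items()
}

for label, dists in distances.items():
    print(label, dists)
```

Using a signed dtype in to_vector means no cast is needed at subtraction time.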
