Euclidean distance using numpy
I am trying to calculate the Euclidean distance between two binary images using numpy, but I am getting nan in the result.
def eculideanDistance(features, predict, dist):
    dist += (float(features[0]) - float(predict[0]))
    return math.sqrt(dist)
I am using this binary data:
train_set = {
0: [
["0000000000000111100000000000000000000000000011111110000000000000000000000011111111110000000000000000000111111111111110000000000000000001111111011111100000000000000000111111100000111100000000000000001111111000000011100000000000000011111110000000111100000000000000111111100000000111000000000000001111111000000001110000000000000011111100000000011110000000000000111111000000000011100000000000001111110000000000111000000000000001111110000000000111000000000000011111100000000001110000000000000111111000000000011100000000000001111110000000000111000000000000111111100000000011110000000000001111011000000000111100000000000011110000000000011110000000000000011110000000000011110000000000000111100000000001111100000000000001111000000000111110000000000000011110000000011111000000000000000011100000011111100000000000000000111100011111110000000000000000001111111111111100000000000000000001111111111111000000000000000000011111111111100000000000000000000011111111100000000000000000000000011111000000000000000000000000000011000000000000000000"],
["0000000000011111000000000000000000000000001111111000000000000000000000000111111111000000000000000000000011111111111000000000000000000001111111111111000000000000000000111111101111111000000000000000001111110001111111000000000000000011111100001111110000000000000001111111000001111110000000000000011111110000001111100000000000000011111100000001111110000000000001111111000000001111110000000000011111100000000001111100000000000111111000000000011111100000000001111110000000000111111000000000011111100000000000111110000000000111111000000000001111100000000001111110000000000011111000000000011111100000000000111110000000000111111000000000001111100000000000111110000000000011111000000000001111110000000001111110000000000011111100000000111111000000000000011111100000001111111000000000000011111000000111111100000000000000111110000011111110000000000000001111110001111111000000000000000011111111111111100000000000000000111111111111110000000000000000000111111111111000000000000000000000111111111100000000000000000000000111111110000000000000"]
],
1: [
["0000000000000000111100000000000000000000000000011111111000000000000000000000000111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111000000000000000000000011111111110000000000000000000001111111111100000000000000000000111111111110000000000000000000011111111111110000000000000000111111111111111100000000000000011111111111111110000000000000001111111111111111100000000000000011111111111111111000000000000000001111111111111110000000000000000011111110111111100000000000000000011110001111111000000000000000000000000011111110000000000000000000000000111111000000000000000000000000011111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000001111111100000000000000000000000011111111000000000000000000000000111111110000000000000000000000000011111111000000000000000000000000111111100000000"],
["0000000001111100000000000000000000000000001111100000000000000000000000000011111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000011111111000000000000000000000000111111110000000000000000000000001111111000000000000000000000000001111111000000000000000000000000111111110000000000000000000000001111111110000000000000000000000011111111100000000000000000000001111111110000000000000000000000011111111110000000000000000000001111111111000000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000000111111111100000000000000000000000111111111000000000000000000000000001111111000000000000000000000000011111110000000000000000000000000111111100000000000000000000000000111111100000000000000000000000001111111000000000000000000000000001111111000000000000000000001111111111111111111000000000000111111111111111111111000000000001111111111111111111111000000000011111111111111111111110000000000001111111111111111111100000000000001111111111111111111"],
]
}
test_set = ["0000000000000000011000000000000000000000000000011111111000000000000000000000011111111111000000000000000000000011111111111000000000000000000001111111111110000000000000000000011111111111100000000000000000011111111111110000000000000000000111111111111100000000000000000001111111111111000000000000000000111111111111110000000000000000111111111111111100000000000000001111111111111111000000000000000001111111111111111000000000000001111111111111111110000000000000111111111111111111100000000000001111111111111111111000000000000001111111111111111110000000000000000010000111111111100000000000000000000001111111110000000000000000000000011111111100000000000000000000000111111111000000000000000000000000111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000000111111111100000000000000000000001111111111000000000000000000000011111111110000000000000000000011111111111100000000000000000000000111111111100000000000000000000000111111000000000"]
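The formula you use for Euclidean distance is not correct: the difference is never squared, so dist can become negative before the square root is taken. (Strictly, math.sqrt raises ValueError for a negative argument rather than returning NaN. If features[0] and predict[0] are the 1024-character strings shown above, a likelier source of the NaN is that float() overflows to inf on each of them, and inf - inf is nan.) I think you meant to do something like: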
import math

def euclideanDistance(features, predict, dist):
    # Square the difference so the accumulated value is never negative.
    diff = float(features[0]) - float(predict[0])
    dist += diff * diff
    return math.sqrt(dist)
(I'm not sure why you always use index 0, or why dist is a parameter rather than only a return value. I suspect there may be a problem there too, but I lack the context to judge.)
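To see where the nan can come from, and what a distance computed directly on the digit strings might look like, here is a minimal sketch. It assumes features[0] and predict[0] are long digit strings like the ones in the question; pixel_distance is a hypothetical helper name, not part of the original code.

```python
import math

# float() on a very long digit string overflows to inf, and inf - inf is nan.
huge = float("1" * 1024)          # far beyond the largest finite float
print(huge)                        # inf
print(math.isnan(huge - huge))     # True


def pixel_distance(a: str, b: str) -> float:
    """Euclidean distance between two equal-length strings of '0'/'1' digits."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    # Compare pixel by pixel instead of parsing the whole string as one number.
    return math.sqrt(sum((int(x) - int(y)) ** 2 for x, y in zip(a, b)))


print(pixel_distance("0011", "1001"))  # sqrt(2)
```

Treating each character as its own pixel keeps every intermediate value at -1, 0 or 1, so nothing can overflow.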
However, if you encode your images as NumPy arrays instead of strings, NumPy offers a direct way to compute the Euclidean norm:
import numpy

a = numpy.array([0, 0, 1, 1])
b = numpy.array([1, 0, 0, 1])
euclidean_norm = numpy.linalg.norm(a - b)  # sqrt(2), about 1.414
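As a sanity check on what numpy.linalg.norm computes here, the sketch below compares it with the explicit formula. For 0/1 vectors each squared difference is 0 or 1, so the distance is also the square root of the number of differing positions. The variable names are illustrative only.

```python
import numpy as np

a = np.array([0, 0, 1, 1])
b = np.array([1, 0, 0, 1])

# norm of the difference == square root of the sum of squared differences
by_norm = np.linalg.norm(a - b)
by_formula = np.sqrt(np.sum((a - b) ** 2))

# For 0/1 data each squared difference is 0 or 1, so this is also
# the square root of the number of positions where a and b differ.
by_count = np.sqrt(np.count_nonzero(a != b))

print(by_norm, by_formula, by_count)
```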
This is not binary data. It is a binary image stored as a string, where each pixel is represented by 0 (black) or 1 (white).
To make things easier, let's convert your data to 32 x 32 numpy arrays.
train_set to numpy array:
import numpy as np

# Each sample is a one-element list holding a 1024-character string of 0s and 1s.
# Note: uint8 is unsigned, so cast to a signed or float type before subtracting images.
train_img = {label: [np.uint8([*sample[0]]).reshape(32, 32)
                     for sample in samples]
             for label, samples in train_set.items()}
test_set to numpy array:
test_img = np.uint8([*test_set[0]]).reshape(32, 32)
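A slightly more defensive version of the conversion, plus a text rendering to eyeball the result, might look like the sketch below. to_image and render are hypothetical helper names, the 4 x 4 sample string is made up for illustration, and the signed dtype is an assumption chosen so that later subtractions cannot wrap around.

```python
import numpy as np


def to_image(bits: str, side: int = 32) -> np.ndarray:
    """Turn a string of '0'/'1' characters into a side x side integer array."""
    if len(bits) != side * side:
        raise ValueError(f"expected {side * side} characters, got {len(bits)}")
    if set(bits) - {"0", "1"}:
        raise ValueError("string may only contain '0' and '1'")
    # int16 is signed, so differences such as 0 - 1 stay at -1.
    return np.array([int(c) for c in bits], dtype=np.int16).reshape(side, side)


def render(img: np.ndarray) -> str:
    """Draw the image as text: '#' for 1 (white), '.' for 0 (black)."""
    return "\n".join("".join("#" if px else "." for px in row) for row in img)


sample = "0110" "1001" "1001" "0110"   # made-up 4 x 4 image
img = to_image(sample, side=4)
print(render(img))
```

With the data from the question, to_image(test_set[0]) would use the default side of 32, provided each string really is 1024 characters long.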
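From this point, calculating the Euclidean distance is pretty straightforward with numpy.linalg.norm, e.g.:
In [5]: np.linalg.norm(test_img - train_img[0][0])
Out[5]: 2984.7336564591487
In [6]: np.linalg.norm(test_img - train_img[0][1])
Out[6]: 3459.016189612301
In [7]: np.linalg.norm(test_img - train_img[1][0])
Out[7]: 1691.5064291926294
In [8]: np.linalg.norm(test_img - train_img[1][1])
Out[8]: 2650.0669802855928
Be careful with these values, though. Two 32 x 32 binary images differ in at most 1024 pixels, so their Euclidean distance cannot exceed sqrt(1024) = 32. The arrays are uint8, so every 0 - 1 in the subtraction wraps around to 255 instead of -1. Convert to a signed or floating-point type before subtracting, e.g. np.linalg.norm(test_img.astype(float) - train_img[0][0]). Working backwards from Out[5], 2984.73^2 is about 8908635 = 137 * 255^2 + 210, so 347 pixels differ and the true distance is sqrt(347), about 18.63.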
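Subtracting uint8 arrays wraps around modulo 256, which is easy to reproduce on a tiny example. The sketch below uses made-up 4-pixel arrays rather than the data above, and shows one possible remedy: casting to a signed type before subtracting.

```python
import numpy as np

a = np.array([0, 1, 1, 0], dtype=np.uint8)
b = np.array([1, 1, 0, 0], dtype=np.uint8)

# uint8 cannot hold -1, so 0 - 1 wraps around to 255.
wrapped = a - b
print(wrapped)                      # [255   0   1   0]
print(np.linalg.norm(wrapped))      # about 255.002, not a real distance

# Casting one operand to a signed type keeps the differences at -1, 0, 1.
diff = a.astype(np.int16) - b
print(diff)                         # [-1  0  1  0]
print(np.linalg.norm(diff))         # sqrt(2), about 1.414
```

Any signed integer or floating-point dtype would do here; int16 is just a small type that comfortably holds -1.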