Distance between two arrays or vectors of different length?

I have a program that predicts whether a review is positive or negative using the kNN algorithm. After building a bag-of-words representation of my training set of reviews, I want to find the distance between the resulting vectors/arrays. However, I cannot use euclidean_distances() because the vectors all have different lengths. How can I find the distance between vectors of different lengths?

from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances
from sklearn.metrics.pairwise import pairwise_distances
import numpy as np
import math
import re
import random

def divide_chunks(l, n):

    for i in range(0, len(l),n):
        yield l[i:i + n]

with open(r"D:\Desktop\\1565964985_2925534_train_file.data", "r") as f:
    data_lines = f.readlines()

sentiments = list()
reviews = list()

for i, line in enumerate(data_lines):
    s = ''.join(re.findall("^[+1]*[-1]*[0]*", line))
    r = line.replace(s, '').strip()
    #print('line:{} \n\t sentiment: {} \n\t review: {}'.format(i, s, r))
    sentiments.append(s)
    reviews.append(r)

n = 1
x = list(divide_chunks(reviews, n))
print(x[0])

count = CountVectorizer()
docs = np.array(x[0])
bag = count.fit_transform(docs)
print(bag.toarray())

docs = np.array(x[1])
bag1 = count.fit_transform(docs)
print(bag1.toarray())

euclidean_distances(bag, bag1)

f.close()

Error and traceback:

[[1 1 1 1 2 1 2 2 1 2 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 2 2 1 1 1 1 2 1 3 1
  1 2 1 1 1 1 1 1 1 1 1 1 1 1 2 6 1 1 2 1 1 1 1 1 1 5 1 1 1 1 2 2 1 1 1 1
  2 1]]
[[1 1 3 3 1 1 1 1 1 1 2 1 1 2 2 1 1 1 1 1 1 2 1 1 6 1 1 1 2 1 6 1 1 1 4 3
  1 1 1 1 1 3 4 1 1 1 1 1 1 1 1 4 1 1 3 1 1 1 1 3 1 2 1 1 1 1 1 2 1 1 1 2
  1 2 1 1 1 5 1 1 2 1 1 1 8 1 4 1 1 1 1 3 2 1 3 1 1 2 1 1]]
Traceback (most recent call last):
  File "D:/Users/Not_J/PycharmProjects/untitled/HW1_Durand.py", line 82, in <module>
    euclidean_distances(bag, bag1)
  File "D:\Users\Not_J\PycharmProjects\HW1_Durand\venv\lib\site-packages\sklearn\metrics\pairwise.py", line 232, in euclidean_distances
    X, Y = check_pairwise_arrays(X, Y)
  File "D:\Users\Not_J\PycharmProjects\HW1_Durand\venv\lib\site-packages\sklearn\metrics\pairwise.py", line 125, in check_pairwise_arrays
    X.shape[1], Y.shape[1]))
ValueError: Incompatible dimension for X and Y matrices: X.shape[1] == 74 while Y.shape[1] == 100

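Try changing the second fit_transform(docs) to transform(docs). Calling fit_transform() a second time relearns the vocabulary from the new documents, so the two matrices end up with different numbers of columns (74 and 100 in the traceback). transform() reuses the vocabulary learned by the first fit, so both matrices have the same width and euclidean_distances() can compare them.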

Check out this issue for more info.
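As a rough illustration of the fit-once, transform-afterwards pattern, here is a minimal sketch. The review strings and the variable names (train_reviews, new_review, train_bag, new_bag) are made up for the example and are not from the original data file; only CountVectorizer and euclidean_distances come from the question.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import euclidean_distances

# Hypothetical stand-in reviews; the original training file is not shown.
train_reviews = [
    "great movie with a wonderful cast",
    "terrible plot and wooden acting",
]
new_review = ["wonderful acting but a terrible ending"]

count = CountVectorizer()

# fit_transform learns the vocabulary from the training reviews only.
train_bag = count.fit_transform(train_reviews)

# transform reuses that vocabulary, so new_bag has the same number of
# columns as train_bag; words never seen during fit are ignored.
new_bag = count.transform(new_review)

# One row per new review, one column per training review.
distances = euclidean_distances(new_bag, train_bag)
print(train_bag.shape, new_bag.shape)
print(distances)
```

Because both matrices share one vocabulary, their column counts match and the dimension check in euclidean_distances() passes. Fitting on the training set only is also the usual choice for kNN, since the reviews being classified should not influence the feature space.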
