Reshape your data using array.reshape(-1, 1) if your data has a single feature

Question

How can I use metrics.silhouette_score on a dataset of 1300 images for which I have ResNet50 feature vectors (each of length 2048) and a discrete class label between 1 and 9?

import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn import cluster, datasets, preprocessing, metrics
from sklearn.cluster import KMeans
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
labels_reshaped = np.ndarray(labels).reshape(-1,1)
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels_reshaped,
                                                    metric='cosine'))

I get this error:

Traceback (most recent call last):
  File "/dataset/silouhette_score.py", line 8, in <module>
    labels_reshaped = np.ndarray(labels).reshape(-1,1)
ValueError: sequence too large; cannot be greater than 32

Process finished with exit code 1

For this other code:

import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn import cluster, datasets, preprocessing, metrics
from sklearn.cluster import KMeans
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
labels_reshaped = np.ndarray(labels).reshape(1,-1)
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels_reshaped,
                                                    metric='cosine'))

I get this error:

Traceback (most recent call last):
  File "/dataset/silouhette_score.py", line 8, in <module>
    labels_reshaped = np.ndarray(labels).reshape(1,-1)
ValueError: sequence too large; cannot be greater than 32

Process finished with exit code 1
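
A note on the two tracebacks above: `np.ndarray(...)` is the low-level array constructor and treats its first argument as a shape, so a list of 1300 labels is read as a request for an array with 1300 dimensions, which exceeds NumPy's limit (the exact wording of the error varies by NumPy version). A minimal sketch of the usual alternative, `np.array`, follows; the labels here are made up and only stand in for the `Q3 Theme1` column.

```python
import numpy as np

# Hypothetical stand-in for list(df['Q3 Theme1']): 1300 labels in 1..9.
labels = [(i % 9) + 1 for i in range(1300)]

# np.array converts the contents of the list into an array.
# np.ndarray(labels) would instead interpret the list as a shape.
labels_arr = np.array(labels)
print(labels_arr.shape)  # (1300,)
```

`silhouette_score` expects the labels as a 1-D sequence with one entry per sample, so neither `reshape(-1, 1)` nor `reshape(1, -1)` should be needed for them.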

If I run this other code:

import pandas as pd
from sklearn import metrics
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels,
                                                    metric='cosine'))

I get this as an output: https://pastebin.com/raw/hk2axdWL

How can I fix this code so that I can print the single silhouette score?

Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.

Process finished with exit code 1

I have pasted one line of my feature vector file (a .txt file) here: https://pastebin.com/raw/hk2axdWL (it consists of 2048 numbers separated by spaces).

Answer 1

I was eventually able to figure this out. I needed to build the feature vectors in exactly the format sklearn requires, a list with one row of values per image:

import pandas as pd
from sklearn import metrics


df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
# Parse each line of the file into one row of floats (one image per line).
fv = []
with open('entire_dataset__resnet50_feature_vectors.txt') as X:
    for line in X:
        tmp_arr = line.split()
        if tmp_arr:  # skip blank lines
            # Convert explicitly: recent sklearn versions reject string values.
            fv.append([float(v) for v in tmp_arr])

print('Silhouette Score:', metrics.silhouette_score(fv, labels,
                                                    metric='cosine'))
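
As an alternative to parsing the file line by line, `np.loadtxt` can probably read a whitespace-separated file like this one directly into a 2-D float array. The sketch below is not run on the original data: the file name, the 30 x 8 shape, and the three class labels are all invented so that the example is self-contained.

```python
import os
import tempfile

import numpy as np
from sklearn import metrics

# Synthetic stand-in for the real files (assumption): 30 samples, 8 features,
# 3 classes, written as space-separated numbers with one sample per line.
rng = np.random.default_rng(0)
labels = np.repeat([1, 2, 3], 10)
features = rng.normal(size=(30, 8)) + labels[:, None]

path = os.path.join(tempfile.mkdtemp(), "feature_vectors.txt")
np.savetxt(path, features)

# np.loadtxt splits on whitespace by default and returns a 2-D float array
# of shape (n_samples, n_features), which is what silhouette_score expects.
X_data = np.loadtxt(path)
print(X_data.shape)  # (30, 8)

# labels stays 1-D with one entry per row of X_data; no reshape is needed.
score = metrics.silhouette_score(X_data, labels, metric='cosine')
print('Silhouette Score:', score)
```

With the real data, `X_data` would be expected to have shape (1300, 2048) and `labels` length 1300; a mismatch between the two usually means the feature file and `master.csv` are not aligned row for row.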

Solution 1
-1 Accepted 2019-05-13 21:57:26
