Reshape your data either using array.reshape(-1, 1) if your data has a single feature
How can I use metrics.silhouette_score on a dataset of 1300 images for which I have ResNet50 feature vectors (each of length 2048) and a discrete class label between 1 and 9?
import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn import cluster, datasets, preprocessing, metrics
from sklearn.cluster import KMeans
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
labels_reshaped = np.ndarray(labels).reshape(-1,1)
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels_reshaped,
metric='cosine'))
I get this error:
Traceback (most recent call last):
File "/dataset/silouhette_score.py", line 8, in <module>
labels_reshaped = np.ndarray(labels).reshape(-1,1)
ValueError: sequence too large; cannot be greater than 32
Process finished with exit code 1
For this other code:
import pandas as pd
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn import cluster, datasets, preprocessing, metrics
from sklearn.cluster import KMeans
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
labels_reshaped = np.ndarray(labels).reshape(1,-1)
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels_reshaped,
metric='cosine'))
I get this error:
Traceback (most recent call last):
File "/dataset/silouhette_score.py", line 8, in <module>
labels_reshaped = np.ndarray(labels).reshape(1,-1)
ValueError: sequence too large; cannot be greater than 32
Process finished with exit code 1
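A note on the ValueError in the two tracebacks above: np.ndarray is the low-level array constructor, and its first argument is a shape, so np.ndarray(labels) tries to create an array with one dimension per label rather than an array holding the labels. np.array(labels) builds an array from the values. silhouette_score also expects the labels as a 1D array of shape (n_samples,), so no reshape is needed. A minimal sketch, where the label list is a made-up stand-in for list(df['Q3 Theme1']):

```python
import numpy as np

# Hypothetical stand-in for list(df['Q3 Theme1']): 1300 labels in 1..9.
labels = [(i % 9) + 1 for i in range(1300)]

# np.array copies the values into a 1D array; np.ndarray(labels) would
# instead interpret the list as the shape of a new array.
labels_arr = np.array(labels)

print(labels_arr.shape)
```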
If I run this other code:
import pandas as pd
from sklearn import metrics
df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])
X = open('entire_dataset__resnet50_feature_vectors.txt')
X_Data = X.read()
print('Silhouette Score:', metrics.silhouette_score(X_Data, labels,
metric='cosine'))
I get this as an output: https://pastebin.com/raw/hk2axdWL
How can I fix this code so that I can print the single silhouette score?
Reshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.
Process finished with exit code 1
I have pasted one line of my feature vector file (a .txt file) here: https://pastebin.com/raw/hk2axdWL (it consists of 2048 numbers separated by spaces)
I was eventually able to figure this out. I needed to create the feature vectors in the exact format sklearn requires, with one row of values per image rather than the whole file read as a single string:
import pandas as pd
from sklearn import metrics

df = pd.read_csv("master.csv")
labels = list(df['Q3 Theme1'])

# Build one row per image so sklearn receives a 2D
# (n_samples, n_features) structure instead of one long string.
fv = []
with open('entire_dataset__resnet50_feature_vectors.txt') as X:
    for line in X:
        # Split on whitespace and convert explicitly to float rather
        # than relying on sklearn to convert strings.
        tmp_arr = [float(v) for v in line.split()]
        fv.append(tmp_arr)

print('Silhouette Score:', metrics.silhouette_score(fv, labels,
                                                    metric='cosine'))
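As an alternative sketch, np.loadtxt can parse a whitespace-separated text file directly into a 2D float array, which is the (n_samples, n_features) layout that silhouette_score expects. The example uses a small synthetic in-memory file and made-up labels; both are assumptions for illustration, not the real dataset:

```python
import io

import numpy as np
from sklearn import metrics

# Synthetic stand-in for the feature file: 30 rows of 8 space-separated
# numbers (the real file would have 1300 rows of 2048 numbers).
rng = np.random.default_rng(0)
X_true = rng.normal(size=(30, 8))
text = "\n".join(" ".join(f"{v:.6f}" for v in row) for row in X_true)

# np.loadtxt accepts a filename or a file-like object and returns a
# 2D float array when every row has the same number of columns.
X_loaded = np.loadtxt(io.StringIO(text))

# Made-up labels: one integer class per row, kept as a 1D array.
labels = np.arange(30) % 3 + 1

score = metrics.silhouette_score(X_loaded, labels, metric='cosine')
print('Silhouette Score:', score)
```

With the real data, the io.StringIO object would be replaced by the path to the feature vector file, and the labels would come from the CSV column.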