
Python: training a K-means model to predict the dominant color of an image

I am trying to create a model that predicts the dominant color in an image using K-means clustering. I have all the data set up, but I am unsure how to proceed after fitting the model. Thanks

 from sklearn.cluster import KMeans
 import h5py


 train_data = h5py.File('x_train.h5','r')
 test_data = h5py.File('x_test.h5','r')

 x_train = train_data['train'][:]
 x_test = test_data['test'][:]

 print(x_train.shape) # (429-number of images, 416-height,416-width, 3-channels)

 x_train = x_train/255.0
 x_test = x_test/255.0

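 # each row is one whole image flattened to 416*416*3 values, so the
 # clusters below group similar images rather than individual colors
 X_train = x_train.reshape(len(x_train),-1)
 X_test = x_test.reshape(len(x_test),-1)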

 kmeans = KMeans(n_clusters = 5)
 # Fitting the model to training set
 kmeans.fit(X_train)

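 #------edit------

 # predict() returns an array of cluster labels, not a fitted model, and it
 # expects a 2D input. To get an RGB value, cluster the pixels of one image:
 pixels = x_test[0].reshape(-1, 3)   # (416*416, 3): one RGB row per pixel
 pixel_kmeans = KMeans(n_clusters=5).fit(pixels)

 labels=list(pixel_kmeans.labels_)

 centroid=pixel_kmeans.cluster_centers_

 percent=[]
 for i in range(len(centroid)):
     x=labels.count(i)
     x=x/(len(labels))
     percent.append(x)

 get_label_index = percent.index(max(percent))

 # values are in 0..1 because x_test was divided by 255
 get_rgb_of_dominant_color = centroid[get_label_index][:]

 print(get_rgb_of_dominant_color)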

This is one approach I can think of. Suppose you fix the number of clusters at 5, as in your code.

Identify the 5 cluster centroids using kmeans.cluster_centers_.

Rank the cluster centroids from 1 to 5 based on the number of data points assigned to each of them. The centroid with the most data points is the dominant color. Take that centroid's RGB value and visualize it to see the color.
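The ranking step above can be sketched as follows. This is a minimal sketch, assuming the pixels have already been reshaped into an (n_pixels, 3) array; `rank_colors` is a hypothetical helper name, and counting with `np.bincount` and ordering with `np.argsort` is just one way to rank the clusters by size. The pixel data is made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans


def rank_colors(pixels, n_clusters=5, random_state=0):
    """Cluster RGB pixels and return the centroids ranked by cluster size.

    pixels: (n_pixels, 3) array, one RGB row per pixel.
    Returns (colors, counts), both ordered from the largest cluster
    (rank 1, the dominant color) to the smallest.
    """
    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=random_state)
    kmeans.fit(pixels)
    # Number of pixels assigned to each cluster label 0..n_clusters-1
    counts = np.bincount(kmeans.labels_, minlength=n_clusters)
    # Cluster indices sorted by descending pixel count
    order = np.argsort(counts)[::-1]
    return kmeans.cluster_centers_[order], counts[order]


# Hypothetical pixels: 600 blue-ish, 300 sand-ish and 100 green-ish (0..255)
rng = np.random.default_rng(0)
pixels = np.vstack([
    rng.normal([30, 90, 200], 5, size=(600, 3)),
    rng.normal([220, 200, 150], 5, size=(300, 3)),
    rng.normal([40, 160, 60], 5, size=(100, 3)),
])
colors, counts = rank_colors(pixels, n_clusters=3)
for rank, (color, count) in enumerate(zip(colors, counts), start=1):
    print(rank, color.round().astype(int), count)
```

The first row of `colors` is the dominant color; the remaining rows give the rest of the palette in order of how much of the image they cover.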


EDIT: added code for a more detailed explanation.


Below is code that loads an image and then finds its most dominant color.

import numpy as np
from PIL import Image
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
# Jupyter-only magic; drop this line when running as a plain script
%matplotlib inline

#Load the image (convert to RGB so the array always has 3 channels)
arr_img = np.array(Image.open("beach.bmp").convert("RGB"), dtype='int32')
plt.imshow(arr_img)

[Original image]

#reshape array from 3D (rows, cols, channels) to 2D: one row per pixel
r, c, l = arr_img.shape
reshape_img = np.reshape(arr_img, (r*c, l), order="C")
#fit the model with 5 clusters
kmeans = KMeans(n_clusters=5, max_iter=1000, init='random')
kmeans.fit(reshape_img)
# Looking at the labels and their associated data points
unique, counts = np.unique(kmeans.labels_, return_counts=True)
print("The labels are: ", unique)
print("Count of items: ", counts)
# Find the label of the most dense cluster (the one with the most pixels)
dominant_label = unique[counts.argmax()]
# Pick the most dense cluster centroid and truncate it to integer RGB
s = tuple(map(int, kmeans.cluster_centers_[dominant_label]))
# Visualize the color as a 1x1 image
plt.imshow([[s]])

[Dominant color]

You can see that KMeans correctly identified blue as the most dominant color.
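To apply the same per-pixel approach to a whole batch like the question's `x_test` (shape `(n_images, height, width, 3)`, values in 0..1), one option is to fit a separate model per image. This is a hedged sketch: `dominant_colors` is a hypothetical helper, and the tiny 8x8 batch below is made-up data.

```python
import numpy as np
from sklearn.cluster import KMeans


def dominant_colors(images, n_clusters=5, random_state=0):
    """Return the dominant color of each image in a batch.

    images: array shaped (n_images, height, width, channels).
    Each image is clustered on its own pixels, and the centroid of the
    largest cluster is returned for that image.
    """
    result = []
    for img in images:
        # One row per pixel, one column per channel
        pixels = img.reshape(-1, img.shape[-1])
        kmeans = KMeans(n_clusters=n_clusters, n_init=10,
                        random_state=random_state).fit(pixels)
        counts = np.bincount(kmeans.labels_, minlength=n_clusters)
        result.append(kmeans.cluster_centers_[counts.argmax()])
    return np.array(result)


# Hypothetical 2-image batch with values in [0, 1], like x_test / 255.0
batch = np.zeros((2, 8, 8, 3))
batch[0, :6] = [1.0, 0.0, 0.0]   # image 0: 48 red pixels ...
batch[0, 6:] = [0.0, 0.0, 1.0]   # ... and 16 blue pixels
batch[1, :5] = [0.0, 1.0, 0.0]   # image 1: 40 green pixels ...
batch[1, 5:] = [1.0, 1.0, 1.0]   # ... and 24 white pixels
print(dominant_colors(batch, n_clusters=2))
```

Multiply the result by 255 to get back 0..255 RGB values. With 416x416 images this fits one model per image, so it may be slow over hundreds of images; fitting on a random subset of each image's pixels is one way to trade accuracy for speed.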

I was taught to first choose the number of clusters by checking where the improvement in inertia drops below 20%, which can be done with:

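from sklearn.cluster import KMeans

# X: the data to cluster, e.g. an (n_pixels, 3) array of RGB values
inertias = []
K = range(1, 10)

for k in K:
    model = KMeans(n_clusters=k)
    model.fit(X)
    inertias.append(model.inertia_)

# Percentage drop in inertia going from k-1 to k clusters, labelled with k
for k, prev, cur in zip(K[1:], inertias, inertias[1:]):
    print(k, (prev - cur) / prev * 100)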
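The 20% rule can also be wrapped in a small helper that stops at the first k whose improvement falls below the threshold and keeps k-1. This is a hedged sketch: `choose_k` and its parameters are hypothetical names, the 20% cut-off is the rule of thumb described in this answer (not a scikit-learn feature), and the three-blob data is synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans


def choose_k(X, max_k=9, threshold=20.0, random_state=0):
    """Pick the number of clusters with the inertia-improvement rule.

    Fits KMeans for k = 1, 2, ..., max_k. As soon as going from k-1 to k
    clusters improves inertia by less than `threshold` percent, k-1 is
    returned. If every step improves by at least `threshold`, max_k is returned.
    """
    prev_inertia = None
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X)
        if prev_inertia is not None:
            if prev_inertia == 0:
                # Already a perfect fit; more clusters cannot help
                return k - 1
            improvement = (prev_inertia - model.inertia_) / prev_inertia * 100
            if improvement < threshold:
                return k - 1
        prev_inertia = model.inertia_
    return max_k


# Hypothetical data: three well-separated 2D blobs of 100 points each
rng = np.random.default_rng(0)
centers = np.array([[0.0, 0.0], [10.0, 0.0], [0.0, 10.0]])
X = np.vstack([c + rng.normal(scale=0.5, size=(100, 2)) for c in centers])
print(choose_k(X))
```

For real images you would pass the (n_pixels, 3) pixel array as `X`.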

But I guess for you it would be the number of distinct colors. I would advise you to use that number, which, judging from your code, I assume is 5.

After that, to predict the color, you would do:

# the input must be 2D, shaped (n_samples, n_features) like the training data
preds = kmeans.predict(whateverYouWantToPredict)
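`predict` returns one cluster label per input row, not a color; the color is the centroid for that label. A minimal sketch of that lookup, assuming made-up training pixels in 0..255 and a 2-cluster model:

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Hypothetical training pixels (0..255): 800 blue-ish and 200 sand-ish
train_pixels = np.vstack([
    rng.normal([30, 90, 200], 5, size=(800, 3)),
    rng.normal([220, 200, 150], 5, size=(200, 3)),
])
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(train_pixels)

# predict() expects a 2D array: one row per sample, one column per feature
new_pixels = np.array([[25.0, 95.0, 205.0], [215.0, 195.0, 155.0]])
preds = kmeans.predict(new_pixels)        # one cluster label per pixel
colors = kmeans.cluster_centers_[preds]   # the centroid RGB for each label

# A single sample must still be 2D, shaped (1, n_features)
one_label = kmeans.predict(new_pixels[0].reshape(1, -1))
print(preds, colors.round(), one_label)
```

Passing a single 1D row (for example one flattened image or one pixel without `reshape(1, -1)`) makes scikit-learn raise a ValueError asking for a 2D array.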

Those would be your final predictions, but note that this is an unsupervised method. You could append the predictions to your dataframe as labels and then use that data to train a supervised model for further predictions.
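A sketch of that two-step idea, assuming a pandas dataframe of RGB pixels with hypothetical column names `r`, `g`, `b`. The answer does not name a supervised model; `KNeighborsClassifier` is just one possible choice swapped in here, and the data is synthetic.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
features = ["r", "g", "b"]
# Hypothetical dataframe of RGB pixels from two color groups
df = pd.DataFrame(
    np.vstack([
        rng.normal([30, 90, 200], 5, size=(300, 3)),
        rng.normal([220, 200, 150], 5, size=(300, 3)),
    ]),
    columns=features,
)

# Unsupervised step: cluster, then append the labels to the dataframe
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(df[features])
df["cluster"] = kmeans.labels_

# Supervised step: train a classifier with the cluster labels as the target
clf = KNeighborsClassifier(n_neighbors=5).fit(df[features], df["cluster"])

new = pd.DataFrame([[28.0, 92.0, 198.0]], columns=features)
print(clf.predict(new), kmeans.predict(new))
```

On its own the classifier only relearns the cluster assignment; any extra value depends on what other features or targets you combine with these labels.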
