pyclustering visualising xmeans when the matrix has more than three dimensions

I'm trying to cluster and visualise some data with xmeans from the pyclustering library. I copied the code directly from the example in the documentation:

from pyclustering.cluster import cluster_visualizer
from pyclustering.cluster.xmeans import xmeans
from pyclustering.cluster.center_initializer import kmeans_plusplus_initializer
from pyclustering.utils import read_sample
from pyclustering.samples.definitions import SIMPLE_SAMPLES
sample = X  # read_sample(SIMPLE_SAMPLES.SAMPLE_SIMPLE3)
# Prepare initial centers - amount of initial centers defines amount of clusters from which X-Means will
# start analysis.
amount_initial_centers = 2
initial_centers = kmeans_plusplus_initializer(sample, amount_initial_centers).initialize()
# Create instance of X-Means algorithm. The algorithm will start analysis from 2 clusters, the maximum
# number of clusters that can be allocated is 20.
xmeans_instance = xmeans(sample, initial_centers, 20)
xmeans_instance.process()
# Extract clustering results: clusters and their centers
clusters = xmeans_instance.get_clusters()
centers = xmeans_instance.get_centers()
# Print total sum of metric errors
print("Total WCE:", xmeans_instance.get_total_wce())
# Visualize clustering results
visualizer = cluster_visualizer()
visualizer.append_clusters(clusters, sample)
visualizer.append_cluster(centers, None, marker='*', markersize=10)
visualizer.show()

The only difference is that I assigned to sample the value of my matrix X instead of loading a sample dataset.

When I try to visualise the clustering result I get this error:

Only objects with size dimension 1 (1D plot), 2 (2D plot) or 3 (3D plot) can be displayed. For multi-dimensional data use 'cluster_visualizer_multidim'.

My X matrix is generated in this way:

features = ["I", "Iu", ...]  # plus 7 other column names
data = df[features]
...
X = scaler.fit_transform(data)

Is there a way to visualise the clusters while plotting only two or three features at a time?

I can't find anything in the documentation.

I tried this:

visualizer.append_clusters(clusters, sample[:,[0,1]])

to visualise only the first two features, and got this error:

Only clusters with the same dimension of objects can be displayed on canvas.

EDIT:

I updated the code as suggested in the answer by annoviko but now I get the following error:

ValueError                                Traceback (most recent call last)
<ipython-input-69-6fd7d2ce5fcd> in <module>
     20 visualizer.append_clusters(clusters, X)
     21 visualizer.append_cluster(centers, None, marker='*', markersize=10)
---> 22 visualizer.show(pair_filter=[[0, 1], [0, 2]])

/usr/local/lib/python3.8/site-packages/pyclustering/cluster/__init__.py in show(self, pair_filter, **kwargs)
    224             raise ValueError("There is no non-empty clusters for visualization.")
    225 
--> 226         cluster_data = self.__clusters[0].data or self.__clusters[0].cluster
    227         dimension = len(cluster_data[0])
    228 

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

It is raised by visualizer.show(), and it happens even if I remove the pair_filter argument from the call.

As the error you got says:

Only objects with size dimension 1 (1D plot), 2 (2D plot) or 3 (3D plot) can be displayed. For multi-dimensional data use 'cluster_visualizer_multidim'.

You have to use cluster_visualizer_multidim, as the message suggests. The documentation (pyclustering 0.10.1) includes an example: https://pyclustering.github.io/docs/0.10.1/html/dc/d6b/classpyclustering_1_1cluster_1_1cluster__visualizer__multidim.html

For example, if your data has more than three dimensions (D > 3) and you want to display the pairs (x0, x1) and (x0, x2), you can do it like this:

from pyclustering.cluster import cluster_visualizer_multidim
visualizer = cluster_visualizer_multidim()
visualizer.append_clusters(clusters, sample_4d)
visualizer.show(pair_filter=[[0, 1], [0, 2]])

Here pair_filter specifies which pairs of features are shown. In the example above, only two plots are drawn: (x0, x1), given as [0, 1], and (x0, x2), given as [0, 2].
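Writing pair_filter by hand gets tedious once you want more than a couple of pairs. Here is a minimal sketch that builds the list, assuming pair_filter accepts any list of two-element index lists as in the example above. feature_pairs is a hypothetical helper, not part of pyclustering:

```python
from itertools import combinations


def feature_pairs(indices):
    """Return every unordered pair of the given feature indices as [i, j] lists."""
    return [list(pair) for pair in combinations(indices, 2)]


# All pairs among the first three features.
pairs = feature_pairs([0, 1, 2])
print(pairs)  # [[0, 1], [0, 2], [1, 2]]
```

The result can then be passed along as visualizer.show(pair_filter=pairs).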

So, in your particular case, where you want to display only the first two features, it should be:

visualizer = cluster_visualizer_multidim()
visualizer.append_clusters(clusters, sample)
visualizer.show(pair_filter=[[0, 1]])
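If you would rather stay with the plain cluster_visualizer, the attempt with sample[:, [0, 1]] most likely failed because the centers passed to append_cluster still had all of their original coordinates while the data had only two. Here is a minimal sketch, assuming that is the cause: slice the data and the centers to the same columns. select_columns is a hypothetical helper, and the points are made-up values:

```python
def select_columns(points, columns):
    """Keep only the given coordinate indices of every point."""
    return [[point[c] for c in columns] for point in points]


# Made-up 4D points and centers standing in for the sample and get_centers().
sample = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]]
centers = [[3.0, 4.0, 5.0, 6.0]]

sample_2d = select_columns(sample, [0, 1])
centers_2d = select_columns(centers, [0, 1])
print(sample_2d)   # [[1.0, 2.0], [5.0, 6.0]]
print(centers_2d)  # [[3.0, 4.0]]
```

Once the dimensions match, visualizer.append_clusters(clusters, sample_2d) followed by visualizer.append_cluster(centers_2d, None, marker='*', markersize=10) should be accepted, because the cluster indices still refer to the same rows.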

I think I should make the error message more readable and put the suggestion to use the other class in its first sentence. Let me know if this helps (if it is still relevant for you).
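About the ValueError in the edit: judging from the traceback, show() evaluates self.__clusters[0].data or self.__clusters[0].cluster, and NumPy refuses to treat an array with more than one element as a single boolean. A likely cause is that X is a NumPy array (the output of scaler.fit_transform), so converting it to plain Python lists before clustering and plotting should avoid the error. Here is a minimal sketch under that assumption. to_plain_lists is a hypothetical helper name, and the values are placeholders:

```python
def to_plain_lists(matrix):
    """Convert a 2D array-like into a list of lists.

    NumPy arrays expose tolist(); any other iterable of rows is copied row by row.
    """
    if hasattr(matrix, "tolist"):
        return matrix.tolist()
    return [list(row) for row in matrix]


# Stand-in for the scaled matrix X; with NumPy this would be an ndarray.
X = [(0.1, 0.2, 0.3, 0.4), (0.5, 0.6, 0.7, 0.8)]
sample = to_plain_lists(X)
print(sample)  # [[0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8]]
```

Then pass sample (not X) both to xmeans and to visualizer.append_clusters(clusters, sample).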
