
Plotting DBSCAN Clustering of Doc2Vec model

I have a Doc2Vec model created with Gensim and want to use scikit-learn DBSCAN to look for clustering of sentences within the model.

I'm struggling to work out how to best transform the model vectors to work with DBSCAN and plot clusters and am not finding many directly applicable examples on the web.

Here is what I have so far:

import gensim
import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

fnIn = 'NLPModels/doc2VecModel_vector_size{0}_epochs{1}.bin'

def doCluster(vector_size, epochs):
    model = gensim.models.doc2vec.Doc2Vec.load(fnIn.format(vector_size, epochs))

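    # Document tags. Gensim 4.x renamed these to model.dv and index_to_key.
    Y = model.docvecs.index2entity

    # Stack the document vectors into an (n_docs, vector_size) array
    X = np.array([model.docvecs[tag] for tag in Y])

    # fit_predict returns one cluster label per document; -1 marks noise
    db = DBSCAN(eps=.1, min_samples=5, metric='cosine').fit_predict(X)
    labels = set(db)
    print(labels)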


doCluster(100, 10)

Output: {0, 1, -1}

I believe this means two clusters (0 and 1) plus outliers (-1).

Am I going about this in the right way?

How would I plot this on a chart to visualise the clusters?

Thanks.

There are two questions here:

  1. Visualization: I suggest adapting the DBSCAN clustering example code to your document vectors.

  2. Whether you are doing the clustering correctly: at first glance, yes.
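For the visualization part, one possible approach (a sketch, not the only way to do it) is to project the vectors down to two dimensions and colour each point by its DBSCAN label. The example below assumes PCA for the projection, which is my choice and not something from the question; t-SNE or UMAP are common alternatives. Since I don't have your model, it uses synthetic 100-dimensional vectors in place of `model.docvecs`, and the helper names `cluster_and_project` and `plot_clusters` are made up for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to use plt.show() interactively
import matplotlib.pyplot as plt
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA


def cluster_and_project(X, eps=0.1, min_samples=5):
    """Cluster with DBSCAN (cosine distance) and project the vectors to 2-D with PCA."""
    X = np.asarray(X)
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(X)
    coords = PCA(n_components=2).fit_transform(X)  # used for plotting only
    return labels, coords


def plot_clusters(coords, labels, out_path="dbscan_clusters.png"):
    """Scatter-plot the 2-D coordinates, one colour per cluster, noise in black."""
    fig, ax = plt.subplots(figsize=(6, 5))
    for label in sorted(set(labels.tolist())):
        mask = labels == label
        name = "noise" if label == -1 else f"cluster {label}"
        color = "black" if label == -1 else None  # None -> next colour in the cycle
        ax.scatter(coords[mask, 0], coords[mask, 1], s=12, c=color, label=name)
    ax.set_xlabel("PCA component 1")
    ax.set_ylabel("PCA component 2")
    ax.set_title("DBSCAN clusters of document vectors")
    ax.legend()
    fig.savefig(out_path)
    plt.close(fig)
    return out_path


# Synthetic stand-in for the Doc2Vec vectors: two groups of 50 points in 100
# dimensions, each pointing along a different axis with a little Gaussian noise.
rng = np.random.default_rng(0)
center_a = np.zeros(100)
center_a[0] = 10.0
center_b = np.zeros(100)
center_b[1] = 10.0
X = np.vstack([
    center_a + rng.normal(scale=0.1, size=(50, 100)),
    center_b + rng.normal(scale=0.1, size=(50, 100)),
])

labels, coords = cluster_and_project(X)
plot_clusters(coords, labels)
```

To use it with your model, pass the array of document vectors built in `doCluster` as `X`. Keep in mind that the clustering is done in the full vector space with cosine distance, while the plot is only a 2-D projection, so clusters that DBSCAN separates cleanly may still overlap on the chart.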
