Plotting DBSCAN Clustering of Doc2Vec model

Question

I have a Doc2Vec model created with Gensim and want to use scikit-learn DBSCAN to look for clustering of sentences within the model.

I'm struggling to work out how to best transform the model vectors to work with DBSCAN and plot clusters and am not finding many directly applicable examples on the web.

Here is what I have so far:

import gensim
import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

fnIn = 'NLPModels/doc2VecModel_vector_size{0}_epochs{1}.bin'

def doCluster(vector_size, epochs):
    model = gensim.models.doc2vec.Doc2Vec.load(fnIn.format(vector_size, epochs))

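    Y = model.docvecs.index2entity  # document tags

    # Stack the document vectors into one (n_docs, vector_size) array
    X = np.array([model.docvecs[tag] for tag in Y])

    # fit_predict returns one cluster label per document; -1 marks noise
    db = DBSCAN(eps=.1, min_samples=5, metric='cosine').fit_predict(X)
    labels = set(db)
    print(labels)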


doCluster(100, 10)

Output: {0, 1, -1}

Which I believe to be two clusters (0 and 1) and outliers (-1).
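
For reference, scikit-learn's DBSCAN does label noise points as -1, so this reading can be checked by counting how many documents fall under each label. A minimal sketch, where the `labels` array is hypothetical stand-in data in place of the real output of `fit_predict`:

```python
import numpy as np

# Hypothetical labels standing in for DBSCAN(...).fit_predict(X)
labels = np.array([0, 0, 1, -1, 1, 0, -1, 1, 1])

# Count how many points carry each label
values, counts = np.unique(labels, return_counts=True)
summary = {int(v): int(c) for v, c in zip(values, counts)}

# Every label other than -1 is a cluster; -1 collects the noise points
n_clusters = len(set(summary) - {-1})
n_noise = summary.get(-1, 0)
print(summary, n_clusters, n_noise)
```

If nearly all documents end up in one cluster or in the noise group, that usually suggests `eps` or `min_samples` needs adjusting for the data.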

Am I going about this in the right way?

How would I plot this on a chart to visualise the clusters?

Thanks.

Answer 1

There are two questions here:

Visualization: I suggest adapting the DBSCAN clustering example code, such as the demo in the scikit-learn documentation, to your document vectors.
Clustering correctness: at first glance, yes, your approach looks right.
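
One way such a plot could look is sketched below. It is not taken from the original example code: it assumes the document vectors are projected to two dimensions with PCA (a choice made here purely for illustration), and it uses synthetic vectors in place of the Doc2Vec model so that it runs on its own. The function name `plot_dbscan` and the demo data are hypothetical.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to open a window
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA


def plot_dbscan(X, eps=0.1, min_samples=5):
    """Cluster X with cosine DBSCAN and scatter-plot a 2-D PCA projection."""
    X = np.asarray(X)
    labels = DBSCAN(eps=eps, min_samples=min_samples,
                    metric="cosine").fit_predict(X)
    # Assumption: PCA is used only to get 2-D coordinates for plotting;
    # the clustering itself is done on the full-dimensional vectors.
    coords = PCA(n_components=2).fit_transform(X)

    fig, ax = plt.subplots()
    for label in sorted(set(labels)):
        mask = labels == label
        if label == -1:
            # Noise points are drawn as black crosses
            ax.scatter(coords[mask, 0], coords[mask, 1],
                       c="k", marker="x", label="noise")
        else:
            ax.scatter(coords[mask, 0], coords[mask, 1],
                       label=f"cluster {label}")
    ax.legend()
    ax.set_title("DBSCAN clusters (PCA projection)")
    return labels, fig


# Synthetic stand-in for the document vectors: two tight groups in 100-D
rng = np.random.default_rng(0)
centers = rng.normal(size=(2, 100))
X_demo = np.vstack([c + 0.05 * rng.normal(size=(50, 100)) for c in centers])
labels_demo, fig_demo = plot_dbscan(X_demo)
```

With the real model, `X` from the question would be passed in place of `X_demo`, followed by `plt.show()` or `fig_demo.savefig(...)`. Because the projection discards most of the 100 dimensions, clusters that are distinct under cosine distance can still overlap in the 2-D picture.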

solution1 0 ACCEPTED 2020-04-20 15:58:59