I have a Doc2Vec model created with Gensim and want to use scikit-learn's DBSCAN to look for clusters of sentences within the model.
I'm struggling to work out how best to transform the model vectors so they work with DBSCAN and how to plot the clusters, and I'm not finding many directly applicable examples on the web.
Here is what I have so far:
import gensim
import numpy as np
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt

fnIn = 'NLPModels/doc2VecModel_vector_size{0}_epochs{1}.bin'

def doCluster(vector_size, epochs):
    model = gensim.models.doc2vec.Doc2Vec.load(fnIn.format(vector_size, epochs))
    Y = model.docvecs.index2entity  # tags
    X = []  # Document vectors
    for tag in Y:
        X.append(model.docvecs[tag])
    db = DBSCAN(eps=.1, min_samples=5, metric='cosine').fit_predict(X)
    labels = set(db)
    print(labels)

doCluster(100, 10)
Output: {0, 1, -1}
I believe this means two clusters (0 and 1) plus outliers (-1).
Am I going about this in the right way?
How would I plot this on a chart to visualise the clusters?
Thanks.
There are two questions here:
Visualization: I suggest adapting the DBSCAN clustering example code from the scikit-learn documentation.
Whether you are doing the clustering correctly: at first glance, yes.
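One way to adapt that kind of example to document vectors is sketched below. This is only a minimal sketch under some assumptions: the vectors are projected to 2-D with PCA purely for plotting (my choice, not something the question or the scikit-learn example prescribes), `plot_dbscan_clusters` is a hypothetical helper name, and synthetic data stands in for the Doc2Vec vectors so that the snippet runs without the model file.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line for interactive plotting
import matplotlib.pyplot as plt
from sklearn.cluster import DBSCAN
from sklearn.decomposition import PCA


def plot_dbscan_clusters(X, eps=0.1, min_samples=5, out_path=None):
    """Cluster X with DBSCAN (cosine metric) and scatter-plot a 2-D PCA projection.

    Hypothetical helper: the name and the PCA step are assumptions for illustration.
    """
    X = np.asarray(X)
    # Same DBSCAN settings as in the question
    labels = DBSCAN(eps=eps, min_samples=min_samples, metric="cosine").fit_predict(X)
    # Project to two dimensions only for display; clustering used the full vectors
    coords = PCA(n_components=2).fit_transform(X)

    fig, ax = plt.subplots()
    for label in sorted(set(labels)):
        mask = labels == label
        if label == -1:
            # DBSCAN marks noise points with the label -1
            ax.scatter(coords[mask, 0], coords[mask, 1],
                       c="black", marker="x", label="noise")
        else:
            ax.scatter(coords[mask, 0], coords[mask, 1],
                       label="cluster {}".format(label))
    ax.set_xlabel("PCA component 1")
    ax.set_ylabel("PCA component 2")
    ax.legend()
    if out_path is not None:
        fig.savefig(out_path)
    return labels, coords, fig


# Synthetic stand-in for the document vectors: two tight groups of 50 points
# in 100 dimensions, pointing in two different directions.
rng = np.random.default_rng(0)
centers = np.eye(100)[:2] * 10
X_demo = np.vstack([c + rng.normal(scale=0.05, size=(50, 100)) for c in centers])

labels, coords, fig = plot_dbscan_clusters(X_demo)
```

With the real model, the list `X` built inside `doCluster` could be passed in place of `X_demo`. Keep in mind that a 2-D projection discards most of the variance of 100-dimensional vectors, so clusters that are well separated under cosine distance may still overlap in the plot.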