Visualize text classes in a scatter-plot

Question

I am looking for ways to assess how 'modellable' my training data is, for example by checking whether the classes are well separated in terms of vocabulary.

I am a bit embarrassed to ask, but is it possible to draw a scatter plot for a text classification model in torch? Or is there any other approach to investigate data quality?

Answer 1

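You can apply dimensionality reduction (PCA, t-SNE or UMAP) and color each point by its class to inspect your data. I recommend bokeh for exploring the data interactively, but the example below uses seaborn. It runs on the digits dataset as a stand-in; for text, you would first turn each document into a numeric vector (for example TF-IDF features or embeddings from your model) and reduce those instead.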

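import matplotlib.pyplot as plt
import seaborn as sns
import umap  # provided by the umap-learn package
from sklearn.datasets import load_digits

digits = load_digits()

# Reduce the 64-dimensional digit features to a 2D embedding
embedding = umap.UMAP().fit_transform(digits.data)

# One point per sample, colored by class label (categorical palette, full legend)
sns.scatterplot(x=embedding[:, 0], y=embedding[:, 1], hue=digits.target, palette="tab10", legend="full")
plt.show()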
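
Since the question is about text, here is a minimal sketch of the same idea applied to documents. It is only one possible setup: it assumes TF-IDF features from scikit-learn and swaps UMAP for truncated SVD so that it runs on a sparse matrix without extra dependencies. The toy corpus, the labels, and the helper names project_texts and plot_classes are made up for illustration; replace them with your own training texts and classes.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Toy corpus, invented for illustration only; use your training texts instead.
texts = [
    "the team won the football match",
    "the striker scored a late goal",
    "the coach praised the goalkeeper",
    "the match ended in a draw",
    "the bank raised interest rates",
    "the stock market fell sharply",
    "investors bought government bonds",
    "the central bank cut interest rates",
]
labels = ["sport"] * 4 + ["finance"] * 4


def project_texts(texts, n_components=2, random_state=0):
    """Vectorize texts with TF-IDF and reduce them with truncated SVD.

    Returns a dense array of shape (n_samples, n_components).
    """
    tfidf = TfidfVectorizer().fit_transform(texts)  # sparse document-term matrix
    svd = TruncatedSVD(n_components=n_components, random_state=random_state)
    return svd.fit_transform(tfidf)


def plot_classes(coords, labels):
    """Scatter the 2D coordinates, one color per class label."""
    # Imported here so the projection step works without plotting libraries.
    import matplotlib.pyplot as plt
    import seaborn as sns

    ax = sns.scatterplot(x=coords[:, 0], y=coords[:, 1], hue=labels)
    plt.show()
    return ax


coords = project_texts(texts)
# plot_classes(coords, labels)  # uncomment to draw the figure
```

If the classes use clearly different vocabulary, their points tend to form separate groups in the plot; heavy overlap suggests the classes may be hard to tell apart from word counts alone. A 2D projection discards a lot of information, so treat it as a quick visual check rather than a measure of model quality.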
