
Visualize text classes in a scatter-plot

I am looking for ways to assess how "modellable" my training data is, for example by checking whether the classes are well separated in terms of vocabulary.

I am a bit embarrassed to ask, but is it possible to make a scatter plot for a text classification model in torch? Or is there any other approach for investigating data quality?

You can use dimensionality reduction (PCA, t-SNE, or UMAP) and color the points by class to inspect your data. I recommend bokeh for exploring the data interactively, although here I'll show the idea with seaborn.

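import matplotlib.pyplot as plt
import seaborn as sns
import umap  # provided by the umap-learn package
from sklearn.datasets import load_digits

# Example data: 8x8 digit images flattened to 64 features, with labels 0-9
digits = load_digits()

# Reduce the 64-dimensional features to a 2D embedding
embedding = umap.UMAP().fit_transform(digits.data)

# One point per sample, colored by its class label
sns.scatterplot(x=embedding[:, 0], y=embedding[:, 1], hue=digits.target)
plt.show()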
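
The digits example uses numeric features, so for text you first need a vector per document. One possible sketch, assuming TF-IDF features are a reasonable proxy for "vocabulary" and using truncated SVD (a PCA-like linear projection that works on sparse matrices) in place of UMAP: the toy corpus, the labels, and the helper names project_texts and plot_classes are all made up for illustration, so swap in your own training texts.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical toy corpus; replace with your own training texts and labels.
texts = [
    "the striker scored a late goal in the match",
    "the team won the league after a tense match",
    "the goalkeeper saved a penalty in the final",
    "the coach praised the team after the game",
    "the new phone has a faster processor and more memory",
    "the laptop ships with a faster processor",
    "the software update improves battery life on the phone",
    "the company released a new laptop and tablet",
]
labels = ["sports"] * 4 + ["tech"] * 4


def project_texts(texts, n_components=2, random_state=0):
    """Return an (n_samples, n_components) array: TF-IDF, then truncated SVD."""
    tfidf = TfidfVectorizer(stop_words="english").fit_transform(texts)
    svd = TruncatedSVD(n_components=n_components, random_state=random_state)
    return svd.fit_transform(tfidf)


def plot_classes(coords, labels):
    """Scatter the 2D coordinates, colored by class label."""
    # Plotting libraries are imported here so the projection step
    # can run on its own without them.
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.scatterplot(x=coords[:, 0], y=coords[:, 1], hue=labels)
    plt.show()


coords = project_texts(texts)
```

Calling plot_classes(coords, labels) then draws the scatter plot. If the classes overlap heavily in this view, that only hints that vocabulary alone may not separate them; a 2D projection discards a lot of information, so treat it as a rough check rather than a verdict. If you already have a trained torch model, the same plotting step could be applied to its sentence or document embeddings instead of TF-IDF vectors.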

