Plotting new documents on a scatter plot
I am looking to gain some insight into my data. I am converting my documents into a VSM using sklearn PCA and plotting them on a matplotlib graph. This involves:
Casting the documents to a numeric matrix using a pipeline
test = pipeline.fit_transform(docs).todense()
Fitting my PCA model to it
pca = PCA().fit(test)
Then converting it using transform
data = pca.transform(test)
Finally, plotting the results using Matplotlib
plt.scatter(data[:,0], data[:,1], c = categories)
My question is this: how do I take new sentences and determine where they would lie in relation to the other documents plotted, using an X to mark their relative positions?
Thanks
Also cast the new documents to a numeric array:
new = pipeline.transform(new_docs).todense()
Note that this uses the pipeline with the previously fitted parameters, hence it's pipeline.transform, not pipeline.fit_transform.
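A minimal sketch of why transform matters here. The question does not show how pipeline was built, so the TfidfVectorizer step and the sample sentences below are assumptions for illustration only. Because transform reuses the vocabulary learned during fitting, the new matrix ends up with the same columns as the original one, which is what the fitted PCA expects. The sketch also uses .toarray() instead of .todense(), since recent scikit-learn releases may reject np.matrix input.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the asker's docs, new_docs and pipeline.
docs = ["the cat sat on the mat", "dogs chase cats", "stock markets fell today"]
new_docs = ["a cat chased the dog"]

pipeline = make_pipeline(TfidfVectorizer())

# fit_transform learns the vocabulary from the original documents.
test = pipeline.fit_transform(docs).toarray()

# transform reuses that vocabulary; words unseen during fitting are ignored.
new = pipeline.transform(new_docs).toarray()

# Both matrices have one column per word in the fitted vocabulary.
print(test.shape, new.shape)
```

Calling fit_transform on new_docs instead would learn a different vocabulary, so the columns would no longer line up with what pca was fitted on.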
Transform the new data using the previously fitted pca.
new_data = pca.transform(new)
This will transform the new data to the same PC-space as the original data.
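To make "the same PC-space" concrete, here is a small sketch on made-up random data (not the asker's documents). With the default settings (no whitening), pca.transform subtracts the mean learned from the training data and projects onto the fitted components, so new points are placed on the axes that were derived from the original data.

```python
import numpy as np
from sklearn.decomposition import PCA

# Synthetic data, purely for illustration.
rng = np.random.default_rng(0)
train = rng.normal(size=(20, 5))
new = rng.normal(size=(3, 5))

pca = PCA(n_components=2).fit(train)

# What pca.transform returns for the new points.
projected = pca.transform(new)

# The same projection done by hand: centre with the training mean,
# then project onto the components fitted on the training data.
manual = (new - pca.mean_) @ pca.components_.T

print(np.allclose(projected, manual))
```

Nothing about the new points influences the axes; they are only located within them.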
Add the new data to the plot using a second scatter.
plt.scatter(data[:,0], data[:,1], c = categories)
plt.scatter(new_data[:,0], new_data[:,1], marker = 'x')
plt.show()
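Putting the steps together, an end-to-end sketch might look like the following. The documents, the categories list, the TfidfVectorizer pipeline, n_components=2 and the output filename are all assumptions made so the example runs on its own; swap in your own pipeline and data. It uses .toarray() rather than .todense() and a non-interactive backend so that it also runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless run; remove this and call plt.show() for a window
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical corpus with one integer category per document.
docs = [
    "the cat sat on the mat",
    "dogs chase cats in the park",
    "stock markets fell sharply today",
    "investors sold shares as markets fell",
]
categories = [0, 0, 1, 1]
new_docs = ["my cat chased a dog", "shares rallied on the stock market"]

pipeline = make_pipeline(TfidfVectorizer())
pca = PCA(n_components=2)

# Fit both the pipeline and the PCA on the original documents only.
data = pca.fit_transform(pipeline.fit_transform(docs).toarray())

# Reuse the fitted pipeline and PCA for the new documents.
new_data = pca.transform(pipeline.transform(new_docs).toarray())

fig, ax = plt.subplots()
ax.scatter(data[:, 0], data[:, 1], c=categories, label="original documents")
ax.scatter(new_data[:, 0], new_data[:, 1], marker="x", color="red",
           label="new documents")
ax.legend()
fig.savefig("pca_scatter.png")
```

Giving the second scatter its own colour and a legend entry makes the X markers easy to tell apart from the original points.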