Plotting new documents to scatter plot

I am looking to gain some insight into my data. I am converting the documents into a VSM using sklearn PCA and plotting them on a matplotlib graph. This involves:

  1. Casting the documents to a numeric matrix using a pipeline

     test = pipeline.fit_transform(docs).toarray()

  2. Fitting it to my model

     pca = PCA().fit(test)

  3. Then converting it using transform

     data = pca.transform(test)

  4. Finally plotting the results using Matplotlib

     plt.scatter(data[:,0], data[:,1], c = categories)

My question is this: how do I take new sentences and determine where they would lie in relation to the other documents plotted, using an X to mark their relative positions?

Thanks

  1. Also cast the new documents to a numeric array.

     new = pipeline.transform(new_docs).toarray()

    Note that this uses the pipeline with the previously fitted parameters, hence it's pipeline.transform, not pipeline.fit_transform.

  2. Transform the new data using the previously fitted pca.

     new_data = pca.transform(new)

    This will transform the new data to the same PC-space as the original data.

  3. Add the new data to the plot using a second scatter.

     plt.scatter(data[:,0], data[:,1], c = categories)
     plt.scatter(new_data[:,0], new_data[:,1], marker = 'x')
     plt.show()
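The steps above can be put together into one runnable sketch. The original pipeline, documents and categories are not shown in the question, so everything here is an assumption made for illustration: the pipeline is taken to be a single TfidfVectorizer step, docs, categories and new_docs are made-up toy data, and plot_documents is a hypothetical helper. PCA is limited to two components because only the first two are plotted.

```python
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical stand-ins for the asker's data.
docs = [
    "the cat sat on the mat",
    "dogs and cats are pets",
    "stocks fell on the market today",
    "the market rallied as stocks rose",
]
categories = [0, 0, 1, 1]  # one numeric label per document, used for colouring
new_docs = ["my cat chased the dog", "investors sold their stocks"]

# Assumed pipeline: a single TF-IDF step (the real pipeline is not shown).
pipeline = make_pipeline(TfidfVectorizer())

# Fit the pipeline and PCA on the original documents only.
train = pipeline.fit_transform(docs).toarray()
pca = PCA(n_components=2).fit(train)
data = pca.transform(train)

# Reuse the fitted pipeline and PCA for the new documents:
# transform only, no refitting, so both sets share the same axes.
new = pipeline.transform(new_docs).toarray()
new_data = pca.transform(new)


def plot_documents(data, categories, new_data):
    """Scatter the original documents and mark the new ones with an X."""
    fig, ax = plt.subplots()
    ax.scatter(data[:, 0], data[:, 1], c=categories, label="original documents")
    ax.scatter(new_data[:, 0], new_data[:, 1], marker="x", color="red",
               label="new documents")
    ax.legend()
    return fig


if __name__ == "__main__":
    plot_documents(data, categories, new_data)
    plt.show()
```

Words in new_docs that the fitted vectorizer never saw are simply ignored by pipeline.transform, so the new array always has the same number of columns as the training matrix and can be passed straight to pca.transform.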

