Plotting new documents on a scatter plot

Question

I am looking to gain some insight into my data. I am converting my documents into a VSM using sklearn PCA and plotting them on a matplotlib graph. This involves:

Casting the documents to a numeric matrix using a pipeline
```
 test = pipeline.fit_transform(docs).todense() 
```
Fitting my PCA model to it
```
 pca = PCA().fit(test) 
```
Then I am converting it using transform
```
  data = pca.transform(test) 
```
Finally I am plotting the results using Matplotlib
```
  plt.scatter(data[:,0], data[:,1], c = categories) 
```

My question is this: how do I take new sentences and determine where they would lie in relation to the other documents already plotted, using an X to mark their relative positions?

Thanks

Answer 1

First, cast the new documents to a numeric array as well
```
 new = pipeline.transform(new_docs).todense() 
```
Note that this uses the pipeline with its previously fitted parameters, hence it's pipeline.transform, not pipeline.fit_transform.
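
To illustrate the difference, here is a minimal sketch. The question does not show how pipeline was built, so the TfidfVectorizer pipeline and the example sentences below are assumptions made up for illustration. The point is only that transform reuses the columns learned during fitting, whereas refitting on the new documents would produce a different set of columns.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical stand-in for the question's `pipeline` (its real steps are not shown)
pipeline = make_pipeline(TfidfVectorizer())

# Made-up example sentences
docs = ["the cat sat on the mat", "the dog chased the cat", "stocks fell on monday"]
new_docs = ["a cat chased a dog", "bonds fell on tuesday"]

train = pipeline.fit_transform(docs)   # learns the vocabulary and IDF weights from docs
new = pipeline.transform(new_docs)     # reuses them; words never seen in docs are ignored

# Refitting on the new documents alone builds a different vocabulary,
# so its columns would not line up with the ones the PCA was fitted on
refit = make_pipeline(TfidfVectorizer()).fit_transform(new_docs)

print("fitted on docs:     ", train.shape)
print("transform(new_docs):", new.shape)
print("refit on new_docs:  ", refit.shape)
```

The first two matrices have the same number of columns, which is what lets the new documents go through the same PCA afterwards.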
Then transform the new data using the previously fitted pca.
```
 new_data = pca.transform(new) 
```
This will transform the new data into the same PC-space as the original data.
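
As a sanity check on what "the same PC-space" means, the sketch below uses random stand-in matrices (not the question's data) and the default PCA settings with no whitening. Under those assumptions, pca.transform amounts to subtracting the mean learned from the training data and projecting onto the fitted components, so nothing is re-estimated from the new rows.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X_train = rng.normal(size=(20, 5))  # made-up stand-in for `test`
X_new = rng.normal(size=(3, 5))     # made-up stand-in for `new`

pca = PCA().fit(X_train)

# What the fitted model returns for the new rows
projected = pca.transform(X_new)

# The same thing by hand: center with the training mean, then project
manual = (X_new - pca.mean_) @ pca.components_.T

matches = np.allclose(projected, manual)
print("manual projection matches pca.transform:", matches)
```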

Finally, add the new data to the plot using a second scatter.

```
 plt.scatter(data[:,0], data[:,1], c = categories)
 plt.scatter(new_data[:,0], new_data[:,1], marker = 'x')
 plt.show()
```
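
Putting the steps together, a self-contained sketch might look like the following. The TfidfVectorizer pipeline, the example sentences, the category labels, and the output filename are all assumptions for illustration, since the question does not show them. It also uses .toarray() instead of .todense(): as far as I know, recent scikit-learn releases reject the np.matrix objects that .todense() returns, so a plain array is the safer choice there.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so this runs without a display; drop for interactive use
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Made-up documents and labels, purely for illustration
docs = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a kitten played with the dog",
    "stocks fell sharply on monday",
    "the market rallied after the report",
    "investors sold stocks and bonds",
]
categories = [0, 0, 0, 1, 1, 1]
new_docs = ["the kitten chased the cat", "bonds fell after the report"]

# Hypothetical stand-in for the question's `pipeline`
pipeline = make_pipeline(TfidfVectorizer())

# Fit on the original documents only
test = pipeline.fit_transform(docs).toarray()
pca = PCA().fit(test)
data = pca.transform(test)

# Reuse the fitted pipeline and PCA for the new documents
new = pipeline.transform(new_docs).toarray()
new_data = pca.transform(new)

# Original documents as colored dots, new documents as X markers
fig, ax = plt.subplots()
ax.scatter(data[:, 0], data[:, 1], c=categories, label="original documents")
ax.scatter(new_data[:, 0], new_data[:, 1], marker="x", color="red", s=80,
           label="new documents")
ax.legend()
fig.savefig("new_docs.png")  # hypothetical output filename
```

With only a handful of short sentences the positions are not very meaningful; the sketch is meant to show the order of calls, not a realistic result.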

Answer 1
1 · Accepted · 2017-08-11 21:58:20
