
Visualization and clustering

Earlier I posted a question about visualization and clustering. I guess that question was not clear enough, so I am posting it again and hope to explain it better this time. I also apologize for not accepting answers on my old questions. I didn't know I could do that until someone pointed it out. I will definitely do it from now on.

Okay, back to the question. I previously wrote a Python script to calculate the similarity between documents. I now have all the data written to a text file in Notepad, and it looks like this:

(1, 6821): inf

(1, 8): 3.458911570

(1, 9): 7.448105193

(1, 10): inf

(1, 11): inf

(6821, 8): inf

(6821, 9): inf

(6821, 10): inf

(6821, 11): inf

(8, 9): 2.153308936

(8, 10): inf

(8, 11): 16.227647992

(9, 10): inf

(9, 11): 34.943139430

(10, 11): inf

The numbers in parentheses are document numbers, and the value after them is the distance between the two documents. What I want is a visualization tool or method that lets me create a node for each document number. In this example I have 6 different documents, so I want 6 nodes. Then I want edges that connect these nodes based on their distances. For example, the distance between documents 1 and 8 is 3.46, while the distance between documents 1 and 9 is 7.45, so 1 and 8 should be placed closer together than 1 and 9. Document pairs with an 'inf' distance shouldn't have any edge connecting them.
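Whichever tool ends up drawing the graph, the first step is to turn the text file into a set of nodes and a set of finite-distance edges. The following is a minimal sketch of that step, assuming the file uses exactly the `(a, b): value` layout shown above. The names `parse_distances`, `LINE_RE`, and `SAMPLE` are made up for illustration and are not part of the original script.

```python
import math
import re

# Matches lines such as "(1, 8): 3.458911570" or "(1, 10): inf".
LINE_RE = re.compile(r"\((\d+),\s*(\d+)\):\s*(\S+)")


def parse_distances(text):
    """Return (nodes, edges).

    nodes: set of every document number seen, including isolated ones.
    edges: dict mapping (a, b) to a finite distance; 'inf' pairs are left out.
    """
    nodes = set()
    edges = {}
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip blank or unrecognized lines
        a, b = int(match.group(1)), int(match.group(2))
        distance = float(match.group(3))  # float("inf") gives infinity
        nodes.update((a, b))
        if math.isfinite(distance):
            edges[(a, b)] = distance
    return nodes, edges


# The data from the question, copied verbatim.
SAMPLE = """\
(1, 6821): inf
(1, 8): 3.458911570
(1, 9): 7.448105193
(1, 10): inf
(1, 11): inf
(6821, 8): inf
(6821, 9): inf
(6821, 10): inf
(6821, 11): inf
(8, 9): 2.153308936
(8, 10): inf
(8, 11): 16.227647992
(9, 10): inf
(9, 11): 34.943139430
(10, 11): inf
"""

nodes, edges = parse_distances(SAMPLE)
print(sorted(nodes))
for pair, distance in sorted(edges.items()):
    print(pair, distance)
```

Keeping the node set separate from the edge dictionary means documents such as 10 and 6821, which have only 'inf' distances, still show up as unconnected nodes.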

This sounds easy, but I am having a really hard time finding an open source visualization tool that can effectively help me do this. I would appreciate any suggestions or recommendations.

Have you tried GraphViz? I use it for situations like this. I haven't tried altering the length of the node connections, so you'll have to work that part out yourself. Check out the list of example graphs as a starting point.

http://www.graphviz.org/

In particular, the neato layout program:

$ cat similar.dot
graph g {
   n1 -- n8 [ weight = 3.458911570 ];
   n1 -- n9 [ weight = 7.448105193 ];
   n8 -- n9 [ weight = 2.153308936 ];
   n8 -- n11 [ weight = 16.227647992 ];
   n9 -- n11 [ weight = 34.943139430 ];
   n10;
   n6821;
}
$ neato -Tpng similar.dot -o similar.png
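One point about the snippet above: in neato, the `len` edge attribute sets the preferred edge length (in inches), while `weight` only influences how strongly the layout tries to honor that length. If the goal is for closer documents to be drawn closer together, `len` is probably the attribute to use. Below is a rough sketch that generates such a DOT file from Python. The function name `to_dot` and the `scale` parameter are assumptions for illustration, and the hard-coded data is the finite-distance subset of the question's sample.

```python
def to_dot(nodes, edges, scale=1.0):
    """Build DOT source for an undirected graph.

    nodes: iterable of document numbers (isolated ones are still emitted).
    edges: dict mapping (a, b) to a finite distance.
    scale: hypothetical factor to shrink or stretch distances, since
           neato interprets len in inches.
    """
    lines = ["graph g {"]
    for node in sorted(nodes):
        lines.append(f"   n{node};")
    for (a, b), distance in sorted(edges.items()):
        lines.append(f"   n{a} -- n{b} [ len = {distance * scale:.3f} ];")
    lines.append("}")
    return "\n".join(lines) + "\n"


# Finite distances from the question; 'inf' pairs get no edge.
nodes = {1, 8, 9, 10, 11, 6821}
edges = {
    (1, 8): 3.458911570,
    (1, 9): 7.448105193,
    (8, 9): 2.153308936,
    (8, 11): 16.227647992,
    (9, 11): 34.943139430,
}

dot = to_dot(nodes, edges)
print(dot)
```

Saving the printed text as `similar.dot` and running the same `neato -Tpng similar.dot -o similar.png` command should render it. Because the largest distance here is about 35, a `scale` well below 1 may be needed to keep the picture a reasonable size. The given distances also may not be exactly realizable in a plane, so neato treats `len` as a target rather than a guarantee.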

Processing is a really lovely tool for data visualization (and also a language, based on Java). Think of it as writing simplified OpenGL in Java (you can even use OpenGL with it if you want), plus the freedom to use all the Java libraries. You can even embed your Processing app inside another Swing or AWT application.

Here's the main page, and the brand new wiki.

You said you used Python. There's a hack described in this blog post that lets you use Jython instead of Java. I haven't tried it, but maybe it works fine. The only drawback of using another language (there's also a JavaScript 'port', Processing.js) is that all the examples are written for the Processing language (based on Java).
