Google App Engine datastore tag cloud with Python

We have some unstructured text data in our App Engine datastore. I want to create a one-off tag cloud of one property across a subset of the datastore entities. After looking around, I can't find any framework that does this without my having to write it myself.

The way I had in mind was:

  • Write a map function (as in MapReduce) that iterates over every entity of the particular kind in the datastore
  • Split the text string into words
  • Increment a counter for each word
  • Use the final counts to generate the tag cloud with some third-party software (offline; any suggestions here are welcome)
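The map, split and count steps above can be sketched in plain Python. This is a minimal, hypothetical sketch, not the App Engine MapReduce library: it assumes the property values of the selected entities are already available as strings, and the function names (`map_text`, `reduce_counts`, `count_words`) and the tokenizer regex are illustrative assumptions (no stop-word removal or stemming).

```python
import re
from collections import Counter
from typing import Iterable, Iterator, Tuple

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def map_text(text: str) -> Iterator[Tuple[str, int]]:
    """'Map' step: emit (word, 1) for every word in one entity's text."""
    for word in WORD_RE.findall(text.lower()):
        yield word, 1


def reduce_counts(pairs: Iterable[Tuple[str, int]]) -> Counter:
    """'Reduce' step: sum the emitted counts per word."""
    counts: Counter = Counter()
    for word, n in pairs:
        counts[word] += n
    return counts


def count_words(texts: Iterable[str]) -> Counter:
    """Run the map step over every text, then reduce all emitted pairs."""
    return reduce_counts(pair for text in texts for pair in map_text(text))


if __name__ == "__main__":
    # Stand-in for the property values of the selected datastore entities.
    texts = ["The cloud is big", "A big tag cloud", "cloud"]
    print(count_words(texts).most_common(2))  # [('cloud', 3), ('big', 2)]
```

Keeping the map and reduce steps as separate functions mirrors the MapReduce shape of the plan, so the same per-entity logic could later be moved into a real mapper if a dynamically updated cloud is ever needed.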

As I've never done this before, I was wondering, first, whether there is a framework around that does this for me (please), and if not, whether I'm approaching it the right way. Please feel free to point out any gaping holes in the plan.

Feed TagCloud and PyTagCloud are two possibilities.

  • Feed TagCloud Generator Gadget for Google App Engine might fit your needs. Unfortunately, it's undocumented. Fortunately, it's rather simple, though I'm not sure how well it suits your use case.

    It operates on a feed and appears to be somewhat flexible, so if your site already has a feed, integrating it might not be much trouble, though all processing will be done online.

  • PyTagCloud is also worth a look. You'll be able to do the processing offline, and it generates rather handsome clouds.

    All you'll have to do to get this working is export your datastore; PyTagCloud can operate on text files, so the splitting and counting will be done for you. The instructions in the App Engine docs on Uploading and Downloading Data show how to export the datastore to your local machine. You'll want to write an Exporter class and have PyTagCloud operate on its output.
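Once the data is exported, the property you care about still has to be pulled out into plain text for PyTagCloud. The sketch below assumes the export is a CSV file with a header row and that the property lives in a column named `description`; both the column name and the helper `export_column_to_text` are hypothetical, and the actual layout depends on how your Exporter class is written.

```python
import csv
import io
from typing import TextIO


def export_column_to_text(csv_in: TextIO, column: str, text_out: TextIO) -> int:
    """Copy one column of an exported CSV into plain text.

    Writes one non-empty value per line and returns the number of lines written.
    Rows where the column is missing or blank are skipped.
    """
    written = 0
    for row in csv.DictReader(csv_in):
        value = (row.get(column) or "").strip()
        if value:
            text_out.write(value + "\n")
            written += 1
    return written


if __name__ == "__main__":
    # In-memory stand-in for a hypothetical export file.
    sample = io.StringIO('key,description\n1,"big tag cloud"\n2,\n3,cloud\n')
    out = io.StringIO()
    print(export_column_to_text(sample, "description", out))  # 2
    print(repr(out.getvalue()))  # 'big tag cloud\ncloud\n'
```

With real files you would pass handles from `open("export.csv", newline="", encoding="utf-8")` and a writable text file instead of `io.StringIO`, then hand the resulting text file to PyTagCloud.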


If you decide to roll your own, you probably want to skip online processing and use the offline Uploading and Downloading Data approach described above, unless you want a dynamically updated cloud. Iterating over your entire datastore and doing the counts online is the most annoying and expensive part of the task, and it only makes sense if you want or need a dynamic tag cloud. As above, I'd recommend writing an Exporter class and processing its output locally.
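Rolling your own offline version can stay small: count words from the exported file, then turn the counts into font sizes for rendering. This is a minimal sketch under stated assumptions: the export is a CSV with a header row, the text column name is passed in (e.g. a hypothetical `description`), and `count_column_words`, `font_sizes`, the default size range of 10 to 48 and `top_n=50` are all illustrative choices, not part of any library.

```python
import csv
import io
import re
from collections import Counter
from typing import Dict, TextIO

# Assumed tokenizer: lowercase runs of letters, digits and apostrophes.
WORD_RE = re.compile(r"[a-z0-9']+")


def count_column_words(csv_in: TextIO, column: str) -> Counter:
    """Count words in one column of an exported CSV file."""
    counts: Counter = Counter()
    for row in csv.DictReader(csv_in):
        counts.update(WORD_RE.findall((row.get(column) or "").lower()))
    return counts


def font_sizes(counts: Counter, top_n: int = 50,
               min_size: int = 10, max_size: int = 48) -> Dict[str, int]:
    """Map the top_n words to font sizes, linearly between min_size and max_size."""
    top = counts.most_common(top_n)
    if not top:
        return {}
    hi = top[0][1]
    lo = top[-1][1]
    span = hi - lo or 1  # avoid division by zero when all counts are equal
    return {word: round(min_size + (n - lo) * (max_size - min_size) / span)
            for word, n in top}


if __name__ == "__main__":
    # In-memory stand-in for a hypothetical export file.
    sample = io.StringIO("key,description\n1,cloud cloud tag\n2,Cloud big\n")
    counts = count_column_words(sample, "description")
    print(font_sizes(counts, top_n=3))  # {'cloud': 48, 'tag': 10, 'big': 10}
```

Linear scaling is the simplest choice; if a few words dominate the counts, scaling on the logarithm of the counts usually gives a more even-looking cloud.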
