Full text search on Google App Engine (Java)

There are a few threads floating around on this topic, but I think my use case is somewhat different.

What I want to do:

  • Full text search component for my GAE/J app
  • The index size is small: 25-50MB or so
  • I do not need live updates to the index; periodic re-indexing is fine
  • This is for auto-complete and the like, so it needs to be extremely fast (I get the impression that implementing an inverted index in the Datastore introduces considerable latency)

My strategy so far (just planning, haven't tried implementing anything yet):

  • Use Lucene with RAMDirectory
  • A periodic cron job creates the index, serializes it to the Datastore, and stores an update id (or timestamp)
  • The search servlet loads the index on startup and creates the RAMDirectory
  • On each request, the servlet checks the current update id and reloads the index as necessary
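
The last two steps of the strategy above (load once, then reload when the update id changes) could be sketched roughly as follows. This is only an illustration in plain Java 8: `ReloadingIndexHolder`, `latestUpdateId`, and `loader` are hypothetical names, the two callbacks stand in for the Datastore reads, and the index type `T` stands in for whatever wraps the Lucene RAMDirectory.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.LongFunction;
import java.util.function.LongSupplier;

/** Keeps one in-memory index per instance and reloads it when the stored update id changes. */
final class ReloadingIndexHolder<T> {
    /** Immutable pairing of an index with the update id it was loaded for. */
    private static final class Snapshot<V> {
        final long updateId;
        final V index;
        Snapshot(long updateId, V index) { this.updateId = updateId; this.index = index; }
    }

    private final LongSupplier latestUpdateId; // assumption: a cheap read of the current update id
    private final LongFunction<T> loader;      // assumption: deserializes the index for that id
    private final AtomicReference<Snapshot<T>> cached = new AtomicReference<>();

    ReloadingIndexHolder(LongSupplier latestUpdateId, LongFunction<T> loader) {
        this.latestUpdateId = latestUpdateId;
        this.loader = loader;
    }

    /** Called on each request: returns the cached index, reloading only if the id moved. */
    T get() {
        Snapshot<T> snap = cached.get();
        if (snap == null || snap.updateId != latestUpdateId.getAsLong()) {
            synchronized (this) { // only one thread per instance performs the reload
                long latest = latestUpdateId.getAsLong();
                snap = cached.get();
                if (snap == null || snap.updateId != latest) {
                    snap = new Snapshot<>(latest, loader.apply(latest));
                    cached.set(snap);
                }
            }
        }
        return snap.index;
    }
}

final class ReloadingIndexDemo {
    public static void main(String[] args) {
        AtomicLong storedId = new AtomicLong(1);    // stands in for the id written by the cron job
        AtomicInteger loads = new AtomicInteger();  // counts how often the loader actually runs
        ReloadingIndexHolder<String> holder = new ReloadingIndexHolder<String>(
                storedId::get,
                id -> { loads.incrementAndGet(); return "index-v" + id; });
        System.out.println(holder.get());
        System.out.println(holder.get());
        storedId.set(2);                            // the cron job publishes a new index
        System.out.println(holder.get());
        System.out.println("loads: " + loads.get());
    }
}
```

Each instance keeps its own copy and converges on the new index at its next request after the id changes, so instances never have to talk to each other. The price is one update-id read per request, which is why that read needs to be cheap.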

The main thing I'm fuzzy on is how to synchronize in-memory data between instances. Will this work, or am I missing something?

Also, how far can I push it before I start having problems with memory use? I couldn't find anything on RAM quotas for GAE. (This index is small, but I can think of more stuff I'd like to add.)

And, of course, any thoughts on better approaches?

GAE recently added a "text search" service. Take a look at GAE Java Text Search.

If you're okay with periodic rebuilds, and your index is small, your current approach sounds mostly okay. Instead of building the index online and serializing it to the datastore, though, why not build it offline and upload it with the app? Then you can instantiate it directly from the disk store, and to push an update, you deploy a new version of your app.
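
As a rough illustration of the build-offline, load-from-disk idea, here is a plain-Java stand-in that uses a sorted term file instead of a real Lucene index. `BundledTermIndex`, its method names, and the file format are all hypothetical. It also assumes the runtime lets the app read its own deployed files through `java.nio.file`.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.Locale;
import java.util.TreeSet;

/** Read-only term index that is built offline and shipped with the app as a file. */
final class BundledTermIndex {
    private final String[] sortedTerms;

    private BundledTermIndex(String[] sortedTerms) { this.sortedTerms = sortedTerms; }

    /** Offline step: write the unique, lowercased terms in sorted order, one per line. */
    static void writeOffline(Collection<String> terms, Path out) throws IOException {
        TreeSet<String> sorted = new TreeSet<>();
        for (String t : terms) {
            sorted.add(t.toLowerCase(Locale.ROOT));
        }
        Files.write(out, sorted, StandardCharsets.UTF_8);
    }

    /** Startup step: read the deployed file into memory once. */
    static BundledTermIndex load(Path file) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return new BundledTermIndex(lines.toArray(new String[0]));
    }

    /** Returns up to {@code limit} terms starting with {@code prefix}, in sorted order. */
    List<String> withPrefix(String prefix, int limit) {
        String p = prefix.toLowerCase(Locale.ROOT);
        int pos = Arrays.binarySearch(sortedTerms, p);
        int start = pos >= 0 ? pos : -(pos + 1); // first term that is >= the prefix
        List<String> out = new ArrayList<>();
        for (int i = start;
             i < sortedTerms.length && out.size() < limit && sortedTerms[i].startsWith(p);
             i++) {
            out.add(sortedTerms[i]);
        }
        return out;
    }
}
```

With this layout, `writeOffline` runs on a development machine before deployment, and the servlet calls `load` once at startup. Pushing an update then means deploying a new app version with a new file, as the answer suggests.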

App Engine now includes a full text search API (experimental): https://developers.google.com/appengine/docs/java/search/

For autocomplete, perhaps you could store the top N matches for each prefix (basically what you'd put in the drop-down menu) in memcache? The memcache entries could be backed by entities in the datastore and reloaded if needed.
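
A minimal sketch of the prefix table this suggests, in plain Java 8. The `PrefixSuggester` class and the integer score per term are assumptions made for illustration, and an ordinary `HashMap` stands in for memcache (each key/value pair would become one cache entry).

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Precomputes, for every prefix, the top-N completions to show in the drop-down. */
final class PrefixSuggester {
    /** Ranks terms by descending score (ties alphabetical) and fills a prefix table. */
    static Map<String, List<String>> build(Map<String, Integer> termScores, int topN) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>(termScores.entrySet());
        ranked.sort((a, b) -> {
            int byScore = Integer.compare(b.getValue(), a.getValue());
            return byScore != 0 ? byScore : a.getKey().compareTo(b.getKey());
        });
        Map<String, List<String>> table = new HashMap<>();
        for (Map.Entry<String, Integer> e : ranked) {
            String term = e.getKey();
            String lower = term.toLowerCase(Locale.ROOT);
            // Terms arrive best-first, so the first topN seen for a prefix are its top N.
            for (int len = 1; len <= lower.length(); len++) {
                List<String> top = table.computeIfAbsent(lower.substring(0, len), k -> new ArrayList<>());
                if (top.size() < topN) {
                    top.add(term);
                }
            }
        }
        return table;
    }

    /** Request-time lookup: a single key fetch, no scanning. */
    static List<String> suggest(Map<String, List<String>> table, String typed) {
        List<String> hit = table.get(typed.toLowerCase(Locale.ROOT));
        return hit == null ? Collections.<String>emptyList() : hit;
    }
}

final class PrefixSuggesterDemo {
    public static void main(String[] args) {
        Map<String, Integer> scores = new HashMap<>();
        scores.put("lucene", 5);
        scores.put("lucid", 9);
        scores.put("luck", 7);
        scores.put("java", 3);
        Map<String, List<String>> table = PrefixSuggester.build(scores, 2);
        System.out.println(PrefixSuggester.suggest(table, "lu"));
    }
}
```

The table holds one entry per distinct prefix, so capping the prefix length would be one way to bound its size. That cap is not part of the original suggestion.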

Well, as of GAE 1.5.0, it looks like resident Backends can be used to create a search service.

Of course, there's no free quota for these.

 