Indexing in Solr/Lucene

I have a site on which users can post questions, so I have a MySQL table like this:

question_id, user_id, tags, views, creation_date

What I want is to be able to

  • perform searches that return question_ids based on those tags

    and order them by

    1. views
    2. date (newest, or this week, this month)
  • or search for a specified user and return question_ids, again ordered by views and date.

In what way should I bring everything into Solr, as far as indexing is concerned? Will I have to index tags, views, and date? What should I index so that I have maximal performance?

First consider whether using Lucene/Solr is really a benefit for you. I don't want to be misunderstood, but if you only want to search a user_id column for a specific user ID, you don't need an additional full-text search engine.

Anyway, maybe you just want a little project to "play with" Solr. So here are the answers to your questions:

In what way should I bring everything into Solr, as far as indexing is concerned?

Put everything you need to search on into Solr/Lucene. Use the DIH (DataImportHandler) http://wiki.apache.org/solr/DataImportHandler to let Solr walk through your table and index the data.
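If you would rather push the rows yourself instead of configuring DIH, the same import can be sketched in a few lines of Python. This is only a rough illustration under several assumptions that are not in the question: the core is called `questions` and runs on `localhost:8983`, the `tags` column holds a comma-separated string, and `creation_date` comes back from MySQL as a `datetime` in UTC. The helper names `row_to_doc` and `post_docs` are made up for this example.

```python
import json
import urllib.request
from datetime import datetime, timezone

# Hypothetical core name and host; adjust to your installation.
SOLR_UPDATE_URL = "http://localhost:8983/solr/questions/update?commit=true"


def row_to_doc(row):
    """Convert one MySQL row (given as a dict) into a Solr document."""
    created = row["creation_date"]
    if created.tzinfo is None:
        # Assumption: naive datetimes from MySQL are already in UTC.
        created = created.replace(tzinfo=timezone.utc)
    return {
        "question_id": str(row["question_id"]),
        "user_id": str(row["user_id"]),
        # Assumption: tags are stored as one comma-separated string.
        "tags": [t.strip() for t in row["tags"].split(",") if t.strip()],
        "views": int(row["views"]),
        # Solr date fields expect ISO 8601 in UTC with a trailing "Z".
        "creation_date": created.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    }


def post_docs(docs, url=SOLR_UPDATE_URL):
    """Send a list of documents to Solr's JSON update handler."""
    body = json.dumps(docs).encode("utf-8")
    req = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

You would fetch the rows with whatever MySQL client you already use, map each one through `row_to_doc`, and pass the resulting list to `post_docs` in batches.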

Will I have to index tags, views, date?

Yes. You have to index everything you want to work with. By the way, there is a difference between indexing and storing data. You can index fields (like tags, user_id, views, ...) without also storing them in your Lucene index. Storing a field is only necessary if Lucene/Solr has to return that field's data in the results. Otherwise, Solr only returns the uniqueKey (primary key) of the matching documents, and you have to fetch the data from the database (... where pk = <lucene result>). So you don't need to store fields that are only relevant for sorting, for example.
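To make the indexed-versus-stored distinction concrete, here is a rough sketch of field definitions written as a payload for Solr's Schema API (`add-field`). The field type names `string`, `pint`, and `pdate` are assumptions taken from Solr's default configset and may differ in your schema, and the `field` helper is invented for this example. `question_id` is left out because it is assumed to be the schema's uniqueKey, which is defined separately and must be stored.

```python
import json


def field(name, ftype, stored, multi_valued=False):
    """Build one Schema API field definition (hypothetical helper)."""
    return {
        "name": name,
        "type": ftype,
        "indexed": True,     # searchable / filterable
        "stored": stored,    # only needed if Solr should return the value
        "multiValued": multi_valued,
    }


# All four fields are used for matching or ordering only, so none is stored.
# Whether sorting works on "views" and "creation_date" also depends on the
# field type settings (for example docValues) of the assumed configset.
SCHEMA_PAYLOAD = {
    "add-field": [
        field("user_id", "string", stored=False),
        field("tags", "string", stored=False, multi_valued=True),
        field("views", "pint", stored=False),
        field("creation_date", "pdate", stored=False),
    ]
}

SCHEMA_JSON = json.dumps(SCHEMA_PAYLOAD, indent=2)
```

The resulting JSON would be POSTed to the core's `/schema` endpoint. With this layout a search hands back only question_ids, and anything else is read from MySQL by primary key.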

What should I index so that I have maximal performance?

Index only those fields (columns) you need to work with in Solr. Don't index fields you will never query or search on.
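For the searches described in the question, only user_id, tags, views, and creation_date are ever touched, so those are the fields worth indexing. As a rough sketch, the two searches could be expressed as requests to Solr's `/select` handler like this. The core name `questions`, the host, and the helper names are assumptions, and the code assumes tag and user values contain no double quotes or backslashes.

```python
from urllib.parse import urlencode

# Hypothetical core name and host.
SELECT_URL = "http://localhost:8983/solr/questions/select"

SORTS = {
    "views": "views desc",
    "newest": "creation_date desc",
}

# Solr date math ranges for the "this week" / "this month" filters.
WINDOWS = {
    "week": "[NOW-7DAYS TO NOW]",
    "month": "[NOW-1MONTH TO NOW]",
}


def build_params(tag=None, user_id=None, order="views", window=None):
    """Return Solr query parameters for a tag search or a user search."""
    if (tag is None) == (user_id is None):
        raise ValueError("pass exactly one of tag or user_id")
    if tag is not None:
        q = f'tags:"{tag}"'
    else:
        q = f'user_id:"{user_id}"'
    params = {
        "q": q,
        "fl": "question_id",   # only the key is returned
        "sort": SORTS[order],
    }
    if window is not None:
        # Filter query restricting results to the chosen time window.
        params["fq"] = f"creation_date:{WINDOWS[window]}"
    return params


def build_url(**kwargs):
    """Return the full /select URL for the given search."""
    return SELECT_URL + "?" + urlencode(build_params(**kwargs))
```

For example, `build_url(tag="solr", order="newest", window="week")` asks for this week's questions tagged solr, newest first, and `build_url(user_id=42)` asks for one user's questions ordered by views.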
