
Solr/Lucene: Indexing facet values

Original question, 2010-02-23 03:57:28. Tags: lucene, solr, lucene.net, faceted-search, solrnet

For example, say I have the following facet:

Colors

Red (7825)
Orange (2343)
Green (843)
Blue (5412)

In my database, colors would be a table and each color would have a primary key and a name/value.

When indexing with Solr/Lucene, in all of the examples I've seen, the value is indexed and not the primary key. So if I filter by the color red, I would get something like the following:

http://www.example.com/search?color=Red
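For reference, an application URL like the one above typically turns into a Solr filter query on the indexed name. A minimal sketch of the request parameters, assuming a core named products on localhost:8983 and a string field named color (host, core and field names are all hypothetical):

```python
from urllib.parse import urlencode

# Hypothetical Solr select URL; host, port and core name are assumptions.
SOLR_SELECT = "http://localhost:8983/solr/products/select"


def build_color_query(color_name: str) -> str:
    """Build a Solr select URL that filters on a color *name* and requests
    facet counts for the (assumed) string field "color"."""
    params = [
        ("q", "*:*"),                      # match all documents
        # Filter on the indexed name; the phrase quotes let names with spaces
        # work, but a name containing a double quote would need escaping.
        ("fq", f'color:"{color_name}"'),
        ("facet", "true"),                 # enable faceting
        ("facet.field", "color"),          # count documents per color value
        ("facet.mincount", "1"),           # omit values with zero matches
        ("wt", "json"),                    # ask for a JSON response
    ]
    return SOLR_SELECT + "?" + urlencode(params)


print(build_color_query("Red"))
```

Because the filter carries the name itself, renaming "Red" in the database leaves every indexed document pointing at the old name until it is re-indexed.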

I'm wondering, is it wise to instead index the primary key and retrieve the values from the database when displaying the facet values? So I would instead get something like this:

http://www.example.com/search?color=1

"1" representing the primary key of the color red. “1”表示红色的主键。 I'm wondering if I should take this approach since the values of many of my facets frequently change, but the primary keys stay the same. 我想知道我是否应该采用这种方法，因为我的许多方面的值经常改变，但主键保持不变。 Also, the index is required to be in sync with the database. 此外，索引需要与数据库同步。

Does anyone have any experience with this? How do you think this will affect performance?

Thanks in advance!

1 answer

If you expect your entities to change frequently, it's easier to index the IDs and, when you get your facet results, look up the color names in the database. That way, changing a color doesn't require the affected documents to be updated in the index.
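A minimal sketch of that lookup step, assuming the color IDs are indexed in a field named color_id (hypothetical), the facet response was requested with wt=json using Solr's default flat json.nl layout (term, count, term, count, ...), and the database has a colors(id, name) table; sqlite3 stands in for the real database here.

```python
import sqlite3


def pairs(flat):
    """Turn Solr's flat [term, count, term, count, ...] list into (term, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))


def resolve_color_facets(response, conn, field="color_id"):
    """Map faceted color IDs to display names with one database round trip."""
    flat = response["facet_counts"]["facet_fields"][field]
    counts = [(int(term), count) for term, count in pairs(flat)]  # terms arrive as strings
    if not counts:
        return []
    ids = [color_id for color_id, _ in counts]
    placeholders = ",".join("?" for _ in ids)
    rows = conn.execute(
        f"SELECT id, name FROM colors WHERE id IN ({placeholders})", ids
    ).fetchall()
    names = dict(rows)
    # Keep Solr's ordering (by count) and skip IDs no longer in the database.
    return [(color_id, names[color_id], count)
            for color_id, count in counts if color_id in names]


# Usage with an in-memory database and a hand-written response fragment.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colors (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO colors VALUES (?, ?)",
                 [(1, "Red"), (2, "Orange"), (3, "Green"), (4, "Blue")])
response = {"facet_counts": {"facet_fields": {
    "color_id": ["1", 7825, "4", 5412, "2", 2343, "3", 843]}}}
for color_id, name, count in resolve_color_facets(response, conn):
    print(f"{name} ({count}) -> /search?color={color_id}")
```

The display label comes from the database while the link carries the stable key, so a rename only touches the colors table.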

In our system, we index the IDs in Lucene instead of the entity names, for exactly the reasons you stated. Our entities also have a number of associated properties that aren't indexed, so we would have to hit the database to get them anyway.
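Hitting the database for the unindexed properties usually means loading full rows for the IDs a search returned. A sketch, assuming a hypothetical products table and sqlite3 as the database; a SQL IN (...) query does not preserve the search engine's ranking, so the rows are re-sorted in Python.

```python
import sqlite3


def hydrate(conn, ids_in_rank_order):
    """Load full product rows for IDs returned by the search engine,
    preserving the engine's ranking order."""
    if not ids_in_rank_order:
        return []
    placeholders = ",".join("?" for _ in ids_in_rank_order)
    cur = conn.execute(
        f"SELECT id, name, color_id, price FROM products WHERE id IN ({placeholders})",
        ids_in_rank_order,
    )
    by_id = {row[0]: row for row in cur.fetchall()}
    # Re-sort into rank order; IDs deleted since the last index update are dropped.
    return [by_id[i] for i in ids_in_rank_order if i in by_id]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products "
             "(id INTEGER PRIMARY KEY, name TEXT, color_id INTEGER, price REAL)")
conn.executemany("INSERT INTO products VALUES (?, ?, ?, ?)",
                 [(1, "Mug", 1, 4.5), (2, "Scarf", 4, 12.0), (3, "Hat", 1, 9.99)])
print(hydrate(conn, [3, 1, 99]))  # 99 is missing from the database and is skipped
```

Fetching all rows in one query keeps this to a single round trip per results page regardless of page size.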

As far as performance goes, faceting on IDs won't be noticeably slower or faster than faceting on names. The database lookups shouldn't be a big deal either, especially if you're only pulling down a few dozen facet values at a time. If they do become an issue, you can always use caching to speed them up.
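One simple form of that caching is an in-process map from color ID to name. A minimal sketch, assuming a colors(id, name) table and sqlite3 as the database; the ColorNameCache class and its invalidate hook are hypothetical names, and a multi-server setup would need a shared cache or a time-based expiry instead.

```python
import sqlite3


class ColorNameCache:
    """In-process cache of color id -> name. Misses are fetched in one batch
    query; call invalidate() when a color is renamed."""

    def __init__(self, conn):
        self.conn = conn
        self.names = {}
        self.db_queries = 0  # instrumentation so the effect of caching is visible

    def get_many(self, ids):
        # IDs absent from the database are not cached, so they are re-queried.
        missing = [i for i in set(ids) if i not in self.names]
        if missing:
            placeholders = ",".join("?" for _ in missing)
            rows = self.conn.execute(
                f"SELECT id, name FROM colors WHERE id IN ({placeholders})", missing
            ).fetchall()
            self.db_queries += 1
            self.names.update(rows)
        return {i: self.names[i] for i in ids if i in self.names}

    def invalidate(self, color_id):
        self.names.pop(color_id, None)


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE colors (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO colors VALUES (?, ?)", [(1, "Red"), (2, "Orange")])
cache = ColorNameCache(conn)
cache.get_many([1, 2])   # one query
cache.get_many([1, 2])   # served from memory
conn.execute("UPDATE colors SET name = 'Crimson' WHERE id = 1")
cache.invalidate(1)      # the rename touches the database and the cache, not the index
print(cache.get_many([1, 2]), cache.db_queries)  # {1: 'Crimson', 2: 'Orange'} 2
```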