Performance issue with multi-valued fields in Lucene

Question

We're using Lucene 4.7 to build and query a rather large data set (110+ million documents).

One of the document fields, which we use for faceting, is defined as follows:

<field name="topic_paths"
       type="string"
       indexed="false"
       stored="false"
       docValues="true"
       multiValued="true"
       termVectors="false"
       termPositions="false"
       termOffsets="false"/>

Whenever we include this field in queries, they become extremely slow: about 7 seconds per topic_path value included in the search, so about 30 seconds for four topic_path values (typical in our case).

Queries that don't use this field are very fast (15 ms).

Is this the performance we should expect from Lucene with multi-valued fields used for faceting? Is there anything wrong or suboptimal with our field definition? Are there tricks we could use to speed up searches?

Details:

Hardware: Xen VM, 8-core Xeon CPU E5-2670 v2 at 2.5 GHz, 64 GB RAM
OS: Windows Server 2012 Standard
JVM: started with -Xmx8000m (Lucene is using about 45% of that)
Lucene queries are single-threaded

Answer 1

Read this article: http://wiki.apache.org/solr/SchemaXml#Fields

You need to index your field (indexed="true") to include it in search/faceting; otherwise Solr will skip this field without raising any exception.
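
To illustrate this suggestion, here is a minimal sketch of the field definition from the question with only indexed switched to "true"; every other attribute is copied unchanged. It assumes the rest of the schema stays as it is, and existing documents would need to be reindexed before the change takes effect. Whether this alone resolves the slowdown is not confirmed in this thread.

```xml
<!-- Sketch only: same field as in the question, with indexed set to "true"
     so that the field can be matched by queries and filters. -->
<field name="topic_paths"
       type="string"
       indexed="true"
       stored="false"
       docValues="true"
       multiValued="true"
       termVectors="false"
       termPositions="false"
       termOffsets="false"/>
```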

Solution 1
1 2015-04-27 23:56:40
