Lucene 6 - recommended way to store numeric fields with term vocabulary

In Lucene 6, LongField and IntField have been renamed to LegacyLongField and LegacyIntField and deprecated, with a JavaDoc suggestion to use the LongPoint and IntPoint classes instead.

However, it seems impossible to build a term vocabulary (i.e. enumerate all distinct values) of these XPoint fields. A Lucene mailing list entry confirms it:

PointFields are different than conventional inverted fields, so they also don't show up in fields(). You cannot get a term dictionary from them.

As a third option, one can add a field of class NumericDocValuesField, which, as far as I know, also doesn't provide a way of building a term vocabulary.

Is there a non-deprecated way of indexing a numeric field in Lucene 6, given the requirement to build a term vocabulary?

In my case I just duplicated the field, once as a LongPoint and once as a stored, non-indexed field, both with the same name. It is roughly:

doc.add(new NumericDocValuesField("ts", timestamp.toEpochMilli()));
doc.add(new LongPoint("ts", timestamp.toEpochMilli()));
doc.add(new StoredField("ts", timestamp.toEpochMilli()));

It is a bit ugly, but think of it as adding an index to the stored field. These field types can use the same name without interfering.

The DocValues field is there for document-age-based scoring and the LongPoint is there for range queries.

I had the same issue and finally found a solution for my use case. I'm indexing, not storing, a LongPoint:

doc.add(new LongPoint("time",timeMsec));

My first idea was to create the query like this:

Query query = parser.parse("time:[10003 TO 10003]");
System.err.println( "Searching for: " + query + " (" + query.getClass() + ")" );

But this will not return ANY document, at least not with the StandardAnalyzer and the default QueryParser :-(

The printout is: "Searching for: time:[10003 TO 10003] (class org.apache.lucene.search.TermRangeQuery)"

What works, however, is creating the query with LongPoint.newRangeQuery():

Query query = LongPoint.newRangeQuery("time", 10003, 10003);
System.err.println( "Searching for: " + query + " (" + query.getClass() + ")" );

This prints: "Searching for: time:[10003 TO 10003] (class org.apache.lucene.document.LongPoint$1)". So the standard QueryParser is creating a TermRangeQuery instead of a LongPoint range query. I'm new to Lucene, so I don't understand the details here, but it would be nice for the QueryParser to support LongPoint seamlessly...
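One possible workaround, sketched here rather than taken from the thread: the classic QueryParser exposes a protected getRangeQuery() hook, so a subclass can route range syntax on a known numeric field to LongPoint.newRangeQuery(). The TermRangeQuery finds nothing because it looks in the term dictionary, where point fields do not appear. The class name LongPointQueryParser, the longField parameter and the default field "body" are made up for illustration, and the sketch assumes the lucene-core, lucene-analyzers-common and lucene-queryparser jars are on the classpath.

```java
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.LongPoint;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.Query;

/** Hypothetical subclass: range queries on one field become LongPoint ranges. */
public class LongPointQueryParser extends QueryParser {

    // Name of the field that was indexed as a LongPoint (an assumption of this sketch).
    private final String longField;

    public LongPointQueryParser(String defaultField, Analyzer analyzer, String longField) {
        super(defaultField, analyzer);
        this.longField = longField;
    }

    @Override
    protected Query getRangeQuery(String field, String part1, String part2,
                                  boolean startInclusive, boolean endInclusive)
            throws ParseException {
        if (!longField.equals(field)) {
            // Every other field keeps the default behaviour (a TermRangeQuery).
            return super.getRangeQuery(field, part1, part2, startInclusive, endInclusive);
        }
        try {
            // Treat a missing bound or "*" as an open end.
            boolean openLow = part1 == null || "*".equals(part1);
            boolean openHigh = part2 == null || "*".equals(part2);
            long lower = openLow ? Long.MIN_VALUE : Long.parseLong(part1);
            long upper = openHigh ? Long.MAX_VALUE : Long.parseLong(part2);
            // LongPoint ranges are inclusive, so shift the bounds for {a TO b}.
            if (!startInclusive && !openLow) lower = Math.addExact(lower, 1L);
            if (!endInclusive && !openHigh) upper = Math.addExact(upper, -1L);
            return LongPoint.newRangeQuery(field, lower, upper);
        } catch (NumberFormatException | ArithmeticException e) {
            throw new ParseException("Invalid long range for field " + field + ": " + e.getMessage());
        }
    }

    public static void main(String[] args) throws ParseException {
        QueryParser parser = new LongPointQueryParser("body", new StandardAnalyzer(), "time");
        Query query = parser.parse("time:[10003 TO 10003]");
        System.err.println("Searching for: " + query + " (" + query.getClass() + ")");
    }
}
```

With this in place, the original parser.parse("time:[10003 TO 10003]") call should build the same kind of query as the explicit LongPoint.newRangeQuery("time", 10003, 10003), while ranges on other fields still go through the default term-based path.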
