How to search an int field in Lucene 4?

I am trying to implement an index of documents (roughly corresponding to DB rows), where one of the fields is an integer. I'm adding them to the index like this:

Document doc = new Document();
doc.add(new StringField("ticket_number", rs.getString("ticket_number"),
        Field.Store.YES));
doc.add(new IntField("ticket_id", rs.getInt("ticket_id"),
        Field.Store.YES));
doc.add(new StringField("id_s", rs.getString("ticket_id"),
        Field.Store.YES));
w.addDocument(doc);

It seems I can't query the ticket_id field at all, while id_s works just fine.

One of the documents is (I added whitespace for readability):

Document<
    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<ticket_number:230114W> 
    stored<ticket_id:152> 
    stored,indexed,tokenized,omitNorms,indexOptions=DOCS_ONLY<id_s:152>>

So my int field is stored, but not indexed. This query works as expected: id_s:152, while this one never returns anything: ticket_id:152.

What am I doing wrong? How can I add such a field to the index and make it searchable?

The following works for me:

    RAMDirectory idx = new RAMDirectory();
    IndexWriter writer = new IndexWriter(
            idx,
            new IndexWriterConfig(Version.LUCENE_40, new ClassicAnalyzer(Version.LUCENE_40))
    );
    Document document = new Document();
    document.add(new StringField("ticket_number", "t123", Field.Store.YES));
    document.add(new IntField("ticket_id", 234, Field.Store.YES));
    document.add(new StringField("id_s", "234", Field.Store.YES));
    writer.addDocument(document);
    writer.commit();

    IndexReader reader = DirectoryReader.open(idx);
    IndexSearcher searcher = new IndexSearcher(reader);

    Query q1 = new TermQuery(new Term("id_s", "234"));
    TopDocs td1 = searcher.search(q1, 1);
    System.out.println(td1.totalHits);  // prints "1"

    Query q2 = NumericRangeQuery.newIntRange("ticket_id", 1, 234, 234, true, true);
    TopDocs td2 = searcher.search(q2, 1);
    System.out.println(td2.totalHits);  // prints "1"

As femtoRgon pointed out, for numeric values (longs, dates, floats, etc.) you need to use a NumericRangeQuery and specify the precision step. Otherwise Lucene has no idea how you want to define similarity.
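To give a rough feel for what the precision step (the `1` passed to newIntRange above) refers to, here is a minimal pure-Java sketch. It assumes, as I understand the trie-based numeric fields, that each value is indexed at several shift levels spaced by the precision step, so that a range can be covered by a few coarse terms. The class and method names (PrecisionStepSketch, shifts, bucket) are made up for illustration and are not Lucene APIs, and the sketch ignores negative values for simplicity.

```java
import java.util.ArrayList;
import java.util.List;

public class PrecisionStepSketch {
    /** Shift amounts at which a 32-bit value would get a term, for a given precision step. */
    static List<Integer> shifts(int precisionStep) {
        List<Integer> result = new ArrayList<>();
        for (int shift = 0; shift < 32; shift += precisionStep) {
            result.add(shift);
        }
        return result;
    }

    /** The value with its lowest `shift` bits dropped, i.e. a coarser bucket. */
    static int bucket(int value, int shift) {
        return value >>> shift;
    }

    public static void main(String[] args) {
        // Hypothetical precision step of 4: one term per 4 bits of precision.
        for (int shift : shifts(4)) {
            System.out.println("shift " + shift + " -> bucket " + bucket(234, shift));
        }
    }
}
```

A smaller step means more terms per value in the index but fewer terms to visit per range query. For an exact match (min equal to max) only the full-precision term at shift 0 matters.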

Numeric fields can be queried with a NumericRangeQuery. For an exact match, simply set the min and max to the same value.

Your output indicating that the field is not indexed could be due to the differences in how a numeric value is indexed compared to a text value. Because the field is transformed into Lucene's numeric representation, the literal value 152 will indeed not be indexed.
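To illustrate why a text term such as ticket_id:152 cannot match, here is a small self-contained sketch that compares the bytes of the string "152" with a binary, sortable encoding of the int 152. The encoding is only loosely modeled on the prefix-coded idea: the marker byte, the 7-bit grouping, and the names NumericTermSketch, encode, and textTerm are assumptions made for illustration, and the actual byte layout produced by Lucene's NumericUtils may differ.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class NumericTermSketch {
    // Hypothetical marker byte, chosen for illustration only.
    static final int MARKER = 0x60;

    /** Encode an int as a marker byte followed by five 7-bit groups, most significant first. */
    static byte[] encode(int value) {
        // Flip the sign bit so that byte order matches signed int order.
        int sortable = value ^ 0x80000000;
        byte[] out = new byte[6];
        out[0] = (byte) MARKER;
        for (int i = 5; i >= 1; i--) {
            out[i] = (byte) (sortable & 0x7f);
            sortable >>>= 7;
        }
        return out;
    }

    /** The bytes a plain text term for the same number would have. */
    static byte[] textTerm(int value) {
        return Integer.toString(value).getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(textTerm(152))); // prints "[49, 53, 50]"
        System.out.println(Arrays.toString(encode(152)));   // prints "[96, 8, 0, 0, 1, 24]"
        System.out.println(Arrays.equals(textTerm(152), encode(152))); // prints "false"
    }
}
```

Since the indexed term is a binary encoding rather than the digits "152", a query that looks up the text term "152" in the numeric field finds nothing, which would be consistent with the behavior described in the question.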

At a glance, however, it's possible that your handling of id_s may be the better alternative. IDs are not usually handled as numeric values, but rather as simple identifiers that happen to be represented with digits. If you don't need numeric sorting or range querying on the field, indexing it as a StringField certainly makes more sense.

Another answer comes from this thread (third answer): Lucene 4.0 IndexWriter updateDocument for Numeric Term

Basically, you create a Term with your int value like this:

String field = "myfield";
int value = 4711;
BytesRef bytes = new BytesRef(NumericUtils.BUF_SIZE_INT);
NumericUtils.intToPrefixCoded(value, 0, bytes);
Term term = new Term(field, bytes);

Then you can use this term for searching, or for deleting/updating your index. In a first test, this worked fine for me. I can't tell if this is the "right" way to do things, however. I've used the NumericRangeFilter before for filtering IntFields, but now I'm inclined to use this approach with a regular TermsFilter or TermQuery instead.
