
Java Lucene search - is it possible to search a number in a range?

Using the Lucene libraries, I need to change an existing search function. Assume the following object:

Name: "Port Object 1"

Data: "TCP (1)/1000-2000"

The query (the search text) is "1142". Is it possible to search for "1142" in the Data field and find Port Object 1, since its data refers to the range 1000-2000?

I only managed to find the numeric range query, but that does not apply in this case, since I don't know the ranges in advance.

package com.company;

import java.io.IOException;

import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.document.StringField;
import org.apache.lucene.document.TextField;
import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.index.IndexWriterConfig;
import org.apache.lucene.queryparser.classic.ParseException;
import org.apache.lucene.queryparser.classic.QueryParser;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.ScoreDoc;
import org.apache.lucene.search.TopDocs;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.RAMDirectory;

public class Main {
    public static void main(String[] args) throws IOException, ParseException {
        StandardAnalyzer analyzer = new StandardAnalyzer();

        // 1. create the index
        Directory index = new RAMDirectory();

        IndexWriterConfig config = new IndexWriterConfig(analyzer);

        IndexWriter w = new IndexWriter(index, config);
        addDoc(w, "TCP (6)/1100-2000", "193398817");
        addDoc(w, "TCP (6)/3000-4200", "55320055Z");
        addDoc(w, "UDP (12)/50000-65000", "55063554A");
        w.close();

        // 2. query
        String querystr = "1200";

        Query q = new QueryParser("title", analyzer).parse(querystr);

        // 3. search
        int hitsPerPage = 10;
        IndexReader reader = DirectoryReader.open(index);
        IndexSearcher searcher = new IndexSearcher(reader);
        TopDocs docs = searcher.search(q, hitsPerPage);
        ScoreDoc[] hits = docs.scoreDocs;

        // 4. display results
        System.out.println("Found " + hits.length + " hits.");
        for(int i=0;i<hits.length;++i) {
            int docId = hits[i].doc;
            Document d = searcher.doc(docId);
            System.out.println((i + 1) + ". " + d.get("isbn") + "\t" + d.get("title"));
        }

        reader.close();
    }

    private static void addDoc(IndexWriter w, String title, String isbn) throws IOException {
        Document doc = new Document();
        doc.add(new TextField("title", title, Field.Store.YES));

        doc.add(new StringField("isbn", isbn, Field.Store.YES));
        w.addDocument(doc);
    }
}

See the code above: the query "1200" should find the first document.

Later edit:

I think what I need is exactly the opposite of range search: https://lucene.apache.org/core/5_5_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html#Range_Searches

Here is one approach, but it requires you to parse the range data into separate values, before your data can be indexed by Lucene. So, for example, from this:

"TCP (6)/1100-2000"

You would need to extract these two values (e.g. using a regex): 1100 and 2000.
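The extraction step could look roughly like the following. This is a minimal sketch, assuming every data string ends with a `<min>-<max>` pair as in the question's examples; the class name `PortRangeParser` and the method `parseRange` are made up for illustration and are not part of Lucene.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PortRangeParser {
    // Assumption: the range is always the trailing "<min>-<max>" part,
    // e.g. the "1100-2000" in "TCP (6)/1100-2000".
    private static final Pattern RANGE = Pattern.compile("(\\d+)-(\\d+)\\s*$");

    /** Returns {min, max}; throws if the text does not end with a valid range. */
    public static long[] parseRange(String data) {
        Matcher m = RANGE.matcher(data);
        if (!m.find()) {
            throw new IllegalArgumentException("No range found in: " + data);
        }
        long min = Long.parseLong(m.group(1));
        long max = Long.parseLong(m.group(2));
        if (min > max) {
            throw new IllegalArgumentException("Range is reversed in: " + data);
        }
        return new long[] { min, max };
    }

    public static void main(String[] args) {
        long[] range = parseRange("TCP (6)/1100-2000");
        System.out.println(range[0] + " .. " + range[1]); // prints "1100 .. 2000"
    }
}
```

The two values returned here are what would go into the `min` and `max` arrays of the range field.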

LongRange with ContainsQuery

Add a new field to each document (e.g. named "tcpRange") and define it as a LongRange field.

(There is also IntRange if you don't need long values.)

long[] min = { 1100 };
long[] max = { 2000 };
Field tcpRange = new LongRange("tcpRange", min, max);

The values are passed as arrays because this field type supports multi-dimensional ranges, with one min/max pair per dimension. We only need a single dimension here.

Then you can use the "contains" query, which matches indexed ranges that contain the query range, to search for your specific value, e.g. 1200:

long[] searchValue = { 1200 };
Query containsQuery = LongRange.newContainsQuery("tcpRange", searchValue, searchValue);

Note: My examples are based on Lucene 8.5, the latest version at the time of writing. I believe the same approach should also work in some earlier versions.

EDIT

Regarding additional questions asked in the comments to this answer...

The following method converts an IPv4 address to a long value. Using this allows IP address ranges to be handled (and the same LongRange approach as above can be used):

public long ipToLong(String ipAddress) {
    long result = 0;
    String[] ipAddressInArray = ipAddress.split("\\.");
    for (int i = 3; i >= 0; i--) {
        long ip = Long.parseLong(ipAddressInArray[3 - i]);
        // left shifting 24, 16, 8, 0 with bitwise OR
        result |= ip << (i * 8);
    }
    return result;
}

This also means that valid subnet ranges do not have to be handled specially: any two IP addresses define a contiguous range of numbers.

Credit to this mkyong site for the approach.
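As a quick check of the conversion, the sketch below turns the range string from the follow-up question into two long values and tests whether a single address falls between them. It assumes well-formed dotted IPv4 input separated by a single hyphen; `IpRangeDemo` and `parseIpRange` are illustrative names, and `ipToLong` is the method above made static.

```java
public class IpRangeDemo {
    // Same conversion as the method above, made static for this demo.
    public static long ipToLong(String ipAddress) {
        long result = 0;
        String[] parts = ipAddress.split("\\.");
        for (int i = 3; i >= 0; i--) {
            long octet = Long.parseLong(parts[3 - i]);
            // shift each octet left by 24, 16, 8, 0 bits and OR it in
            result |= octet << (i * 8);
        }
        return result;
    }

    /** Splits "a.b.c.d-e.f.g.h" into {min, max} long values. */
    public static long[] parseIpRange(String range) {
        String[] ends = range.split("-");
        if (ends.length != 2) {
            throw new IllegalArgumentException("Expected <from>-<to>: " + range);
        }
        return new long[] { ipToLong(ends[0].trim()), ipToLong(ends[1].trim()) };
    }

    public static void main(String[] args) {
        long[] range = parseIpRange("192.168.0.1-192.168.0.255");
        long needle = ipToLong("192.168.0.100");
        // prints "3232235521 <= 3232235620 <= 3232235775"
        System.out.println(range[0] + " <= " + needle + " <= " + range[1]);
        // prints "true"
        System.out.println(range[0] <= needle && needle <= range[1]);
    }
}
```

In the index, `range[0]` and `range[1]` would be the min and max of the LongRange field, and `needle` would be used as both bounds of the contains query.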

I managed to add another field, and it works now. Also, do you know how I could do the same search for IPv4 addresses? For example, searching for "192.168.0.100" in the string "192.168.0.1-192.168.0.255"?

Hi @CristianNicolaePerjescu, I can't comment because of my reputation, but you can create a class that extends Field and add it to your Lucene index. For example:

public class InetAddressRange extends Field {
  ...

  /**
   * Create a new InetAddressRange from min/max value
   * @param name field name. must not be null.
   * @param min range min value; defined as an {@code InetAddress}
   * @param max range max value; defined as an {@code InetAddress}
   */
  public InetAddressRange(String name, final InetAddress min, final InetAddress max) {
    super(name, TYPE);
    setRangeValues(min, max);
  }

  ...

}

And then add to the index:

document.add(new InetAddressRange("field", InetAddressFrom, InetAddressTo));
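The two `InetAddress` arguments have to come from somewhere. One way to obtain them from a string such as "192.168.0.1-192.168.0.255" is sketched below, assuming literal IPv4 addresses separated by a single hyphen; `InetRangeParser` is an illustrative name and not part of Lucene.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class InetRangeParser {
    /** Splits "a.b.c.d-e.f.g.h" into {from, to}. */
    public static InetAddress[] parse(String range) {
        String[] ends = range.split("-");
        if (ends.length != 2) {
            throw new IllegalArgumentException("Expected <from>-<to>: " + range);
        }
        try {
            // For a literal IP address, getByName only checks the format.
            // A host name passed here would trigger a DNS lookup instead.
            return new InetAddress[] {
                InetAddress.getByName(ends[0].trim()),
                InetAddress.getByName(ends[1].trim())
            };
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("Invalid address in: " + range, e);
        }
    }

    public static void main(String[] args) {
        InetAddress[] ends = parse("192.168.0.1-192.168.0.255");
        // prints "192.168.0.1 to 192.168.0.255"
        System.out.println(ends[0].getHostAddress() + " to " + ends[1].getHostAddress());
    }
}
```

The resulting pair corresponds to the `InetAddressFrom` and `InetAddressTo` values in the line above.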

In your class you can add your own query factory methods, for example:

  public static Query newIntersectsQuery(String field, final InetAddress min, final InetAddress max) {
    return newRelationQuery(field, min, max, QueryType.INTERSECTS);
  }

  /** helper method for creating the desired relational query */
  private static Query newRelationQuery(String field, final InetAddress min, final InetAddress max, QueryType relation) {
    return new RangeFieldQuery(field, encode(min, max), 1, relation) {
      @Override
      protected String toString(byte[] ranges, int dimension) {
        return InetAddressRange.toString(ranges, dimension);
      }
    };
  }

I hope this is helpful for you.

