
ElasticSearch query a specific term, not other terms

When I query for a term (the field uses the standard analyzer), I get a list of results sorted by score, which is good. But when calling:

QueryBuilders.termQuery(fieldname, word);

I get a mixture of:

word
some word
WORD
word and such

These come back in no particular order, since they all score the same because they all contain word. Since the number of results varies between 0 and roughly 1M, I need the most exact matches first (or the others filtered out).
I tried adding filters based on the ES regex filter, but it looks like they are not being processed:

FilterBuilders.regexpFilter(fieldname, "~" + word).flags(RegexpFlag.ALL);
FilterBuilders.regexpFilter(fieldname, "^((?!" + word + ").)*$").flags(RegexpFlag.ALL); // and this
FilterBuilders.regexpFilter(fieldname, "^\\(\\(\\?!" + word + "\\)\\.\\)*$").flags(RegexpFlag.ALL); // or

I've also tried QueryBuilders.boostingQuery, which I also failed to get working. Besides, I came across some comments saying that negative querying does not work.

So basically, I'm looking for a query that matches a particular term while filtering out, or negatively boosting, the results that contain other words.
If possible I'd like to stay away from scripting for now (bad experiences).

So the query is: must/should not contain any word other than word

In fact, the simplest set of queries is:

final int fetchAmount = 100; // number of items to return
final FilterBuilder filterBuilder = FilterBuilders.termFilter(fieldname, word);
final QueryBuilder combinedQuery = QueryBuilders.termQuery(fieldname, word);
final QueryBuilder queryBuilder = QueryBuilders.filteredQuery(combinedQuery, filterBuilder);
final SearchResponse builder = CLIENT.prepareSearch(index_name).setQuery(queryBuilder).setExplain(true)
        .setTypes(type_name).setSize(fetchAmount).setSearchType(SearchType.QUERY_THEN_FETCH).execute().actionGet();

The FilterBuilder cheaply discards the values that don't contain word. Using the same query (TermQuery) for the QueryBuilder applies a scoring mechanism. Take the score (SearchHit.score()) of the first hit, then iterate until you find a hit whose score < firstScore.
The problem described in the question occurs when QueryBuilders.matchAllQuery() is used for the QueryBuilder instead of TermQuery. The same set of results is returned in that case, but no scoring (and hence no sorting) is applied.
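
The cut-off step can be sketched in plain Java as follows. This is only an illustration, not the Elasticsearch API: ScoredHit is a hypothetical stand-in for SearchHit that carries just an id and a score, and the hits are assumed to arrive sorted by descending score, as a scored query returns them by default.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for SearchHit: only the id and the score matter here.
final class ScoredHit {
    final String id;
    final float score;

    ScoredHit(String id, float score) {
        this.id = id;
        this.score = score;
    }
}

final class TopScoreCut {
    /**
     * Returns the leading run of hits that share the first hit's score.
     * Assumes the input is sorted by descending score.
     */
    static List<ScoredHit> topScored(List<ScoredHit> hits) {
        List<ScoredHit> best = new ArrayList<>();
        if (hits.isEmpty()) {
            return best;
        }
        float firstScore = hits.get(0).score;
        for (ScoredHit hit : hits) {
            if (hit.score < firstScore) {
                break; // everything from here on is a weaker match
            }
            best.add(hit);
        }
        return best;
    }
}
```

In real code the list would be filled from the hits of the SearchResponse, reading each score with SearchHit.score().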

Keep setSize relatively low for speed. If the last item of a page is still of interest, run the above query again with setFrom(fetchAmount) added, so that the second query starts where the first one stopped, like:

final int xthQueryCalledTime = 1; // if using a loop
final SearchResponse builder = CLIENT.prepareSearch(index_name).setQuery(queryBuilder).setExplain(true)
        .setTypes(type_name).setSize(fetchAmount).setSearchType(SearchType.QUERY_THEN_FETCH).setFrom(fetchAmount * xthQueryCalledTime).execute().actionGet();

Repeat until done.
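
The repeat-until-done loop might look roughly like the sketch below. It is written against a hypothetical fetchPage function rather than the real client: fetchPage.apply(from, size) stands in for the search call with setFrom(from) and setSize(size) and is assumed to return the scores of one page in descending order. The loop stops as soon as a score drops below the first score, or when a page comes back shorter than fetchAmount.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

final class PagedFetch {
    /**
     * Collects all scores equal to the very first score, fetching page by page.
     * fetchPage is a hypothetical stand-in for the search call; fetchAmount must be > 0.
     */
    static List<Float> collectTopScores(
            BiFunction<Integer, Integer, List<Float>> fetchPage, int fetchAmount) {
        List<Float> top = new ArrayList<>();
        Float firstScore = null;
        for (int page = 0; ; page++) {
            // Same offset arithmetic as setFrom(fetchAmount * xthQueryCalledTime)
            List<Float> scores = fetchPage.apply(fetchAmount * page, fetchAmount);
            for (float score : scores) {
                if (firstScore == null) {
                    firstScore = score; // remember the best score
                }
                if (score < firstScore) {
                    return top; // first weaker match found, stop paging
                }
                top.add(score);
            }
            if (scores.size() < fetchAmount) {
                return top; // short page: no more results
            }
        }
    }
}
```

Stopping on the first weaker score means that usually only a few pages are requested, even when the total number of matches is large.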

P.S. Don't use scroll with SearchType.SCAN! It discards the score ordering. From the JavaDoc on SearchType.SCAN:

Performs scanning of the results which executes the search without any sorting. It will automatically start scrolling the result set
