When I query for a term (standard analyzer), I get a list of results sorted by score, which is good. But when calling:
QueryBuilders.termQuery(fieldname, word);
I get a mixture of:
word
some word
WORD
word and such
In no particular order, since they all score the same (they all contain word). Since the number of results varies between 0 and nearly 1M, I need the most exact matches first (or the others filtered out).
I tried adding filters based on the ES regexp filter, but they do not seem to be applied:
FilterBuilders.regexpFilter(fieldname, "~" + word).flags(RegexpFlag.ALL);
FilterBuilders.regexpFilter(fieldname, "^((?!" + word + ").)*$").flags(RegexpFlag.ALL); // and this
FilterBuilders.regexpFilter(fieldname, "^\\(\\(\\?!" + word + "\\)\\.\\)*$").flags(RegexpFlag.ALL); // or this
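As an aside on the second attempt: read as a java.util.regex pattern, ^((?!word).)*$ accepts only values that do not contain word anywhere, which is the opposite of "only word". As far as I know, Lucene's regexp syntax (what the ES regexp filter uses) has no lookahead at all, so the pattern cannot work there as intended either. The sketch below only demonstrates the java.util.regex semantics; the class and method names are hypothetical.

```java
import java.util.regex.Pattern;

// Illustration only: java.util.regex semantics, NOT Lucene's regexp syntax.
class NegativeLookaheadDemo {
    // Same pattern shape as the second attempt; Pattern.quote escapes any
    // regex metacharacters that might appear in the search word.
    static boolean matchesAttempt(String word, String value) {
        Pattern p = Pattern.compile("^((?!" + Pattern.quote(word) + ").)*$");
        return p.matcher(value).matches();
    }

    public static void main(String[] args) {
        String word = "word";
        for (String v : new String[] {"word", "some word", "WORD", "word and such", "other"}) {
            // true means "v does not contain word" (case-sensitive)
            System.out.println(v + " -> " + matchesAttempt(word, v));
        }
    }
}
```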
I've also tried QueryBuilders.boostingQuery, but could not get that to work either; besides, I came across some comments saying that negative boosting does not work.
So basically, I'm looking for a query that searches for a particular term while filtering out, or negatively boosting, the results that contain other words.
If possible, I'd like to stay away from scripting for now (bad experiences).
So the query is: must/should not contain any word other than word.
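The requirement above (no token other than word in the field) can at least be checked client-side on the hits that come back. This is a minimal sketch, not an Elasticsearch query: it assumes the field's stored value is available on each hit (e.g. from _source) and uses a rough lowercase-and-split tokenizer as a stand-in for the standard analyzer. The class and method names are hypothetical, and with up to ~1M matches it is only practical on the pages actually fetched.

```java
import java.util.Arrays;
import java.util.Locale;

class OnlyWordFilter {
    // Rough stand-in for the standard analyzer: lowercase, then split on runs of
    // non-letter/non-digit characters. The real analyzer uses Unicode word-break rules.
    static String[] tokens(String value) {
        return Arrays.stream(value.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+"))
                .filter(t -> !t.isEmpty())
                .toArray(String[]::new);
    }

    // True when the value has at least one token and every token equals word.
    static boolean containsOnly(String value, String word) {
        String[] ts = tokens(value);
        String w = word.toLowerCase(Locale.ROOT);
        return ts.length > 0 && Arrays.stream(ts).allMatch(w::equals);
    }

    public static void main(String[] args) {
        for (String v : new String[] {"word", "some word", "WORD", "word and such"}) {
            System.out.println(v + " -> " + containsOnly(v, "word"));
        }
    }
}
```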
In fact, the easiest set of queries is:
final int fetchAmount = 100; // number of hits to return per request
// The filter cheaply discards documents whose field does not contain the term.
final FilterBuilder filterBuilder = FilterBuilders.termFilter(fieldname, word);
// The same term as a query, so that the remaining documents get scored.
final QueryBuilder termQuery = QueryBuilders.termQuery(fieldname, word);
final QueryBuilder queryBuilder = QueryBuilders.filteredQuery(termQuery, filterBuilder);
final SearchResponse response = CLIENT.prepareSearch(index_name).setQuery(queryBuilder).setExplain(true)
        .setTypes(type_name).setSize(fetchAmount).setSearchType(SearchType.QUERY_THEN_FETCH).execute().actionGet();
Use the FilterBuilder to cheaply discard the documents that don't contain word. Using the same query (TermQuery) for the QueryBuilder adds scoring on top of that, and because of field-length normalization shorter field values score higher, so the most exact matches come first. Take the score of the first hit (SearchHit.score()), then continue until a hit is found for which score < firstScore.
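The stop rule can be sketched as follows. To keep it runnable without a cluster, the scores are passed as a plain List<Float> in the order returned (descending); with the Java client they would come from SearchHit.score() for each hit in response.getHits(). The class name is hypothetical. One caveat: with QUERY_THEN_FETCH term statistics are per shard, so equally exact matches on different shards can get slightly different scores; DFS_QUERY_THEN_FETCH uses global statistics instead.

```java
import java.util.Arrays;
import java.util.List;

class TopScoreGroup {
    // Counts the leading hits that share the first hit's score, stopping at
    // the first hit whose score < firstScore.
    static int countTopScoring(List<Float> scores) {
        if (scores.isEmpty()) {
            return 0;
        }
        float firstScore = scores.get(0);
        int n = 0;
        for (float s : scores) {
            if (s < firstScore) {
                break; // first less exact match: stop here
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Float> scores = Arrays.asList(2.0f, 2.0f, 1.5f, 1.5f);
        System.out.println(countTopScoring(scores));
    }
}
```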
The problem described in the question occurs when QueryBuilders.matchAllQuery() is used for the QueryBuilder instead of a TermQuery. The same set of results is returned in that case, but every hit gets the same constant score, so there is no meaningful ordering.
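A simplified model of why the TermQuery produces an ordering: assuming the ES 1.x default similarity (Lucene's classic TF-IDF), a field's length norm is 1/sqrt(number of terms) and the term-frequency factor is sqrt(tf). For a single-term query the idf and query-norm factors are the same for every hit, so the relative score reduces to sqrt(tf) * lengthNorm. The sketch ignores Lucene's lossy one-byte norm encoding (which can make fields of similar length tie) and boosts; class and method names are hypothetical. Note that word and WORD tie, since the standard analyzer lowercases both.

```java
class LengthNormDemo {
    // Classic TF-IDF length norm: 1 / sqrt(number of terms in the field).
    static double lengthNorm(int numTerms) {
        return 1.0 / Math.sqrt(numTerms);
    }

    // Relative score of a single-term query when the term occurs tf times
    // in a field of numTerms terms (idf and queryNorm factored out).
    static double relativeScore(int tf, int numTerms) {
        return Math.sqrt(tf) * lengthNorm(numTerms);
    }

    public static void main(String[] args) {
        System.out.println("word          -> " + relativeScore(1, 1));
        System.out.println("some word     -> " + relativeScore(1, 2));
        System.out.println("word and such -> " + relativeScore(1, 3));
    }
}
```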
Keep setSize relatively low for speed. When the last hit of a page is still of interest (its score still equals firstScore), run the query again with setFrom(fetchAmount) added, so that the second query starts where the first one stopped, like:
final int xthQueryCalledTime = 1; // page index when called in a loop (0 is the first page)
final SearchResponse response = CLIENT.prepareSearch(index_name).setQuery(queryBuilder).setExplain(true)
        .setTypes(type_name).setSize(fetchAmount).setSearchType(SearchType.QUERY_THEN_FETCH)
        .setFrom(fetchAmount * xthQueryCalledTime) // skip the hits already fetched
        .execute().actionGet();
Repeat until a hit with a lower score shows up or no hits are left.
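The paging loop can be sketched like this. PageFetcher is a hypothetical stand-in for one search call (in the Java client, prepareSearch(...).setFrom(from).setSize(size)...actionGet() mapped to hit scores); in practice you would keep the hit ids rather than just the scores. It pages fetchAmount hits at a time with from = fetchAmount * page and stops at the first lower score or when a page comes back short.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class PagedTopScoreCollector {
    // Hypothetical stand-in for one search request returning hit scores.
    interface PageFetcher {
        List<Float> fetch(int from, int size);
    }

    // Collects all leading scores equal to the first hit's score, page by page.
    static List<Float> collectTopScoring(PageFetcher fetcher, int fetchAmount) {
        List<Float> result = new ArrayList<>();
        Float firstScore = null;
        for (int page = 0; ; page++) {
            List<Float> scores = fetcher.fetch(fetchAmount * page, fetchAmount);
            for (float s : scores) {
                if (firstScore == null) {
                    firstScore = s;
                }
                if (s < firstScore) {
                    return result; // first lower score: done
                }
                result.add(s);
            }
            if (scores.size() < fetchAmount) {
                return result; // no more hits
            }
        }
    }

    public static void main(String[] args) {
        List<Float> all = Arrays.asList(3f, 3f, 3f, 3f, 3f, 1f, 1f);
        PageFetcher fake = (from, size) ->
                all.subList(Math.min(from, all.size()), Math.min(from + size, all.size()));
        System.out.println(collectTopScoring(fake, 2).size()); // 5
    }
}
```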
P.S. Don't use the SCAN search type with scrolling! It throws away the score ordering. From the JavaDoc on SearchType.SCAN:
Performs scanning of the results which executes the search without any sorting. It will automatically start scrolling the result set.