Solr query (q) or filter query (fq)

I have a Solr index of roughly 1 million product documents. I also have a number of UI filters, such as categories, tabs, price ranges, sizes, colors, and a few others.

Is it right to have q select everything (q=*:*) and put all the other filters in fq? For example:

fq=(catid:90 OR catid:81) AND priceEng:[38 TO 40] AND (size:39 OR size:40 OR size:41 OR size:50 OR size:72) AND (colorGroup:Yellow OR colorGroup:Violet OR colorGroup:Orange... AND (companyId:81 OR companyId:691 OR companyId:671 OR companyId:628 OR companyId:185 OR companyId:602 OR... AND endShipDays:[* TO 7])

To me, everything from categories to companyIds, colors, and sizes is just a filter. Will this approach cause performance problems as the index grows? Should I move some of the clauses into q, and if so, which ones?

Thank you,

It's preferable to use a filter query (fq) over the main query (q) wherever possible.

A filter query can take advantage of the filterCache, which can be a large performance boost compared with putting the same conditions in q.

I would look at the following points about a field in order to decide:

  1. Does the field carry a fixed boost, or do you need scoring on it at all? If yes, put it in the query, because, as mentioned above, filter queries do not contribute to the score.
  2. Is the condition on this field used frequently? If yes, then, as said before, the filter cache may give a huge advantage; if not, a filter query may even be slower.
  3. Is your index constant? This is similar to #2. If the index is updated frequently, filter queries may become a bottleneck instead of a performance boost.

Some notes about #3: in my experience, I had a big index that received new docs every few seconds, with autoSoftCommit also set to a few seconds. Each soft commit opened a new searcher, which invalidated the caches, so the filter cache hit ratio was almost always 0. What's more, I found that the first run of a filter query (a cache miss) is more expensive than running a query with all of those filter conditions moved to "q" instead of "fq". For example, my query took 1 second with 5 filter queries (no cache hit) and 147 ms when I moved all the "fq" conditions into the main query with "AND". Of course, when I stopped index updates, the same filter queries took 0 ms because the cache was used. So this is something to consider.
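The trade-off described here can be sketched as a small helper that either keeps the filters as separate fq parameters or folds them into q with AND. This is only an illustration: the function name `build_params` and the `index_is_stable` flag are hypothetical, and whether folding actually helps depends on your own commit rate and measurements.

```python
def build_params(main_query, filters, index_is_stable):
    """Return Solr request parameters as a list of (name, value) pairs.

    index_is_stable is a hypothetical flag the caller sets from their own
    knowledge of how often the index is committed.
    """
    if index_is_stable:
        # Caches survive between requests, so keep each filter as its own fq.
        return [("q", main_query)] + [("fq", f) for f in filters]
    # Caches are discarded on every commit, so AND everything into q instead.
    clauses = [main_query] + list(filters)
    return [("q", " AND ".join(f"({c})" for c in clauses))]
```

With a frequently updated index, `build_params("catid:90", ["size:39"], False)` produces a single q parameter; with a stable one it produces one q and one fq.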

Also, a few other points on your question:

  • Try to avoid wildcards in your query; they significantly affect performance. So instead of "*:*" I would suggest putting in q the one condition that varies most from request to request, and putting the conditions that stay most constant across requests and don't need scoring in "fq".
  • Range searches are also better avoided (if possible), and range searches with wildcards even more so. This is about your "endShipDays:[* TO 7]". For example, using "endShipDays:(1 2 3 4 5 6 7)" would be more effective, but that's just one example; there are many ways.
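As a rough sketch of the second point, a bounded integer range can be expanded into an explicit list of terms. The helper name `range_as_terms` is hypothetical, and the sketch assumes the field holds integers and that the default query operator is OR. Note that an explicit list such as 1 to 7 does not match values below the lower bound, whereas `[* TO 7]` does.

```python
def range_as_terms(field, low, high):
    """Build 'field:(low ... high)' listing every integer in the closed range."""
    values = " ".join(str(v) for v in range(low, high + 1))
    return f"{field}:({values})"


print(range_as_terms("endShipDays", 1, 7))
# endShipDays:(1 2 3 4 5 6 7)
```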

Hope it helps.

Here is how I use q and fq: I apply full-text search in q and all the filters in fq. Let's say you have a field keyword that you are going to use for full-text search, populated from other fields in your schema with copyField:

<copyField source="id" dest="keyword"/>
<copyField source="category" dest="keyword"/>
<copyField source="product_name" dest="keyword"/>
<copyField source="color" dest="keyword"/>
<copyField source="location" dest="keyword"/>
<copyField source="price" dest="keyword"/>
<copyField source="title" dest="keyword"/>
<copyField source="description" dest="keyword"/>

My query would look like

/select?q={keyword}&fq=category:fashion&fq=location:nyc

/select?q=jeans&fq=category:fashion&fq=location:nyc

As digitaljoel suggested, if you have to query multiple fields, it is better to use multiple fq parameters (see the query above) than to combine the conditions with AND and OR in q.
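A request like the one above can be assembled in client code by passing fq several times. The sketch below uses only the Python standard library; the helper name `select_url` is hypothetical, and the `/select` path is taken from the example query.

```python
from urllib.parse import urlencode


def select_url(keyword, filters, base="/select"):
    """Build a select URL with one q parameter and one fq parameter per filter."""
    params = [("q", keyword)] + [("fq", f) for f in filters]
    # urlencode percent-encodes reserved characters such as ':' in each value.
    return base + "?" + urlencode(params)


print(select_url("jeans", ["category:fashion", "location:nyc"]))
# /select?q=jeans&fq=category%3Afashion&fq=location%3Anyc
```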

Note: in my case q defaults to the field keyword, as defined in solrconfig.xml:

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request
    -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="df">keyword</str>
  </lst>
</requestHandler>

Think about your query and put everything that doesn't have to be scored and is repeatable in the fq parameter. That way, consecutive queries that hit the Solr node before a new searcher is opened can reuse the information stored in the filterCache.

The filter cache stores each unique filter as a key in the cache; the value is an array of bits in which each entry says whether a given document matches that filter or not. That makes it very easy to re-apply the filter for the next query. But you do, of course, lose the scoring capabilities.
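The idea can be illustrated with a toy model. This is not Solr's implementation, only a minimal sketch of "filter string as key, bit set as value": the class name `FilterCacheSketch`, the predicate functions, and the use of a Python integer as the bit set are all assumptions made for the example.

```python
class FilterCacheSketch:
    """Toy filter cache: maps a filter string to a bit set of matching doc ids."""

    def __init__(self, docs):
        self.docs = docs
        self.cache = {}   # filter string -> int used as a bit set
        self.misses = 0   # how many times a filter had to be computed

    def bits_for(self, name, predicate):
        if name not in self.cache:
            self.misses += 1
            bits = 0
            for doc_id, doc in enumerate(self.docs):
                if predicate(doc):
                    bits |= 1 << doc_id  # set the bit for this document
            self.cache[name] = bits
        return self.cache[name]

    def matching_ids(self, filters):
        # Start with every document, then intersect with each filter's bits.
        combined = (1 << len(self.docs)) - 1
        for name, predicate in filters:
            combined &= self.bits_for(name, predicate)
        return [i for i in range(len(self.docs)) if (combined >> i) & 1]
```

Calling `matching_ids` twice with the same filters computes each bit set only on the first call; the second call reads it back from the dictionary, which is the reuse the answer describes.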

Looking at your query, I would simplify it a bit by using multiple fq values, something along these lines:

fq=(catid:90 OR catid:81)
fq=priceEng:[38 TO 40]
fq=(size:39 OR size:40 OR size:41 OR size:50 OR size:72)
fq=(colorGroup:Yellow OR colorGroup:Violet OR colorGroup:Orange  ... ) 
fq=(companyId:81 OR companyId:691 OR companyId:671 OR companyId:628 OR companyId:185 OR companyId:602 OR ... ) 
fq=endShipDays:[* TO 7]

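Multiple fq parameters are intersected, so the query returns the same results, but at least to me it is easier to manage :) Each fq is also cached as its own filterCache entry, so an individual filter can be reused by later requests that combine it with different filters.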
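One way to produce such a list from UI selections is to emit one fq per field. The sketch below is an assumption about how the client might be structured: the helper name `fq_for_facets` and the dictionary shape are hypothetical, and it uses the grouped form `field:(a OR b)`, which is equivalent to `field:a OR field:b`.

```python
def fq_for_facets(selections):
    """Turn {field: [values]} into one 'field:(v1 OR v2 ...)' string per field."""
    fqs = []
    for field, values in selections.items():
        if not values:
            continue  # nothing selected for this facet, so no filter
        joined = " OR ".join(str(v) for v in values)
        fqs.append(f"{field}:({joined})")
    return fqs


print(fq_for_facets({"catid": [90, 81], "size": [39, 40, 41]}))
# ['catid:(90 OR 81)', 'size:(39 OR 40 OR 41)']
```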
