
Solr search for exact phrase / substring

I am using Solr for my work and it's excellent. However, I am having trouble generating more elaborate search results.

I am searching for products by their title, brand, gender, and category (dress shoes, jackets, etc). Brands live in a "Brands" DB table, and the same for categories and genders. Products live in a "Products" DB table which is foreign-keyed to the Brands, Categories and Genders tables.

I am loading all of these into Solr, and I can do a weighted, ranked search across them without trouble. This gives the most similar products, weighted by certain fields. What I would like to do next is find exact matches from each field for any search string. For example:

SEARCH STRING: "Michael Kors Light Green Men's Dress Shoes"

SHOULD MATCH:

Brands:

  • Michael Kors

Colours:

  • Light Green
  • Green

Gender:

  • Mens

Category:

  • Dress Shoes
  • Shoes

I can then do a more restrictive, but categorised, intersect search, e.g. all products that are [light green] AND [michael kors] AND [Dress Shoes OR Shoes].

Thanks :)

You can try using a Boolean query. A Boolean query contains multiple clauses.

http://localhost:8983/solr/query?q=(Brands:"Michael Kors") AND (Colours:"Light Green") AND (Category:("Dress Shoes" OR Shoes))
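The query above contains spaces, quotes and parentheses, so it needs to be URL-encoded before it is sent. A minimal Python sketch of building it, assuming the field names `Brands`, `Colours` and `Category` from the example (the helper `build_boolean_query` is hypothetical, not part of any Solr client):

```python
from urllib.parse import urlencode


def build_boolean_query(brand, colour, categories):
    """Build a Lucene-syntax Boolean query with quoted phrase clauses."""

    def phrase(value):
        # Escape backslashes and double quotes, then wrap the value in quotes
        # so that multi-word values are matched as a phrase.
        escaped = value.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + escaped + '"'

    category_clause = " OR ".join(phrase(c) for c in categories)
    return (
        f"Brands:{phrase(brand)} AND Colours:{phrase(colour)} "
        f"AND Category:({category_clause})"
    )


q = build_boolean_query("Michael Kors", "Light Green", ["Dress Shoes", "Shoes"])

# urlencode percent-encodes the quotes, colons and parentheses.
url = "http://localhost:8983/solr/query?" + urlencode({"q": q})
print(url)
```

Quoting every category value keeps a multi-word value such as "Dress Shoes" from being split into two separate terms by the query parser.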

@mils If you are looking for more from your search results, you should consider using a different query parser. I think this link is worth a read to see whether any of the available query parsers work for you: https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser

You can change the schema fields from text to string. That would give you exact matching, but at the expense of having to handle upper/lower case yourself.
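One way to handle case yourself is to normalise values identically before indexing and before querying. A small sketch, assuming a hypothetical string field named `brand_exact` that stores the normalised value:

```python
def normalize_exact(value: str) -> str:
    """Collapse runs of whitespace and case-fold a value for exact matching."""
    return " ".join(value.split()).casefold()


# Index time: store the normalised form in the (hypothetical) string field.
doc = {"id": "1", "brand_exact": normalize_exact("Michael Kors")}

# Query time: normalise the user's input the same way before comparing.
user_input = "  MICHAEL   kors "
query = 'brand_exact:"' + normalize_exact(user_input) + '"'
print(query)
```

Because a string field is not analysed, the indexed value and the query value only match when both sides have gone through the same normalisation.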

The Dismax and Edismax parsers would give you the easiest option to search across several fields.
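As a rough illustration, an Edismax request selects the parser with `defType` and lists the fields to search (with optional boosts) in `qf`. The field names and boost values below are assumptions for this example, not taken from the asker's schema:

```python
from urllib.parse import urlencode

params = {
    "q": "Michael Kors Light Green Men's Dress Shoes",
    "defType": "edismax",  # select the Extended DisMax query parser
    # Fields to search; "^" sets a per-field boost (values are illustrative).
    "qf": "title^2 brand^3 colour category gender",
}

url = "http://localhost:8983/solr/query?" + urlencode(params)
print(url)
```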

This is really a question about "text tagging" (also sometimes called "named entity recognition").

In the context you're pursuing, Daniel Tunkelang considers this an essential part of "Query Understanding".

Lucene has some data structures which can be used to implement this sort of feature (see the OpenSextant project as an example), but Solr doesn't offer this feature (beyond approximate solutions using shingles as described above).

The reason this is hard is that you need document frequency information for each term/phrase in your query, across every field you care about, before you run your query!


Slow, inelegant Solr solution:

If you're willing to run two queries, you can approximate your goal using facets:

  1. Run the normal text query Q1, requesting term facets on brand, colour, gender and category (stored as strings).
  2. Tokenize the Q1 query string into 1- and 2-term shingles.
  3. Compare your Q1 query shingles with the top facet values returned for each field requested in the Q1 results.
  4. Whenever you see an exact match, apply your intersecting filter to a new query, Q2: the original query Q1 plus your new, restrictive criteria.
  5. Run Q2.

(A nice side-effect here is that your query narrower will be able to see the total-count and facet counts returned from Q1 while constructing Q2, so you can decide to omit/relax certain restrictions should the number of matching results drop too low)
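Steps 2 to 4 can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names are hypothetical, the facet values are assumed to have already been extracted from the Q1 response into plain lists, and matching is done by simple lowercasing and apostrophe stripping. The resulting strings are meant to be sent as `fq` parameters on Q2.

```python
def shingles(text, max_size=2):
    """Return all 1..max_size word shingles of the text, lowercased."""
    tokens = text.lower().replace("'", "").split()
    result = []
    for size in range(1, max_size + 1):
        for start in range(len(tokens) - size + 1):
            result.append(" ".join(tokens[start:start + size]))
    return result


def match_facets(query, facets):
    """Map each field to the facet values that exactly equal a query shingle."""
    grams = set(shingles(query))
    matches = {}
    for field, values in facets.items():
        hits = [v for v in values if v.lower().replace("'", "") in grams]
        if hits:
            matches[field] = hits
    return matches


def build_filters(matches):
    """Turn matched values into one filter-query string per field."""
    filters = []
    for field, values in matches.items():
        quoted = " OR ".join('"' + v.replace('"', '\\"') + '"' for v in values)
        filters.append(f"{field}:({quoted})")
    return filters


# Facet values as they might come back from Q1 (made-up sample data).
facets = {
    "brand": ["Michael Kors", "Nike"],
    "colour": ["Light Green", "Green", "Red"],
    "gender": ["Mens", "Womens"],
    "category": ["Dress Shoes", "Shoes", "Jackets"],
}
query = "Michael Kors Light Green Men's Dress Shoes"
filters = build_filters(match_facets(query, facets))
for fq in filters:
    print(fq)
```

Only facet values returned by Q1 can be matched, so a value outside the top facet entries for a field will be missed unless the facet limit is raised.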
