
SOLR search by external fields

What we need is similar to what is discussed here, except not as a filter but as an actual query: http://lucene.472066.n3.nabble.com/filter-query-from-external-list-of-Solr-unique-IDs-td1709060.html

We'd like to implement a query parser/scorer that would allow us to combine SOLR searches with searching external fields. This is due to the limitation of having to update an entire document even though only a field in the document needs to be updated.

For example, we have a database table called document_attributes containing two columns, document_id and attribute_id. The document_id corresponds to the ID of the documents indexed in SOLR.

We'd like to be able to pass in a query like:

attribute_id:123 OR text:some_query
(attribute_id:123 OR attribute_id:456) AND text:some_query
etc...

Can we implement a plugin/module in SOLR that parses the above query, fetches the document_ids associated with each attribute_id, and combines the results with the normal processing of the SOLR search to return one set of results for the entire query?

We'd appreciate any guidance on how to implement this if it is possible.

I would repeat the advice offered by the referenced question, with a qualification.

For Solr < 4.0 the two approaches to consider are:

  • Doing the document ID lookup before querying Solr, and querying Solr with a list of document IDs (e.g. fq=(docid:1 OR docid:5) )

  • Creating your own derived SolrQueryParser which performs the database query to substitute document IDs for attribute IDs (e.g. fq=attribute:1 is expanded by the query parser to fq=(docid:1 OR docid:5) )
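The first approach can be sketched roughly as follows. This is a minimal illustration, not a tested integration: it assumes the document_attributes table from the question lives in a SQL database (an in-memory SQLite database stands in for it here), and that the Solr ID field is called docid as in the examples above. The function names are hypothetical.

```python
import sqlite3


def fetch_document_ids(conn, attribute_id):
    """Return the document IDs linked to one attribute ID, in ascending order."""
    rows = conn.execute(
        "SELECT document_id FROM document_attributes "
        "WHERE attribute_id = ? ORDER BY document_id",
        (attribute_id,),
    ).fetchall()
    return [row[0] for row in rows]


def build_id_filter(document_ids, id_field="docid"):
    """Build a filter query such as (docid:1 OR docid:5).

    Returns None when there are no IDs, so the caller can decide how to
    handle an attribute that matches no documents.
    """
    if not document_ids:
        return None
    return "(" + " OR ".join(f"{id_field}:{doc_id}" for doc_id in document_ids) + ")"


if __name__ == "__main__":
    # In-memory stand-in for the real database (assumption for the demo).
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE document_attributes (document_id INTEGER, attribute_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO document_attributes VALUES (?, ?)",
        [(1, 123), (5, 123), (7, 456)],
    )
    print(build_id_filter(fetch_document_ids(conn, 123)))  # (docid:1 OR docid:5)
```

The resulting string would then be sent as the fq parameter (or folded into q) of the normal Solr request.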

The decision should revolve around the number of document IDs you'll be sending to Solr. For a small, or even moderate (let's say hundreds), number of document IDs, sending the IDs as a filter query is likely the best way to go. If you're potentially sending a large or very large number of document IDs, then extending a query parser for your case is a fair strategy. If you extend a query parser, you may want to consider running it on a dedicated (non-default) request handler, and building in aspects such as caching to ensure your results remain highly performant.
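To make the expansion and caching idea concrete, here is a rough Python sketch of what such a parser would do to the query string. It is not a Solr plugin (a real one would be written in Java against Solr's query parser classes); the class and function names, the docid field, the 60-second TTL, and the "(-*:*)" match-nothing clause are all assumptions made for illustration.

```python
import re
import time

# Matches clauses such as attribute_id:123 (numeric IDs only, by assumption).
ATTRIBUTE_CLAUSE = re.compile(r"\battribute_id:(\d+)")


class CachedLookup:
    """Wrap an attribute-to-document-IDs lookup with a simple TTL cache."""

    def __init__(self, lookup, ttl_seconds=60.0, clock=time.monotonic):
        self._lookup = lookup
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # attribute_id -> (timestamp, [document ids])

    def __call__(self, attribute_id):
        now = self._clock()
        hit = self._entries.get(attribute_id)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        ids = list(self._lookup(attribute_id))
        self._entries[attribute_id] = (now, ids)
        return ids


def expand_attribute_clauses(query, lookup, id_field="docid", no_match="(-*:*)"):
    """Replace every attribute_id:N clause with an OR-group of document IDs.

    Note: this is a plain text substitution, so it would also rewrite
    matches that appear inside quoted phrases.
    """
    def replace(match):
        ids = lookup(int(match.group(1)))
        if not ids:
            return no_match  # assumed clause that matches no documents
        return "(" + " OR ".join(f"{id_field}:{i}" for i in ids) + ")"

    return ATTRIBUTE_CLAUSE.sub(replace, query)


if __name__ == "__main__":
    table = {123: [1, 5], 456: [7]}  # stand-in for the database
    lookup = CachedLookup(lambda attr: table.get(attr, []))
    print(expand_attribute_clauses(
        "(attribute_id:123 OR attribute_id:456) AND text:some_query", lookup))
```

A time-based cache is only one option; if attribute changes must be visible immediately, the cache would need explicit invalidation when the table is updated.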

For Solr 4.0 and above you might also consider using a cross-core join. You could have your existing Solr core remain as-is, and create a new core which indexes the document:attribute relationships. This should alleviate your concerns about the whole document update, since only the small relationship documents need reindexing when an attribute changes, and allow you to execute your whole query in Solr, in-memory.
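As a hedged illustration of the join approach, the snippet below builds the request parameters for a query that combines a text clause with a join against a second core. The core name document_attributes, the field names document_id and docid, and the helper names are assumptions; the {!join from=... to=... fromIndex=...} local-params form and the _query_ nested-query syntax are standard Solr features, but check them against the Solr version you run.

```python
from urllib.parse import urlencode


def join_clause(attribute_query, from_index="document_attributes",
                from_field="document_id", to_field="docid"):
    """Build a cross-core join: match docs whose to_field equals from_field
    of the documents in from_index that match attribute_query."""
    return (f"{{!join from={from_field} to={to_field} "
            f"fromIndex={from_index}}}{attribute_query}")


def nested(clause):
    """Wrap a clause as a nested query so it can sit inside a Boolean query."""
    escaped = clause.replace("\\", "\\\\").replace('"', '\\"')
    return f'_query_:"{escaped}"'


def select_params(text_query, attribute_query):
    """URL-encoded parameters for: text_query OR <join on attribute_query>."""
    q = f"{text_query} OR {nested(join_clause(attribute_query))}"
    return urlencode({"q": q, "wt": "json"})


if __name__ == "__main__":
    print(join_clause("attribute_id:123"))
    print(select_params("text:some_query", "attribute_id:123"))
```

Two caveats worth verifying for your setup: the joined core generally has to live on the same Solr node as the core being queried, and by default the join clause does not contribute a relevance score, so ranking comes from the text clause only.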
