简体   繁体   中英

Solr query - Is there a way to limit the size of a text field in the response

Is there a way to limit the amount of text in a text field from a query? Here's a quick scenario....

I have 2 fields:

  • docId - int
  • text - string.

I will query the docId field and want to get a "preview" text from the text field of 200 chars. On average, the text field has anything from 600-2000 chars but I only need a preview.

eg. [mySolrCore]/select?q=docId:123&fl=text

Is there any way to do it since I don't see the point of bringing back the entire text field if I only need a small preview?

I'm not looking at hit highlighting since i'm not searching for specific text within the Text field but if there is similar functionaly of the hl.fragsize parameter it would be great!

Hope someone can point me in the right direction!

Cheers!

You would have to test the performance of this work-around versus just returning the entire field, but it might work for your situation. Basically, turn on highlighting on a field that won't match, and then use the alternate field to return the limited number of characters you want.

http://solr:8080/solr/select/?q=*:*&rows=10&fl=author,title&hl=true&hl.snippets=0&hl.fl=sku&hl.fragsize=0&hl.alternateField=description&hl.maxAlternateFieldLength=50

Notes:

  • Make sure your alternate field does not exist in the field list (fl) parameter
  • Make sure your highlighting field (hl.fl) does not actually contain the text you want to search

I find that the cpu cost of running the highlighter sometimes is more than the cpu cost and bandwidth of just returning the whole field. You'll have to experiment.

I decided to turn my comment into an answer.

I would suggest that you don't store your text data in Solr/Lucene. Only index the data for searching and store a unique ID or URL to identify the document. The contents of the document should be fetched from a separate storage system.

Solr/Lucene are optimized for searches. They aren't your data warehouse or database, and they shouldn't be used that way. When you store more data in Solr than necessary, you negatively impact your entire search system. You bloat the size of indices, increase replication time between masters and slaves, replicate data that you only need a single copy of, and waste cache memory on document caches that should be leveraged to make search faster.

So, I would suggest 2 things.

First, optimally, remove the text storage entire from your search index. Fetch the preview text and whole text from a secondary system that is optimized for holding documents, like a file server.

Second, sub-optimal, only store the preview text in your search index. Store the entire document elsewhere, like a file server.

My wish, which I suspect is shared by many sites, is to offer a snippet of text with each query response. That upgrades what the user sees from mere titles or equivalent. This is normal (see Google as an example) and productive technique. Presently we cannot easily cope with sending the entire content body from Solr/Lucene into a web presentation program and create the snippet there, together with many others in a set of responses as that is a significant network, CPU, and memory hog (think of dealing with many multi-MB files).

The sensible thing is for Solr/Lucene to have a control for sending only the first N bytes of content upon request, thereby saving a lot of trouble in the field. Kludges with hightlights and so forth are just that, and interfere with proper usage. We keep in mind that mechanisms feeding material into Solr/ucene may not be parsing the files, so those feeders can't create the snippets.

您可以添加一个额外的字段,例如excerpt / summary,它包含文本中的前200个字符,然后返回该字段

Linkedin real time search http://snaprojects.jira.com/browse/ZOIE

For storing big data http://project-voldemort.com/

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM