
Searching and match count for phrase with Solr

I am using Solr to index documents, and now I need to search those documents for an exact phrase and sort the results by the number of times the phrase appears in each document. I also have to show the user how many times the phrase was matched.

I was using the following query (here I am searching by the word SAP):

{
    :params => {
        :wt     => "json",
        :indent => "on",
        :rows   => 100,
        :start  => 0,
        :q      => "((content:SAP) AND (doc_type:ClientContact) AND (environment:production))",
        :sort   => "termfreq(content,SAP) desc",
        :fl     => "id,termfreq(content,SAP)"
    }
}

Of course, this is only a representation of the actual query; the hash is transformed into a query string at runtime.
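For readers who want to see that transformation spelled out, here is a minimal sketch using only Ruby's standard library (no gems). The endpoint URL and the helper name `solr_url` are assumptions made for illustration, not part of the original application, and `URI.encode_www_form` requires Ruby 1.9.2 or later.

```ruby
require "uri"

# Hypothetical Solr endpoint; adjust host, port and core to your setup.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"

# Turn a params hash into a full Solr request URL (hypothetical helper).
def solr_url(params, base = SOLR_SELECT_URL)
  # encode_www_form percent-encodes keys and values and joins the pairs with "&".
  "#{base}?#{URI.encode_www_form(params)}"
end

query = {
  :params => {
    :wt   => "json",
    :rows => 100,
    :q    => "content:SAP"
  }
}

puts solr_url(query[:params])
```

The resulting string can then be fetched with `Net::HTTP.get(URI.parse(url))`, which is also part of the standard library.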

I managed to get the search working by using content:"the query here" instead of content:the query here, but the hard part is returning the termfreq and sorting by it.

Any ideas on how I could make this work?

Note: I am using Ruby, but this is a legacy application and I can't use any RubyGems, so I am talking to Solr through its HTTP interface.

I was able to make it work by adding a ShingleFilter to my schema.xml.

In my case I had started using Sunspot, so I only had to make the following change:

<!-- *** This fieldType is used by Sunspot! *** -->
<fieldType name="text" class="solr.TextField" omitNorms="false">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- This is the line I added -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4" outputUnigrams="true"/>
  </analyzer>
</fieldType>

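After making that change, restarting Solr, and reindexing, I was able to use termfreq(content, "the query here") in the query (q=), in the returned fields (fl=), and even in the sort (sort=). This works because termfreq counts occurrences of a single indexed term, and with the ShingleFilter every run of up to four consecutive words is indexed as one term.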
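As a rough sketch of how the phrase version of the request could be assembled, the snippet below builds the q, sort and fl parameters around termfreq for an arbitrary phrase. The helper names (`solr_quote`, `termfreq_call`, `phrase_params`) are hypothetical, and the backslash escaping of quotes inside the phrase is an assumption about what the application needs, not something taken from the original setup.

```ruby
# Wrap text in double quotes, escaping embedded quotes and backslashes
# so the phrase stays a single quoted argument (hypothetical helper).
def solr_quote(text)
  '"' + text.gsub(/(["\\])/) { "\\#{$1}" } + '"'
end

# Build the termfreq() function call for a field and a phrase.
def termfreq_call(field, phrase)
  "termfreq(#{field},#{solr_quote(phrase)})"
end

# Build request params that search for the phrase, sort by its
# frequency and return that frequency alongside the document id.
def phrase_params(field, phrase)
  tf = termfreq_call(field, phrase)
  {
    :wt   => "json",
    :q    => "#{field}:#{solr_quote(phrase)}",
    :sort => "#{tf} desc",
    :fl   => "id,#{tf}"
  }
end

params = phrase_params("content", "the query here")
puts params[:sort]
```

Keep in mind that with maxShingleSize="4" only phrases of up to four words exist as single indexed terms, so longer phrases would need a larger shingle size.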

Adding debug=results to the end of the Solr URL will also give you the phrase frequency.
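A minimal sketch of adding that parameter to an existing params hash, again with the standard library only. The helper name `with_debug` is hypothetical; with wt=json the explanation text should then appear in the debug section of the response, though its exact layout depends on the Solr version.

```ruby
require "uri"

# Return a copy of the params with debug=results added,
# leaving the original hash untouched (hypothetical helper).
def with_debug(params)
  params.merge(:debug => "results")
end

params = with_debug({ :wt => "json", :q => 'content:"the query here"' })
puts URI.encode_www_form(params)
```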

