
How does Solr Collation work?

I followed the spellcheck example in the Solr documentation.

The configuration I used:

<!-- a spellchecker built from a field of the main index -->
<lst name="spellchecker">
  <str name="name">default</str>
  <str name="field">name_spell</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <!-- the spellcheck distance measure used, the default is the internal levenshtein -->
  <str name="distanceMeasure">internal</str>
  <!-- minimum accuracy needed to be considered a valid spellcheck suggestion -->
  <float name="accuracy">0.5</float>
  <!-- the maximum #edits we consider when enumerating terms: can be 1 or 2 -->
  <int name="maxEdits">2</int>
  <!-- the minimum shared prefix when enumerating terms -->
  <int name="minPrefix">1</int>
  <!-- maximum number of inspections per result. -->
  <int name="maxInspections">5</int>
  <!-- minimum length of a query term to be considered for correction -->
  <int name="minQueryLength">4</int>
  <!-- maximum threshold of documents a query term can appear to be considered for correction -->
  <float name="maxQueryFrequency">0.01</float>
  <!-- uncomment this to require suggestions to occur in 1% of the documents -->
    <!-- <float name="thresholdTokenFrequency">.01</float> -->

</lst>
<lst name="spellchecker">
  <str name="name">wordbreak</str>
  <str name="classname">solr.WordBreakSolrSpellChecker</str>      
  <str name="field">name_spell</str>
  <str name="combineWords">true</str>
  <str name="breakWords">true</str>
  <int name="maxChanges">10</int>
</lst>
</searchComponent>

Request handler:

  <requestHandler name="/spell" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
      <str name="spellcheck.dictionary">default</str>
      <str name="spellcheck.dictionary">wordbreak</str>
      <str name="spellcheck">on</str>
      <str name="spellcheck.extendedResults">true</str>       
      <str name="spellcheck.count">10</str>
      <str name="spellcheck.alternativeTermCount">5</str>
      <str name="spellcheck.maxResultsForSuggest">5</str>       
      <str name="spellcheck.collate">true</str>
      <str name="spellcheck.collateExtendedResults">true</str>  
      <str name="spellcheck.maxCollationTries">10</str>
      <str name="spellcheck.maxCollations">5</str>         
    </lst>
    <arr name="last-components">
      <str>spellcheck_new</str>
    </arr>
  </requestHandler>

Schema fields:

    <field name="attribute_key" type="text" indexed="true" stored="true" multiValued="false" />
    <field name="spell_check_field" type="text_spell" indexed="true" stored="false" multiValued="true"/>
    <copyField source="attribute_key" dest="spell_check_field" />
    <field name="name_spell" type="text_general" indexed="true" stored="false" multiValued="false"/>
    <copyField source="attribute_key" dest="name_spell" />
    <field name="attribute_key_tag" type="tag" stored="false" omitTermFreqAndPositions="true" omitNorms="true" multiValued="true"/>
    <copyField source="attribute_key" dest="attribute_key_tag" multiValued="true"/>
    <field name="attribute_value" type="string" indexed="false" stored="true" multiValued="false" />
    <defaultSearchField>attribute_key</defaultSearchField>

The suggestions come back fine, but the collations array is always empty, for every query.

Example query:

http://localhost:8984/solr/spell_check/spell?spellcheck.q=nike%20shoes&spellcheck=true&spellcheck.collate=true&wt=json&spellcheck=true&spellcheck.extendedResults=true&spellcheck.collate=true

Result:

{
"responseHeader": {
"zkConnected": true,
"status": 0,
"QTime": 60
},
"response": {
"numFound": 0,
"start": 0,
"docs": []
},
"spellcheck": {
"suggestions": [
"nike",
{
"numFound": 6,
"startOffset": 0,
"endOffset": 4,
"origFreq": 2,
"suggestion": [
{
"word": "n i k e",
"freq": 19
},
{
"word": "nine",
"freq": 1
},
{
"word": "none",
"freq": 29
},
{
"word": "note",
"freq": 5
},
{
"word": "nicka",
"freq": 2
},
{
"word": "nino",
"freq": 2
}
]
},
"shoes",
{
"numFound": 10,
"startOffset": 5,
"endOffset": 10,
"origFreq": 0,
"suggestion": [
{
"word": "shoe",
"freq": 30
},
{
"word": "shoe s",
"freq": 30
},
{
"word": "short",
"freq": 30
},
{
"word": "s h o e s",
"freq": 4
},
{
"word": "sheer",
"freq": 15
},
{
"word": "sheen",
"freq": 4
},
{
"word": "sheet",
"freq": 3
},
{
"word": "shower",
"freq": 2
},
{
"word": "shock",
"freq": 1
},
{
"word": "shred",
"freq": 1
}
]
}
],
"correctlySpelled": false,
"collations": []
}
}

How do I get collations to work?

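Let's start with the definition of spellcheck.collate from the Spell Checking documentation:

Causes Solr to build a new query based on the best suggestion for each term in the submitted query.

In short, when you specify spellcheck.collate=true, you are asking Solr to recommend a new query that you can re-run, which is more useful than combining the individual suggestions yourself. Let me show you an example.

  • Suppose you want to search for

initial audit

  • For whatever reason, it gets typed as

initila audti

  • With collate set to false, you get the following spellcheck suggestions: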

    <lst name="suggestions">
        <lst name="initila">
            <int name="numFound">5</int>
            <int name="startOffset">1</int>
            <int name="endOffset">8</int>
            <arr name="suggestion">
                <str>initial</str>
                <str>initi la</str>
                <str>initiala</str>
                <str>ini tila</str>
                <str>initilal</str>
            </arr>
        </lst>
        <lst name="audt">
            <int name="numFound">4</int>
            <int name="startOffset">9</int>
            <int name="endOffset">13</int>
            <arr name="suggestion">
                <str>aud t</str>
                <str>audit</str>
                <str>au dt</str>
                <str>audi</str>
            </arr>
        </lst>
    </lst>

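That is, you get several suggestions for each word.

  • If you turn collation on, you also get the most likely suggestion (if there is one) for the query you should run. It is not guaranteed to be the best, but it can be treated as a good guess: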

    <lst name="suggestions">
        <lst name="initila">
            <int name="numFound">5</int>
            <int name="startOffset">1</int>
            <int name="endOffset">8</int>
            <arr name="suggestion">
                <str>initial</str>
                <str>initi la</str>
                <str>initiala</str>
                <str>ini tila</str>
                <str>initilal</str>
            </arr>
        </lst>
        <lst name="audti">
            <int name="numFound">5</int>
            <int name="startOffset">9</int>
            <int name="endOffset">14</int>
            <arr name="suggestion">
                <str>audit</str>
                <str>audt i</str>
                <str>auditi</str>
                <str>au dti</str>
                <str>audtis</str>
            </arr>
        </lst>
        <lst name="collation">
            <str name="collationQuery">initial audit</str>
            <int name="hits">1983</int>
            <lst name="misspellingsAndCorrections">
                <str name="initila">initial</str>
                <str name="audti">audit</str>
            </lst>
        </lst>
    </lst>

The recommended query would be

initial audit

which is taken from this element:

<str name="collationQuery">initial audit</str>
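
If you want to read that value programmatically, something along these lines should work. This is only a sketch: `SAMPLE_RESPONSE` is a trimmed copy of the sample response above, and `collation_queries` is a hypothetical helper name, not part of any Solr client library.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the sample response (the per-term suggestion lists are omitted).
SAMPLE_RESPONSE = """
<lst name="suggestions">
  <lst name="collation">
    <str name="collationQuery">initial audit</str>
    <int name="hits">1983</int>
    <lst name="misspellingsAndCorrections">
      <str name="initila">initial</str>
      <str name="audti">audit</str>
    </lst>
  </lst>
</lst>
"""

def collation_queries(xml_text: str) -> list:
    """Return the text of every <str name="collationQuery"> element."""
    root = ET.fromstring(xml_text.strip())
    return [el.text for el in root.iter("str")
            if el.get("name") == "collationQuery"]

queries = collation_queries(SAMPLE_RESPONSE)
print(queries)
```

An empty list here corresponds to the empty collations array in the question.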

A collation is returned only if the recommended query would actually match documents in your index.
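
To make that condition concrete, here is a simplified model of the idea, not Solr's actual implementation. It assumes candidates are built by combining the ranked suggestions for each term and that each candidate is checked with a hit count; `collate`, `count_hits` and `fake_index` are hypothetical names, and Solr's real candidate ordering is more involved.

```python
from itertools import product
from typing import Callable, Dict, List, Optional

def collate(query: str,
            suggestions: Dict[str, List[str]],
            count_hits: Callable[[str], int],
            max_tries: int = 10) -> Optional[str]:
    """Return the first candidate rewrite of `query` that matches documents."""
    terms = query.split()
    # A term with no suggestions is kept as typed.
    options = [suggestions.get(term) or [term] for term in terms]
    for tries, combo in enumerate(product(*options), start=1):
        if tries > max_tries:  # loosely mirrors spellcheck.maxCollationTries
            break
        candidate = " ".join(combo)
        if count_hits(candidate) > 0:
            return candidate
    return None  # no candidate matched anything: no collation

suggestions = {
    "initila": ["initial", "initi la"],
    "audti": ["audit", "audt i"],
}
fake_index = {"initial audit": 1983}  # assumed hit counts, for illustration only

found = collate("initila audti", suggestions, lambda q: fake_index.get(q, 0))
missing = collate("initila audti", suggestions, lambda q: 0)
print(found, missing)
```

The second call shows the situation in the question: the suggestions exist, but if every check query comes back with zero hits, no collation is produced.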

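Either of the following approaches solved my problem:

  1. Add a default field to the requestHandler as a child of the defaults list, i.e. <str name="df">name_spell</str>. Running the query then returns collations. Either q or spellcheck.q can be used here.

OR

  1. Use q instead of spellcheck.q and specify the field in q, i.e. use q=name_spell:(nike%20shoes) instead of spellcheck.q=nike%20shoes. This also returns collations.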
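
As a rough illustration, the two request variants can be built like this. The base URL is the one from the question; `spell_url` is a hypothetical helper, and passing df as a request parameter is an assumption that stands in for adding it to the handler defaults as described in option 1. The snippet only builds the URLs and does not contact Solr.

```python
from urllib.parse import quote, urlencode

BASE = "http://localhost:8984/solr/spell_check/spell"

def spell_url(params: dict) -> str:
    """Build a /spell request URL, encoding spaces as %20."""
    return BASE + "?" + urlencode(params, quote_via=quote)

common = {"spellcheck": "true", "spellcheck.collate": "true", "wt": "json"}

# Option 1: keep spellcheck.q and supply the default field
# (here per request; the answer sets it in the handler defaults instead).
url_with_df = spell_url({"spellcheck.q": "nike shoes",
                         "df": "name_spell", **common})

# Option 2: use q with an explicit field instead of spellcheck.q.
url_with_fielded_q = spell_url({"q": "name_spell:(nike shoes)", **common})

print(url_with_df)
print(url_with_fielded_q)
```

In both variants the point is the same: the query Solr uses to test a collation is run against name_spell, the field the spellchecker was built from.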
