Solr 3.2 Carrot2 clustering nothing but "Other Topics"

It is said that the Carrot2 integration in Solr has improved since the release of Solr 3.2, but that is not what I am seeing. I had an identically configured Solr 1.4.1 server on which Carrot2 worked great, while Solr 3.2 gives me nothing but "Other Topics". This is driving me crazy, because I get no exceptions or anything else unusual. Even the result XML looks the same...

However, I didn't make many changes to the standard configuration of the clustering component:

 <searchComponent name="clustering" 
                   enable="${solr.clustering.enabled:true}"
                   class="solr.clustering.ClusteringComponent" >
    <lst name="engine">
      <str name="name">default</str>

      <str name="carrot.algorithm">org.carrot2.clustering.lingo.LingoClusteringAlgorithm</str>

      <str name="LingoClusteringAlgorithm.desiredClusterCountBase">20</str>
      <!--custom-->
      <str name="LingoClusteringAlgorithm.phraseLabelBoost">8.00</str>
      <str name="TermDocumentMatrixBuilder.titleWordsBoost">6.00</str>

      <str name="carrot.lexicalResourcesDir">clustering/carrot2</str>

      <str name="MultilingualClustering.defaultLanguage">ENGLISH</str>
    </lst>
    <lst name="engine">
      <str name="name">stc</str>
      <str name="carrot.algorithm">org.carrot2.clustering.stc.STCClusteringAlgorithm</str>
    </lst>
  </searchComponent>
  <requestHandler name="/clustering"
                  startup="lazy"
                  enable="${solr.clustering.enabled:true}"
                  class="solr.SearchHandler">
    <lst name="defaults">
      <bool name="clustering">true</bool>
      <str name="clustering.engine">default</str>
      <bool name="clustering.results">true</bool>
      <str name="carrot.title">autocomplete</str>
      <str name="carrot.url">autocomplete</str>
      <str name="carrot.snippet">autocomplete</str>
      <bool name="carrot.outputSubClusters">true</bool>

      <str name="defType">edismax</str>
      <str name="qf">
        text^0.5 autocomplete^1.2 ata^1.0 raum^1.0 system^1.0 assy^1.0 unit^1.0
      </str>
      <str name="q.alt">*:*</str>
      <str name="rows">10</str>
      <str name="fl">*,score</str>
    </lst>
    <arr name="last-components">
      <str>clustering</str>
    </arr>
  </requestHandler>

My best guess was that Carrot2 is not working properly together with edismax (which wasn't implemented in Solr 1.4.1), but that might be misleading.

I already reindexed my data just to make sure that this is not the issue.

In the Carrot2 Workbench, clustering works well with Lingo as the algorithm. When I choose "by source" I get "Other Topics" as in the XML. Might Lingo not be configured well? Do I have to configure anything besides solrconfig.xml to fix that?

I'm thankful for any help.

This happens if the 'snippet' you are trying to cluster on never differs or differs very little. Try adding 'carrot.snippet=<field>' to your request parameters. In your settings it defaults to a field called 'autocomplete'. Does this field have any meaningful text?

Example that makes this behaviour go away for me:

http://localhost:8983/solr/clustering?q=peter&rows=200&carrot.snippet=summary
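
If the override helps, the same setting could be made permanent in the handler defaults instead of being passed on every request. The fragment below is only a sketch of that idea: `summary` is an assumed field name borrowed from the URL above (it does not appear in the question's configuration), and it presumably needs to be a stored field holding longer, varied text. The other defaults from the question (edismax, `qf`, and so on) are left out for brevity.

```xml
<!-- Sketch only: "summary" is an assumed field name taken from the example URL.
     Substitute a field from your own schema that holds varied, descriptive text. -->
<requestHandler name="/clustering"
                startup="lazy"
                enable="${solr.clustering.enabled:true}"
                class="solr.SearchHandler">
  <lst name="defaults">
    <bool name="clustering">true</bool>
    <str name="clustering.engine">default</str>
    <bool name="clustering.results">true</bool>
    <!-- Title and URL stay on the field used in the question -->
    <str name="carrot.title">autocomplete</str>
    <str name="carrot.url">autocomplete</str>
    <!-- Snippet now points at the field with longer text (assumed name) -->
    <str name="carrot.snippet">summary</str>
  </lst>
  <arr name="last-components">
    <str>clustering</str>
  </arr>
</requestHandler>
```

A value passed in the request URL, as in the example above, should still take precedence over these defaults, so different fields can be tried without editing the config each time.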

Best regards,

/Peter W
