
Solr / Lucene: Get all field names sorted by number of occurrences in index

I want to get a list of all fields (i.e. field names) sorted by the number of times they occur in the Solr index: the most frequently occurring field first, then the second most frequent, and so on.

Alternatively, getting all fields in the index together with the number of times each one occurs would also be sufficient.

How do I accomplish this, either with a single Solr query or through the Solr/Lucene Java API?

The set of fields is not fixed and numbers in the hundreds. Almost all fields are dynamic, except for id and perhaps a couple more.

As stated in "Solr: Retrieve field names from a solr index?", you can do this with the LukeRequestHandler.

To do so, you need to enable the request handler in your solrconfig.xml:

<requestHandler name="/admin/luke" class="org.apache.solr.handler.admin.LukeRequestHandler" />

and call it:

http://solr:8983/solr/admin/luke?numTerms=0
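
As a rough illustration of the call above from plain Java without SolrJ, the sketch below builds the same URL and fetches the raw response. The class and method names (LukeUrlSketch, lukeUrl, fetch) are hypothetical, the wt=json parameter is an added assumption to request JSON output, and java.net.http requires Java 11 or later. The per-field "docs" values in the response are what the sorting further down relies on.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper for illustration; not part of Solr or SolrJ.
class LukeUrlSketch {

  // Builds the Luke handler URL for a given Solr base URL.
  // numTerms controls how many top terms are listed per field.
  // wt=json is an assumption added here to ask for JSON output.
  static String lukeUrl(String solrBaseUrl, int numTerms) {
    String base = solrBaseUrl.endsWith("/")
        ? solrBaseUrl.substring(0, solrBaseUrl.length() - 1)
        : solrBaseUrl;
    return base + "/admin/luke?numTerms=" + numTerms + "&wt=json";
  }

  // Fetches the raw response body; this needs a reachable Solr instance.
  static String fetch(String url) throws IOException, InterruptedException {
    HttpClient client = HttpClient.newHttpClient();
    HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
    return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
  }
}
```

Calling LukeUrlSketch.fetch(LukeUrlSketch.lukeUrl("http://solr:8983/solr", 0)) would return the response as a string, which still has to be parsed to extract the field names and counts.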

If you want the fields sorted by some criterion, you have to do the sorting yourself. I would suggest using SolrJ if you are in a Java environment.

Fetch fields using SolrJ

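import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.LukeRequest;
import org.apache.solr.client.solrj.response.LukeResponse;
import org.apache.solr.client.solrj.response.LukeResponse.FieldInfo;
import org.junit.Test;

// SolrServer/HttpSolrServer are the pre-5.0 SolrJ names; SolrJ 5.0+ calls them SolrClient/HttpSolrClient.
@Test
public void lukeRequest() throws SolrServerException, IOException {
  SolrServer solrServer = new HttpSolrServer("http://solr:8983/solr");

  LukeRequest lukeRequest = new LukeRequest();
  lukeRequest.setNumTerms(1);
  LukeResponse lukeResponse = lukeRequest.process(solrServer);

  // Copy the per-field info into a list and sort it by document count.
  List<FieldInfo> sorted = new ArrayList<FieldInfo>(lukeResponse.getFieldInfo().values());
  Collections.sort(sorted, new FieldInfoComparator());
  for (FieldInfo infoEntry : sorted) {
    System.out.println("name: " + infoEntry.getName());
    System.out.println("docs: " + infoEntry.getDocs());
  }
}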

The comparator used in the example:

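import java.util.Comparator;
import org.apache.solr.client.solrj.response.LukeResponse.FieldInfo;

public class FieldInfoComparator implements Comparator<FieldInfo> {
  @Override
  public int compare(FieldInfo fieldInfo1, FieldInfo fieldInfo2) {
    // Descending by document count: fields present in more documents come first.
    int byDocs = Integer.compare(fieldInfo2.getDocs(), fieldInfo1.getDocs());
    if (byDocs != 0) {
      return byDocs;
    }
    // Tie-break alphabetically by field name so the order is deterministic.
    return fieldInfo1.getName().compareTo(fieldInfo2.getName());
  }
}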
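
The ordering used by the comparator above (highest document count first, ties broken by field name) can also be tried out without a running Solr instance. The following is a minimal sketch under that assumption: FieldCountSketch and sortedNames are hypothetical names, and a plain Map of field name to document count stands in for the values that getName() and getDocs() would return.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; it does not use SolrJ.
class FieldCountSketch {

  // Highest document count first; ties are broken alphabetically by field name.
  static final Comparator<Map.Entry<String, Integer>> BY_DOCS_DESC =
      Map.Entry.<String, Integer>comparingByValue(Comparator.<Integer>reverseOrder())
          .thenComparing(Map.Entry.<String, Integer>comparingByKey());

  // Returns the field names ordered by BY_DOCS_DESC.
  static List<String> sortedNames(Map<String, Integer> docsByField) {
    List<Map.Entry<String, Integer>> entries = new ArrayList<>(docsByField.entrySet());
    entries.sort(BY_DOCS_DESC);
    List<String> names = new ArrayList<>();
    for (Map.Entry<String, Integer> entry : entries) {
      names.add(entry.getKey());
    }
    return names;
  }
}
```

For a map with id=500, title=120, author=120 and tag=3, sortedNames returns id, author, title, tag: the two fields with equal counts fall back to alphabetical order.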
