How to select distinct field values using Solr?

I would like to do the equivalent of this SQL, but with Solr as my data store.

SELECT
   DISTINCT txt
FROM
   my_table;

What syntax would force Solr to give me only distinct values?

http://localhost:8983/solr/select?q=txt:?????&fl=txt

EDIT: Faceted search seems to fit, but as I investigated it, I realized I had described only half of the problem.

My SQL query should have read:

SELECT
   DISTINCT SUBSTR(txt,0,3)
FROM
   my_table;

Is this possible with Solr?

Faceting would get you a result set that contains the distinct values for a field.

For example:

http://localhost:8983/solr/select/?q=*%3A*&rows=0&facet=on&facet.field=txt

You should get back something like this:

<response>
<responseHeader><status>0</status><QTime>2</QTime></responseHeader>
<result numFound="4" start="0"/>
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="txt">
        <int name="value">100</int>
        <int name="value1">80</int>
        <int name="value2">5</int>
        <int name="value3">2</int>
        <int name="value4">1</int>
  </lst>
 </lst>
</lst>
</response>
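As a rough sketch of how a client might consume that response, the Python below rebuilds the example URL and pulls the distinct values out of the facet section. The helper name facet_values and the trimmed sample string are illustrative assumptions, not part of Solr.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Same parameters as the example URL; host and path are taken from the post.
params = {"q": "*:*", "rows": 0, "facet": "on", "facet.field": "txt"}
url = "http://localhost:8983/solr/select/?" + urlencode(params)

def facet_values(xml_text, field):
    """Return (value, count) pairs for one facet field in an XML response."""
    root = ET.fromstring(xml_text)
    for lst in root.iter("lst"):
        if lst.get("name") != "facet_fields":
            continue
        for field_lst in lst.findall("lst"):
            if field_lst.get("name") == field:
                return [(i.get("name"), int(i.text))
                        for i in field_lst.findall("int")]
    return []

# The sample response shown above, trimmed to the facet section.
sample = """<response>
<lst name="facet_counts">
 <lst name="facet_queries"/>
 <lst name="facet_fields">
  <lst name="txt">
   <int name="value">100</int>
   <int name="value1">80</int>
   <int name="value2">5</int>
  </lst>
 </lst>
</lst>
</response>"""

# Each facet entry name is one distinct indexed value of the field.
distinct = [value for value, _count in facet_values(sample, "txt")]
print(distinct)
```

Sending the request (for example with urllib.request.urlopen) is left out so the sketch runs without a Solr server.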

Check out the wiki for more information. Faceting is a really cool part of Solr. Enjoy :)

http://wiki.apache.org/solr/SimpleFacetParameters#Facet_Fields

Note: Faceting shows the indexed value, i.e. the value after all the analysis filters have been applied. One way around this is to use copyField to create a separate facet version of the txt field (typically a string field with no analysis). That way your results will show the original value.

Hope that helps. There is a lot of documentation on faceting on the wiki. I also wrote up some notes with screenshots, which you can check out here:

http://www.craftyfella.com/2010/01/faceting-and-multifaceting-syntax-in.html

For the DISTINCT part of your question, I think you may be looking for Solr's field collapsing / grouping functions. They let you specify a field you want unique results from, create a group for each unique value, and show how many documents are in each group.

You can then store the same substring in a separate field and collapse on that.
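A minimal sketch of what such a grouping request and its result handling could look like in Python. The field name txt_prefix, the helper group_counts, and the sample response are hypothetical; the group, group.field, group.limit and group.ngroups parameters are the standard result-grouping parameters.

```python
from urllib.parse import urlencode

# Standard result-grouping parameters; "txt_prefix" is a hypothetical field
# assumed to hold the substring of txt.
params = {
    "q": "*:*",
    "group": "true",
    "group.field": "txt_prefix",
    "group.limit": 1,         # at most one representative document per group
    "group.ngroups": "true",  # also report the number of groups
    "wt": "json",
}
query_string = urlencode(params)

def group_counts(response, field):
    """Map each distinct group value to the number of documents in its group."""
    groups = response["grouped"][field]["groups"]
    return {g["groupValue"]: g["doclist"]["numFound"] for g in groups}

# Hand-written sample in the shape of a grouped JSON response (docs trimmed);
# this is not real data.
sample = {
    "grouped": {
        "txt_prefix": {
            "matches": 5,
            "ngroups": 2,
            "groups": [
                {"groupValue": "abc",
                 "doclist": {"numFound": 3, "start": 0, "docs": []}},
                {"groupValue": "xyz",
                 "doclist": {"numFound": 2, "start": 0, "docs": []}},
            ],
        }
    }
}
print(group_counts(sample, "txt_prefix"))
```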

Use the StatsComponent with the parameter stats.calcdistinct to get a list of distinct values for a certain field:

Solr 7: https://lucene.apache.org/solr/guide/7_7/the-stats-component.html

Solr 6: https://cwiki.apache.org/confluence/display/solr/The+Stats+Component

It will also give you the count of distinct values. stats.calcdistinct has probably been available since Solr 4.7.

http://wiki.apache.org/solr/StatsComponent is outdated, as it does not cover stats.calcdistinct.

Example

/select?stats=on&stats.field=region&rows=0&stats.calcdistinct=true

"stats":{
  "stats_fields":{
    "region":{
      "min":"GB",
      "max":"GB",
      "count":20276,
      "missing":0,
      "distinctValues":["GB"],
      "countDistinct":1}}}}
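For illustration, here is a small Python sketch that builds the same request path and reads the two distinct-related entries from a response shaped like the one above. The helper name distinct_from_stats is an assumption, and the sample is the fragment above wrapped in an enclosing object so that it parses.

```python
import json
from urllib.parse import urlencode

# Same parameters as the example request above.
params = {
    "stats": "on",
    "stats.field": "region",
    "rows": 0,
    "stats.calcdistinct": "true",
}
path = "/select?" + urlencode(params)

def distinct_from_stats(response, field):
    """Return (distinct values, distinct count) for one stats field."""
    stats = response["stats"]["stats_fields"][field]
    return stats["distinctValues"], stats["countDistinct"]

# The response fragment from the example, wrapped in an enclosing object.
sample = json.loads("""
{"stats": {"stats_fields": {"region": {
    "min": "GB", "max": "GB", "count": 20276, "missing": 0,
    "distinctValues": ["GB"], "countDistinct": 1}}}}
""")

values, how_many = distinct_from_stats(sample, "region")
print(values, how_many)
```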

Difference from facets

With facets, you need to know the number of distinct values in advance to request all of them, or you set facet.limit to something really high and count the results yourself. Also, you need a string field for facets to work the way you need here.
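To make the facet side of that comparison concrete, here is a hedged Python sketch. It uses facet.limit=-1, which removes the limit rather than setting a very high one, and counts the returned values on the client. The helper pair_up and the sample list are illustrative; the flat [value, count, value, count, ...] list is how facet fields appear in the default JSON response.

```python
from urllib.parse import urlencode

params = {
    "q": "*:*",
    "rows": 0,
    "wt": "json",
    "facet": "on",
    "facet.field": "txt",
    "facet.limit": -1,    # a negative limit means "return every value"
    "facet.mincount": 1,  # skip values that match no documents
}
query_string = urlencode(params)

def pair_up(flat):
    """Turn [value, count, value, count, ...] into (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))

# Illustrative facet_fields entry for "txt"; not real data.
sample_field = ["value", 100, "value1", 80, "value2", 5]

pairs = pair_up(sample_field)
distinct_count = len(pairs)  # the count has to be computed client-side
print(pairs, distinct_count)
```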

I would store the substring in a different field (let's call it txt_substring), then facet on txt_substring as CraftyFella showed.
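One way to populate such a field is to compute it on the client before the documents are sent for indexing. The sketch below assumes the intent of SUBSTR(txt,0,3) is "the first three characters"; the helper add_prefix_field and the sample documents are made up for illustration.

```python
def add_prefix_field(doc, source="txt", target="txt_substring", length=3):
    """Return a copy of doc with the first `length` characters of
    doc[source] stored under `target` (skipped if the source is absent)."""
    enriched = dict(doc)
    value = doc.get(source)
    if value is not None:
        enriched[target] = value[:length]
    return enriched

# Made-up documents standing in for rows of my_table.
docs = [
    {"id": "1", "txt": "apple"},
    {"id": "2", "txt": "application"},
    {"id": "3", "txt": "banana"},
]

enriched = [add_prefix_field(d) for d in docs]

# What faceting on txt_substring would then report as distinct values.
prefixes = sorted({d["txt_substring"] for d in enriched})
print(prefixes)
```

The enriched documents would then be indexed as usual, with txt_substring declared as a string field so that it facets on the exact prefix.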

Normally I'd use the n-gram tokenizer, but I don't think you can facet on that.

Take a look at faceted search.

Solr 5.1 and later has the new Facet Module, which has integrated support for finding the number of unique values in a field. You can even find the number of unique values in a field for each bucket of a facet, and sort by that value to find the highest or lowest number of unique values.

Number of unique values in "myfield": json.facet={x:'unique(myfield)'}

Facet by the "category" field and, for each category, show the number of unique values in "color":

json.facet={
  cat_breakdown : { terms : {  // group results by unique values of "category"
    field : category,
    facet : {
      x : "unique(color)",  // for each category, find the number of unique colors
      y : "avg(price)"      // for each category, find the average price
    }
  }}
}
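A sketch of how the same facet could be sent and read from Python. It writes the facet as strict JSON (the relaxed syntax with comments shown above is not valid for json.dumps). The helper unique_per_bucket and the sample response values are assumptions for illustration; the facets / buckets / val layout follows the JSON Facet response format.

```python
import json

# The cat_breakdown facet from above, expressed as a Python dict.
facet = {
    "cat_breakdown": {
        "terms": {
            "field": "category",
            "facet": {
                "x": "unique(color)",  # number of unique colors per category
                "y": "avg(price)",     # average price per category
            },
        }
    }
}

# Request parameters; json.facet carries the serialized facet definition.
params = {"q": "*:*", "rows": 0, "json.facet": json.dumps(facet)}

def unique_per_bucket(response, facet_name, stat):
    """Map each bucket value to one of its computed statistics."""
    buckets = response["facets"][facet_name]["buckets"]
    return {b["val"]: b[stat] for b in buckets}

# Hand-written sample in the shape of a JSON Facet response; not real data.
sample = {
    "facets": {
        "count": 7,
        "cat_breakdown": {
            "buckets": [
                {"val": "shirts", "count": 4, "x": 3, "y": 19.5},
                {"val": "hats", "count": 3, "x": 2, "y": 12.0},
            ]
        },
    }
}
print(unique_per_bucket(sample, "cat_breakdown", "x"))
```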

This is in Solr 5.1 and later. More facet functions like "unique" are shown at http://yonik.com/solr-facet-functions/

Best way to find the number of unique values in "myfield", using the JSON API:

http://YourCollectionAddress/select?json={query:'*:*',limit:0,facet:{distinctCount:'unique(myfield)'}}
