
Aggregate and sort by a text field and concatenate other text fields in Elasticsearch

In Elasticsearch, how does one aggregate and sort by a text field and concatenate the values of other text fields, joined by, e.g., "; "?

By concatenating, I mean concatenating the values of the same field across all the aggregated documents, not the values of different fields within the same document.

Details

I have small documents with the fields gene, tag and annotation, mapped as follows:

{
  "mappings": {
    "annotations": {
      "properties": {
        "species": {
          "type": "text"
        },
        "gene": {
          "type": "text",
          "fields": {
            "keyword": { 
              "type": "keyword"
            }  
          }
        },
        "tag": {
          "type": "text"
        },
        "annotation": {
          "type": "text"
        }
      }
    }
  }
}

There are many entries per gene. That is, I have:

Gene  Tag   Annotation
----- ----- ---------------
A1BG  tag1  first gene
A2M   tag1  a-macroglobulin
A2M   tag2  second gene
BRCA1 tag1  breast cancer 1
BRCA1 tag3  important gene

I want to query these data, aggregate and sort by gene, and get something like this:

Gene   Tags        Annotations
------ ----------- -------------------------------
A1BG   tag1        first gene
A2M    tag1; tag2  a-macroglobulin; second gene
BRCA1  tag1; tag3  breast cancer 1; important gene

I could not find anything meaningful after googling for more than a day. Elasticsearch examples mostly show statistics, e.g. counts, and there are a few examples of concatenating fields from the same document, but I could not find a way to concatenate the values of the same field across documents. I tried to use map as well as something like this:

{
    "aggs" : {
        "genes_agg" : {
            "terms" : {
                "script" : {
                    "source": "doc['tag'].join('; ')",
                    "lang": "painless"
                }
            }
        }
    }
}

but nothing works.

I think you can't find anything because you're approaching this from a relational database perspective. Elasticsearch is built like a document store, so you would basically put all the tags, annotations, etc. for BRCA1 in one document. I think you need to rethink your indexing strategy, not your querying strategy.
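To make the "one document per gene" idea concrete, here is a minimal sketch in plain Python of how the rows from the question could be collapsed before indexing. It makes no Elasticsearch calls. The helper names group_by_gene and to_table_row are hypothetical, and storing tag and annotation as lists is an assumption about how you might shape the new documents, not something the original mapping prescribes.

```python
def group_by_gene(rows):
    """Collapse (gene, tag, annotation) rows into one dict per gene.

    Hypothetical helper: each resulting dict could be indexed as a single
    document, with tag and annotation holding lists of values.
    """
    grouped = {}
    for gene, tag, annotation in rows:
        doc = grouped.setdefault(gene, {"gene": gene, "tag": [], "annotation": []})
        doc["tag"].append(tag)
        doc["annotation"].append(annotation)
    # Return the documents sorted by gene name.
    return [grouped[gene] for gene in sorted(grouped)]


def to_table_row(doc, sep="; "):
    """Join the per-gene lists into the display strings the question asks for."""
    return (doc["gene"], sep.join(doc["tag"]), sep.join(doc["annotation"]))


# Sample rows copied from the table in the question.
rows = [
    ("A1BG", "tag1", "first gene"),
    ("A2M", "tag1", "a-macroglobulin"),
    ("A2M", "tag2", "second gene"),
    ("BRCA1", "tag1", "breast cancer 1"),
    ("BRCA1", "tag3", "important gene"),
]

docs = group_by_gene(rows)
table = [to_table_row(doc) for doc in docs]

for gene, tags, annotations in table:
    print(gene, tags, annotations, sep="\t")
```

With documents shaped like this, the query side would only need to sort on the gene.keyword sub-field from the mapping, and the "; " join becomes a display step on each hit rather than an aggregation.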

