
Aggregate and Sort by a text field and concatenate other text fields in Elasticsearch

In Elasticsearch, how does one aggregate and sort by a text field and concatenate the values of other text fields, joined by a separator such as ";"?

By concatenating, I mean joining the values of the same field across all the aggregated documents, not the values of different fields within the same document.

Details

I have small documents with the fields gene, tag and annotation, mapped as follows:

{
  "mappings": {
    "annotations": {
      "properties": {
        "species": {
          "type": "text"
        },
        "gene": {
          "type": "text",
          "fields": {
            "keyword": { 
              "type": "keyword"
            }  
          }
        },
        "tag": {
          "type": "text"
        },
        "annotation": {
          "type": "text"
        }
      }
    }
  }
}

There are many documents per gene. For example, I have:

Gene  Tag   Annotation
----- ----- ---------------
A1BG  tag1  first gene
A2M   tag1  a-macroglobulin
A2M   tag2  second gene
BRCA1 tag1  breast cancer 1
BRCA1 tag3  important gene

I want to query these data, aggregate and sort by gene, and get something like this:

Gene   Tags        Annotations
------ ----------- -------------------------------
A1BG   tag1        first gene
A2M    tag1; tag2  a-macroglobulin; second gene
BRCA1  tag1; tag3  breast cancer 1; important gene

I cannot find anything useful after more than a day of searching. Most Elasticsearch examples show statistics such as counts, and a few show how to concatenate different fields within the same document, but I could not find a way to concatenate the values of the same field across documents. I tried using a map, as well as something like this:

{
    "aggs" : {
        "genes_agg" : {
            "terms" : {
                "script" : {
                    "source": "doc['tag'].join('; ')",
                    "lang": "painless"
                }
            }
        }
    }
}

but nothing works.
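One workaround that keeps the current one-document-per-row index is to let Elasticsearch do the grouping and sorting, and do the concatenation on the client. The sketch below is a minimal illustration, assuming Elasticsearch 6.0 or later (for the "_key" ordering) and the field names from the mapping above; the function names and limits are hypothetical. It builds a terms aggregation on gene.keyword with a top_hits sub-aggregation, then joins each bucket's tag and annotation values in Python. Text fields such as tag have fielddata disabled by default, so they cannot be aggregated on or read through doc['tag'] in a script, which is why the script attempt above fails and why the values are read from _source here.

```python
import json


def build_request(max_genes=100, max_docs_per_gene=100):
    """Search body: one terms bucket per gene, sorted alphabetically by key,
    with a top_hits sub-aggregation returning each bucket's documents.

    Assumption: "gene.keyword" is the keyword sub-field from the mapping.
    top_hits sizes above index.max_inner_result_window (default 100) are rejected.
    """
    return {
        "size": 0,  # only the aggregation is needed, not the top-level hits
        "aggs": {
            "genes": {
                "terms": {
                    "field": "gene.keyword",
                    "size": max_genes,           # terms returns 10 buckets by default
                    "order": {"_key": "asc"},    # sort buckets by gene name
                },
                "aggs": {
                    "docs": {
                        "top_hits": {
                            "size": max_docs_per_gene,
                            "_source": ["tag", "annotation"],  # fetch only what we join
                        }
                    }
                },
            }
        },
    }


def concatenate_buckets(response, sep="; "):
    """Turn the aggregation response into (gene, tags, annotations) rows,
    joining values in the order top_hits returned the documents."""
    rows = []
    for bucket in response["aggregations"]["genes"]["buckets"]:
        sources = [hit["_source"] for hit in bucket["docs"]["hits"]["hits"]]
        tags = sep.join(s["tag"] for s in sources if "tag" in s)
        annotations = sep.join(s["annotation"] for s in sources if "annotation" in s)
        rows.append((bucket["key"], tags, annotations))
    return rows


if __name__ == "__main__":
    # Hand-written response fragment shaped like a real aggregation result.
    sample = {"aggregations": {"genes": {"buckets": [
        {"key": "A2M", "doc_count": 2, "docs": {"hits": {"hits": [
            {"_source": {"tag": "tag1", "annotation": "a-macroglobulin"}},
            {"_source": {"tag": "tag2", "annotation": "second gene"}},
        ]}}},
    ]}}}
    print(json.dumps(build_request(), indent=2))
    for row in concatenate_buckets(sample):
        print(row)
```

The order of values inside a bucket follows whatever order top_hits returns, and genes with more documents than the top_hits limit, or more genes than the terms size, would need paging (for example with a composite aggregation).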

I think you can't find anything because you're approaching this from a relational-database perspective. Elasticsearch is built as a document store, so you would put all the tags, annotations, etc. for BRCA1 in a single document. I think you need to rethink your indexing strategy, not your querying strategy.
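A minimal sketch of that reindexing idea, assuming the rows can be grouped before they are indexed (for example while loading them) and reusing the field names from the question's mapping; ROWS, denormalize and display_row are hypothetical names. Each gene becomes one document whose tag and annotation fields hold arrays, which Elasticsearch accepts for a text field without any mapping change. A plain search sorted on gene.keyword then returns one hit per gene, and the "; "-joined columns are produced when the results are displayed.

```python
# Sample rows copied from the table in the question: (gene, tag, annotation).
ROWS = [
    ("A1BG", "tag1", "first gene"),
    ("A2M", "tag1", "a-macroglobulin"),
    ("A2M", "tag2", "second gene"),
    ("BRCA1", "tag1", "breast cancer 1"),
    ("BRCA1", "tag3", "important gene"),
]


def denormalize(rows):
    """Group flat rows into one document per gene.

    tag and annotation become lists of strings; each list would be indexed
    as a multi-valued text field.
    """
    docs = {}
    for gene, tag, annotation in rows:
        doc = docs.setdefault(gene, {"gene": gene, "tag": [], "annotation": []})
        doc["tag"].append(tag)
        doc["annotation"].append(annotation)
    # Sort by gene name so the output order is deterministic.
    return [docs[gene] for gene in sorted(docs)]


def display_row(doc, sep="; "):
    """Render one gene document as (Gene, Tags, Annotations)."""
    return doc["gene"], sep.join(doc["tag"]), sep.join(doc["annotation"])


if __name__ == "__main__":
    for doc in denormalize(ROWS):
        print(display_row(doc))
```

The trade-off is that adding a tag to an existing gene means updating (reindexing) that gene's whole document rather than inserting a new row.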
