In Elasticsearch, how does one aggregate and sort by a text field and concatenate the values of other text fields, joined by e.g. "; "?
By concatenating I mean joining the values of the same field across all the aggregated documents, not the values of different fields from the same document.
Details
I have small documents with the fields gene, tag and annotation, described by this mapping:
{
  "mappings": {
    "annotations": {
      "properties": {
        "species": {
          "type": "text"
        },
        "gene": {
          "type": "text",
          "fields": {
            "keyword": {
              "type": "keyword"
            }
          }
        },
        "tag": {
          "type": "text"
        },
        "annotation": {
          "type": "text"
        }
      }
    }
  }
}
There are many entries per gene. That is, I have:
Gene   Tag   Annotation
-----  ----  ---------------
A1BG   tag1  first gene
A2M    tag1  a-macroglobulin
A2M    tag2  second gene
BRCA1  tag1  breast cancer 1
BRCA1  tag3  important gene
I want to query these data, aggregate and sort by gene, and get something like this:
Gene   Tags        Annotations
-----  ----------  -------------------------------
A1BG   tag1        first gene
A2M    tag1; tag2  a-macroglobulin; second gene
BRCA1  tag1; tag3  breast cancer 1; important gene
I could not find anything useful after more than a day of searching. Elasticsearch examples mostly show statistics, e.g. counts, and a few show how to concatenate fields from the same document, but I could not find a way to concatenate the values of the same field across documents. I tried to use map as well as something like this:
{
  "aggs": {
    "genes_agg": {
      "terms": {
        "script": {
          "source": "doc['tag'].join('; ')",
          "lang": "painless"
        }
      }
    }
  }
}
but nothing works.
I think you can't find anything because you're approaching this from a relational database perspective. Elasticsearch is built as a document store, so you would normally put all the tags, annotations, etc. for BRCA1 in one document. I think you need to rethink your indexing strategy, not your querying strategy.
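To make the "one document per gene" idea concrete, here is a minimal Python sketch that groups the rows from the question before indexing. It runs without Elasticsearch; the field names `tags` and `annotations` and the helpers `group_by_gene` and `as_table_row` are made up for illustration and are not part of the original mapping.

```python
from collections import defaultdict

# Rows as in the question's table: (gene, tag, annotation).
rows = [
    ("A1BG", "tag1", "first gene"),
    ("A2M", "tag1", "a-macroglobulin"),
    ("A2M", "tag2", "second gene"),
    ("BRCA1", "tag1", "breast cancer 1"),
    ("BRCA1", "tag3", "important gene"),
]


def group_by_gene(rows):
    """Collapse per-annotation rows into one document per gene.

    The output field names ("tags", "annotations") are hypothetical.
    """
    grouped = defaultdict(lambda: {"tags": [], "annotations": []})
    for gene, tag, annotation in rows:
        grouped[gene]["tags"].append(tag)
        grouped[gene]["annotations"].append(annotation)
    # Sort by gene so the documents come out in the order the question asks for.
    return [
        {"gene": gene, "tags": g["tags"], "annotations": g["annotations"]}
        for gene, g in sorted(grouped.items())
    ]


def as_table_row(doc, sep="; "):
    """Join the multi-valued fields of one per-gene document for display."""
    return (doc["gene"], sep.join(doc["tags"]), sep.join(doc["annotations"]))


docs = group_by_gene(rows)
for doc in docs:
    print(as_table_row(doc))
```

Each resulting dictionary could then be indexed as a single document, for example with the gene as the document ID. A plain search sorted on gene.keyword would then return one hit per gene, and joining the values with "; " becomes a display step on the client.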