Elasticsearch term query grouped by geo_point

I am trying to do a kind of GROUP BY with Elasticsearch.

I store a text field:

"text": {
  "type": "string",
  "store": true,
  "analyzer" : "twittsTokenizor"
},

and a geo_point field:

"geo": {
  "type": "geo_point",
  "store": true
}

I am trying to get the most frequently used terms in my text field, grouped by location, and it's not working.

A query like this works (when I specify the location in the query itself):

curl -XGET http://localhost:9200/twitter/_search -d '{
    "query" : {
        "match_all" : {}
    },
    "filter" : {
        "bool" : {
            "must" : [{
                "range" : {
                    "created_at" : {
                        "from" : "Mon Feb 22 14:04:23 +0000 2015",
                        "to" : "Wed Feb 23 22:06:25 +0000 2015"
                    }
                }
            }, {
                "geo_distance" : {
                    "distance" : "100km",
                    "geo" : { "lat" : 48.856506, "lon" : 2.352133 }
                }
            }]
        }
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "text",
                "size" : 10
            }
        }
    }
}'
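The geo_distance filter in that query keeps only documents whose geo point lies within 100 km of the given origin. As a rough local check (a sketch, not Elasticsearch's own implementation; it assumes a spherical Earth with a 6371 km mean radius, and haversine_km / within are hypothetical helpers), the haversine formula shows that the sample tweet from Metz further down would fall outside a 100 km radius around Paris:

```python
import math

# Assumption for this sketch: spherical Earth with a 6371 km mean radius.
# Elasticsearch may use a different distance model and constant.
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two lat/lon points, in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(origin: dict, point: dict, max_km: float) -> bool:
    """Rough local equivalent of a geo_distance filter test."""
    return haversine_km(origin["lat"], origin["lon"], point["lat"], point["lon"]) <= max_km


paris = {"lat": 48.856506, "lon": 2.352133}  # origin used in the query above
metz = {"lat": 49.119696, "lon": 6.176355}   # geo value of the sample tweet

print(round(haversine_km(paris["lat"], paris["lon"], metz["lat"], metz["lon"])))
print(within(paris, metz, 100))  # False: Metz is well beyond 100 km from Paris
```

So a single hard-coded origin only ever describes one location, which is why the question asks how to group by location instead.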

But this does not work:

curl -XGET http://localhost:9200/twitter/_search -d '{
    "query" : {
        "match_all" : {}
    },
    "aggs" : {
        "geo1" : {
            "terms" : {
                "field" : "geo"
            }
        },
        "tag" : {
            "terms" : {
                "field" : "text",
                "size" : 10
            }
        }
    }
}'

Nor does this:

curl -XGET http://localhost:9200/twitter/_search -d '{
    "query" : {
        "match_all" : {}
    },
    "facets" : {
        "tag" : {
            "terms" : {
                "field" : "text",
                "size" : 10
            }
        },
        "geo1" : {
            "terms" : {
                "field" : "geo"
            }
        }
    }
}'

Adding a facet_filter didn't help either.

What am I doing wrong? Is this even possible? Thank you very much.

Edit: here is my mapping:

curl -s -XPUT "http://localhost:9200/twitter" -d '
{
  "settings": {
    "analysis": {
      "analyzer": {
        "twittsTokenizor" : {
            "type" : "custom",
            "tokenizer" : "standard",
            "filter" : [
                "french_elision",
                "asciifolding",
                "lowercase",
                "french_stop",
                "english_stop",
                "twitter_stop"
            ]
        }
      },
      "filter" : {
        "french_elision" : {
          "type" : "elision",
          "articles" : [ "l", "m", "t", "qu", "n", "s",
                          "j", "d", "c", "jusqu", "quoiqu",
                          "lorsqu", "puisqu"
                        ]
        },
        "asciifolding" : {
            "type" : "asciifolding"
        },
        "french_stop": {
            "type":       "stop",
            "stopwords":  "_french_" 
        },
        "english_stop": {
            "type":       "stop",
            "stopwords":  "_english_" 
        },
        "twitter_stop" : {
            "type" : "stop",
            "stopwords": ["RT", "FAV", "TT", "FF", "rt"]
        }
      }
    }
  },
  "mappings": {
    "twitter": {
      "properties": {
        "id": {
          "type": "long",
          "store": true
        },
        "text": {
          "type": "string",
          "store": true,
          "analyzer" : "twittsTokenizor"
        },
        "created_at": {
          "type": "date",
          "format": "EE MMM d HH:mm:ss Z yyyy",
          "store": true
        },
        "location": {
          "type": "string",
          "store": true
        },
        "geo": {
          "type": "geo_point",
          "store": true
        }
      }
    }
  }
}'

And a sample document:

{ "_id" : ObjectId("54eb3c35a710901a698b4567"), "country" : "FR", "created_at" : "Mon Feb 23 14:25:30 +0000 2015", "geo" : { "lat" : 49.119696, "lon" : 6.176355 }, "id" : -812216320, "location" : "Metz ", "text" : "Passer des vacances formidable avec des gens géniaux sans aucunes pression avec pour seul soucis s'éclater et se laisser vivre #BONHEUR"}

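If I understand you correctly, you want the most used terms per location. In your requests the two aggregations (or facets) are siblings, so each one is computed independently over all matching documents. To get terms per location, the terms aggregation has to be nested inside a bucket aggregation on the geo field. So you need two levels of aggregation: first a geohash_grid aggregation, which groups points into geohash cells (precision 5 gives cells a few kilometres across), then a terms aggregation inside it: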

{
    "query": {
        "match_all": {}
    },
    "aggs": {
        //Buckets for each geohash grid
        "geo1": {
            "geohash_grid": {
                "field":"geo",
                "precision": 5
            },
            "aggs": {
                //Buckets for each unique text-tag in this geo point bucket, maximum of 10 buckets
                "tags": {
                    "terms": {
                        "field": "text",
                        "size": 10
                    }
                }
            }
        }
    }
}
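Because tags is nested under geo1, every geohash bucket in the response carries its own tags sub-aggregation. A minimal Python sketch of building that body and reading the top terms per cell, assuming plain dicts for the request and for a trimmed, hypothetical response (build_body and top_terms_per_cell are illustrative names, not part of any Elasticsearch client, and the cell keys and counts are made up):

```python
import json


def build_body(precision: int = 5, size: int = 10) -> dict:
    """Same request as above: geohash_grid buckets with a nested terms aggregation."""
    return {
        "query": {"match_all": {}},
        "aggs": {
            "geo1": {
                "geohash_grid": {"field": "geo", "precision": precision},
                "aggs": {
                    "tags": {"terms": {"field": "text", "size": size}}
                },
            }
        },
    }


def top_terms_per_cell(response: dict) -> dict:
    """Map each geohash cell key to its list of (term, doc_count) pairs."""
    result = {}
    for cell in response["aggregations"]["geo1"]["buckets"]:
        result[cell["key"]] = [
            (tag["key"], tag["doc_count"]) for tag in cell["tags"]["buckets"]
        ]
    return result


# Hypothetical response, trimmed to the fields this sketch reads.
sample_response = {
    "aggregations": {
        "geo1": {
            "buckets": [
                {"key": "u0ky6", "doc_count": 3,
                 "tags": {"buckets": [{"key": "bonheur", "doc_count": 2},
                                      {"key": "vacances", "doc_count": 1}]}},
            ]
        }
    }
}

print(json.dumps(build_body(), indent=2))
print(top_terms_per_cell(sample_response))
```

json.dumps produces the same body without the // comments, so its output can be pasted into a curl -d argument like the ones in the question.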

Alternatively, you could use a geo_distance aggregation, which groups documents into distance rings around a fixed origin point.
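With geo_distance, the outer buckets are distance rings instead of geohash cells, and the same terms aggregation can be nested inside each ring. A sketch, assuming the Paris origin from the question and arbitrary 100 km / 300 km rings (build_distance_body is a hypothetical helper; the field, origin, unit and ranges parameters follow the geo_distance aggregation, but check them against the documentation for your Elasticsearch version):

```python
import json


def build_distance_body(lat: float, lon: float, rings_km: list, size: int = 10) -> dict:
    """Hypothetical helper: distance rings around (lat, lon), top terms per ring."""
    if not rings_km:
        raise ValueError("rings_km must contain at least one distance")
    # First ring starts at the origin, middle rings are [lower, upper),
    # and a final open-ended ring catches everything beyond the last distance.
    ranges = [{"to": rings_km[0]}]
    for lower, upper in zip(rings_km, rings_km[1:]):
        ranges.append({"from": lower, "to": upper})
    ranges.append({"from": rings_km[-1]})
    return {
        "query": {"match_all": {}},
        "aggs": {
            "rings": {
                "geo_distance": {
                    "field": "geo",
                    "origin": {"lat": lat, "lon": lon},
                    "unit": "km",
                    "ranges": ranges,
                },
                "aggs": {"tags": {"terms": {"field": "text", "size": size}}},
            }
        },
    }


body = build_distance_body(48.856506, 2.352133, [100, 300])
print(json.dumps(body, indent=2))
```

This fits when you care about one reference point (for example, everything around Paris); geohash_grid fits better when you want every area in the index grouped without choosing origins up front.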
