
Elasticsearch Lucene formula calculation using Java

I am new to Elasticsearch. We use an index to store documents, for example company information about employees; the index currently holds 600,000 employee records. For these employees we need to calculate distances based on a particular attribute, such as address. Essentially, we do the following:

  • Pull all the documents in the index into a Java program.
  • Use lambdas for parallelism to iterate over each document, calculate its distance (Levenshtein, n-gram and TF-IDF) to the other elements in the collection, and then average the values.
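For reference, the two steps above can be sketched roughly as follows. This is only an illustration of the in-memory flow described in the question, not the actual program: the class name `AverageDistanceSketch`, the sample addresses, and the choice of Levenshtein alone (no n-gram or TF-IDF) are assumptions. Note that comparing every document with every other one is quadratic: for 600,000 documents that is about 1.8 × 10^11 pairs.

```java
import java.util.List;
import java.util.stream.IntStream;

// Hypothetical sketch of the in-memory flow: all values are already loaded.
class AverageDistanceSketch {

    // Two-row dynamic-programming Levenshtein edit distance.
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int cost = a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1),
                                   prev[j - 1] + cost);
            }
            int[] tmp = prev; // swap rows for the next iteration
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    // For each address, the mean edit distance to every other address.
    // The outer loop runs as a parallel stream; results keep input order.
    static double[] averageDistances(List<String> addresses) {
        int n = addresses.size();
        return IntStream.range(0, n).parallel()
            .mapToDouble(i -> IntStream.range(0, n)
                .filter(j -> j != i)
                .mapToDouble(j -> levenshtein(addresses.get(i), addresses.get(j)))
                .average()
                .orElse(0.0)) // a single document has nothing to compare with
            .toArray();
    }

    public static void main(String[] args) {
        // Sample data is made up for illustration.
        List<String> addresses = List.of("12 Main Street", "12 Main St", "98 Oak Avenue");
        double[] avg = averageDistances(addresses);
        for (int i = 0; i < avg.length; i++) {
            System.out.println(addresses.get(i) + " -> " + avg[i]);
        }
    }
}
```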

The problem with this flow is that we load every document in the index into JVM memory and only then apply the formulas. Both loading the documents and applying the formulas take a lot of time, and the JVM's memory limits restrict how many documents we can load.

Forgive my limited knowledge of the subject, but is there a way to run these distance formulas directly in Elasticsearch, rather than loading the whole index into memory?

Thanks for help...

There is a data type in Elasticsearch for geo points: https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-point.html

If you can convert your addresses to lat/lon coordinates (either you already have them, or you use a service that resolves addresses to geo points), you can map that field as an Elasticsearch geo_point in the index template; see the link above for an example. If you don't map the field as geo_point, Elasticsearch will treat the coordinates as an array of floats, that is, a plain float field.
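As a rough illustration of that mapping step, the sketch below only builds the JSON bodies in Java and prints them; it does not talk to a cluster. The field name `location` and the class and method names are hypothetical, and the mapping shape assumes a recent Elasticsearch version with typeless mappings. The first body would be sent when creating the index (or placed in an index template), the second when indexing a document.

```java
// Hypothetical helper that builds request bodies for a geo_point field.
class GeoPointMappingSketch {

    // Mapping that declares `field` as geo_point instead of letting
    // Elasticsearch infer a numeric type from the raw coordinates.
    static String mappingBody(String field) {
        return "{\"mappings\":{\"properties\":{\"" + field
            + "\":{\"type\":\"geo_point\"}}}}";
    }

    // A document carrying the resolved coordinates as a lat/lon object.
    static String documentBody(String field, double lat, double lon) {
        return "{\"" + field + "\":{\"lat\":" + lat + ",\"lon\":" + lon + "}}";
    }

    public static void main(String[] args) {
        // Coordinates are made-up sample values.
        System.out.println(mappingBody("location"));
        System.out.println(documentBody("location", 40.5, -74.25));
    }
}
```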

Once you have a geo_point field, you can start running distance aggregations on it. There are three aggregations that work with fields of type geo_point; see the options here: https://www.elastic.co/guide/en/elasticsearch/guide/current/geo-aggs.html
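One of those aggregations is geo_distance, which buckets documents into rings around an origin point. The sketch below builds such a search body in Java, under the assumption that the geo_point field is called `location`; the aggregation name `by_distance`, the class name, and the ring boundaries are hypothetical as well. The body is only printed here; it would be sent to the index's `_search` endpoint.

```java
// Hypothetical helper that builds a geo_distance aggregation request body.
class GeoDistanceAggSketch {

    // Turns boundaries such as (10, 50) into the rings
    // [to 10], [10 to 50], [from 50]. Expects at least one boundary.
    static String ranges(double... boundariesKm) {
        StringBuilder sb = new StringBuilder("[");
        for (int i = 0; i <= boundariesKm.length; i++) {
            if (i > 0) sb.append(',');
            sb.append('{');
            if (i > 0) sb.append("\"from\":").append(boundariesKm[i - 1]);
            if (i > 0 && i < boundariesKm.length) sb.append(',');
            if (i < boundariesKm.length) sb.append("\"to\":").append(boundariesKm[i]);
            sb.append('}');
        }
        return sb.append(']').toString();
    }

    // Search body with size 0, so only the aggregation buckets come back
    // and no documents are loaded into the client.
    static String searchBody(String field, double lat, double lon, double... boundariesKm) {
        return "{\"size\":0,\"aggs\":{\"by_distance\":{\"geo_distance\":{"
            + "\"field\":\"" + field + "\","
            + "\"origin\":{\"lat\":" + lat + ",\"lon\":" + lon + "},"
            + "\"unit\":\"km\","
            + "\"ranges\":" + ranges(boundariesKm)
            + "}}}}";
    }

    public static void main(String[] args) {
        // Origin and boundaries are made-up sample values.
        System.out.println(searchBody("location", 40.5, -74.25, 10, 50));
    }
}
```

Because the buckets are computed inside Elasticsearch, the client receives only the per-ring document counts rather than the 600,000 documents themselves.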
