简体   繁体   中英

Left Semi Join on Geo-Spatial tables in Spark-SQL & GeoMesa

Problem:

I have 2 tables (d1 & d2) containing Geo-spatial points. I want to carry out the following query:

select * from table 1 where table1.point is within 50km of any point in table2.point

I am using Spark-SQL with GeoMesa & Accumulo to achieve the same. (Spark as processing engine, Accumulo as Data Store & GeoMesa for GeoSpatial libraries).

The above query is kind of left semi join but I am not sure on how to achieve it using Spark-SQL because as far as I have read subqueries can't be used in where clause.

Was able to achieve this using:

select * from d1 left semi join d2 on st_contains(st_bufferPoint(d1.point, 10000.0), d2.point)

Spark broadcasted d2 & is carrying out joins but it is still taking more time as the size of d1 is 5 billion & d2 is 10 million.

Not sure though if there is any more efficient way to achieve the same.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM