What is the best approach to find all addresses that are within a specific distance of a selected point?

Question

I am developing an application that is supposed to show addresses that are within a specific distance of a location. I know how to find the distance between two points, but I am not sure what the best approach would be in terms of performance.

One way is to retrieve all addresses and check them one by one against the selected address in the back end, but is there any way to minimize the number of items that I retrieve from the database, rather than filtering in memory? What's the best approach, and how do I do it?

Imagine I have 300,000 records: do I have to retrieve them all and calculate their distance to the selected point? As James suggested, I can group the records into regions and calculate the distance, but then which method would be better to follow: distance calculation in the query or in Java?

  public class Address{
    long Id;
    Double latitude;
    Double longitude;
    ..
  }

Calculation

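// Haversine great-circle distance between two points given in degrees; the result is in miles.
public static double distFrom(double lat1, double lng1, double lat2, double lng2) {
  double earthRadius = 3958.75; // Earth radius in miles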
  double dLat = Math.toRadians(lat2-lat1);
  double dLng = Math.toRadians(lng2-lng1);
  double sindLat = Math.sin(dLat / 2);
  double sindLng = Math.sin(dLng / 2);
  double a = Math.pow(sindLat, 2) + Math.pow(sindLng, 2)
        * Math.cos(Math.toRadians(lat1)) *     Math.cos(Math.toRadians(lat2));
  double c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a));
  double dist = earthRadius * c;

  return dist;
}

This question and this one offer methods to calculate the distance through MySQL, but which way is better, Java or MySQL? I am quite confused.

Answer 1

When I implemented this in MySQL (for storing places on an oblate spheroid, which is basically what the Earth is; I assume you're talking about Earth!), I stored as much pre-calculated information as possible in the database. So, for a row that stores latitude and longitude, I also calculate the following fields at insertion time:

radiansLongitude ( Math.toRadians(longitude) )
sinRadiansLatitude ( Math.sin(Math.toRadians(latitude)) )
cosRadiansLatitude ( Math.cos(Math.toRadians(latitude)) )

Then, when I search for the places that are within X units of the latitude/longitude in question, my prepared statement is as follows (the sin and cos terms imply that :latitude and :longitude are bound in radians):

from Location l where
    acos(
        sin(:latitude) * sinRadiansLatitude + 
        cos(:latitude) * cosRadiansLatitude * 
        cos(radiansLongitude - :longitude) 
        ) * YYYY < :distance
    and l.latitude>:minimumSearchLatitude
    and l.latitude<:maximumSearchLatitude 
    and l.longitude>:minimumSearchLongitude 
    and l.longitude<:maximumSearchLongitude 
    order by acos(
                sin(:latitude) * sinRadiansLatitude + 
                cos(:latitude) * cosRadiansLatitude * 
                cos(radiansLongitude - :longitude)  
        ) * YYYY asc

Here YYYY = 3965 gives you distances in miles, and YYYY = 6367 gives distances in km. Both are approximate Earth radii; the commonly quoted mean radius is about 3959 miles or 6371 km.
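As a rough illustration of the pre-calculation idea, here is a minimal Java sketch that computes the three stored fields once and then evaluates the same spherical law of cosines expression as the query. The class name PrecomputedPlace, the method distanceKmFrom and the 6371 km radius are assumptions made for this example; they are not part of the original answer.

```java
// Hypothetical value class: holds the three fields computed once at insertion time.
final class PrecomputedPlace {
    // Assumption: mean Earth radius in km (the answer itself uses 6367).
    static final double EARTH_RADIUS_KM = 6371.0;

    final double radiansLongitude;
    final double sinRadiansLatitude;
    final double cosRadiansLatitude;

    PrecomputedPlace(double latitudeDeg, double longitudeDeg) {
        double latRad = Math.toRadians(latitudeDeg);
        this.radiansLongitude = Math.toRadians(longitudeDeg);
        this.sinRadiansLatitude = Math.sin(latRad);
        this.cosRadiansLatitude = Math.cos(latRad);
    }

    // Same expression as the WHERE clause; the search point is given in degrees
    // and converted to radians once per search, not once per row.
    double distanceKmFrom(double searchLatDeg, double searchLngDeg) {
        double lat = Math.toRadians(searchLatDeg);
        double lng = Math.toRadians(searchLngDeg);
        double cosAngle = Math.sin(lat) * sinRadiansLatitude
                + Math.cos(lat) * cosRadiansLatitude * Math.cos(radiansLongitude - lng);
        // Clamp so rounding error can never push acos outside its domain.
        cosAngle = Math.max(-1.0, Math.min(1.0, cosAngle));
        return Math.acos(cosAngle) * EARTH_RADIUS_KM;
    }
}
```

The per-row work is then two multiplications, one cosine and one arc cosine, which is the saving the answer describes.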

Finally, I have used the minimumSearchLatitude / maximumSearchLatitude / minimumSearchLongitude / maximumSearchLongitude parameters to exclude the majority of points from the result set before the database has to perform any calculations. You may or may not need this. If you do use it, it'll be up to you what values you choose for these parameters, as it will depend on what you're searching.

Obviously, judicious use of indexes in the database will be necessary.

The benefit of this approach is that information which never changes but is needed every time is only calculated once, whereas calculating the values of radiansLongitude, sinRadiansLatitude and cosRadiansLatitude for every row every time you perform a search is going to get very expensive very fast.

The other option is to use a geospatial index, which means that all of this is handled for you by the database. I don't know how well Hibernate integrates with that, though.

Disclaimer: it's a long time since I looked at this, and I'm not a GIS expert!

Answer 2

You could do the calculation server-side in the query itself instead of client-side, thus retrieving only the results of the calculation. Here (archive link for posterity) is an example Haversine-based implementation in SQL (sorry, the article is simply too lengthy for me to copy and paste or summarize here, although it is a great article and an easy read).

Alternatively, you could divide your database into regions (e.g. a quad-tree of sorts with polar coordinates) and retrieve only the regions near the point, giving you a smaller subset to test against client-side. Similarly, you could calculate a rough latitude and longitude bounding box based on your distance, with a database index on latitude and longitude, and select only addresses in that range for consideration in your calculations.
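The bounding-box idea can be sketched in Java roughly as follows: compute a rough box for the indexed range query, then keep only the candidates whose exact Haversine distance is within the radius. The class and method names (RadiusSearch, boundingBox, withinRadius), the 111 km per degree figure and the 6371 km radius are assumptions for illustration only. The box ignores the poles and the 180° meridian, and it is only approximate in longitude for large radii.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: rough bounding box first, exact distance check second.
final class RadiusSearch {
    static final double EARTH_RADIUS_KM = 6371.0;   // assumption: mean Earth radius
    static final double KM_PER_DEGREE_LAT = 111.0;  // rough figure

    // Returns {minLat, maxLat, minLng, maxLng} in degrees for the range query.
    static double[] boundingBox(double lat, double lng, double radiusKm) {
        double dLat = radiusKm / KM_PER_DEGREE_LAT;
        // A degree of longitude shrinks with cos(latitude).
        double dLng = radiusKm / (KM_PER_DEGREE_LAT * Math.cos(Math.toRadians(lat)));
        return new double[] {lat - dLat, lat + dLat, lng - dLng, lng + dLng};
    }

    // Haversine great-circle distance in km between two points given in degrees.
    static double haversineKm(double lat1, double lng1, double lat2, double lng2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLng = Math.toRadians(lng2 - lng1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLng / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Keeps only the candidates (each a {lat, lng} pair) that lie within radiusKm.
    static List<double[]> withinRadius(List<double[]> candidates,
                                       double lat, double lng, double radiusKm) {
        List<double[]> hits = new ArrayList<>();
        for (double[] c : candidates) {
            if (haversineKm(lat, lng, c[0], c[1]) <= radiusKm) {
                hits.add(c);
            }
        }
        return hits;
    }
}
```

In this sketch the four box values would be bound to BETWEEN conditions on the indexed latitude and longitude columns, and withinRadius would run over the rows that query returns.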

The query approach is a simpler, cleaner approach though, with good performance due to the initial distance filtering. I'd only do the region approach if the former is not possible for you to implement for some reason.

Answer 3

I would say the database approach is best, as you won't need a huge amount of memory. You can use the following code to retrieve them through Hibernate.

@Transactional
public List<Double> getAllPoisAroundUser(double longitude, double latitude, int page) {

    // Haversine distance in metres (6371 km Earth radius * 1000); keeps POIs within 5000 m.
    // latitude is used with its sign so that southern-hemisphere rows are handled correctly.
    // HAVING on a column alias without GROUP BY is a MySQL extension.
    Query query = getSessionFactory().getCurrentSession().createSQLQuery(
        "SELECT (6371 * 2 * ASIN(SQRT(POWER(SIN((:ulatitude - latitude) * pi()/180 / 2), 2) + " +
        "COS(:ulatitude * pi()/180) * COS(latitude * pi()/180) * " +
        "POWER(SIN((:ulongitude - longitude) * pi()/180 / 2), 2)))) * 1000 AS distance " +
        "FROM poi HAVING distance < 5000 ORDER BY distance");

    query.setParameter("ulongitude", longitude);
    query.setParameter("ulatitude", latitude);
    // Pages are 1-based and hold 10 rows each.
    query.setFirstResult((page - 1) * 10);
    query.setMaxResults(10);

    return (List<Double>) query.list();
}

Answer 4

I am using Hibernate and do it this way:

public List<Tour> searchTours(double lat, double lon, double distance) {

    Session session = getSession();

    Criteria criteria = session.createCriteria(Tour.class, "tour");

    //
    // 1 degree of latitude  = 111 km
    // 1 degree of longitude = cos(lat) * 111 km
    //
    final double KM_IN_ONE_LAT = 111.0;

    double t1 = distance / Math.abs(Math.cos(Math.toRadians(lat)) * KM_IN_ONE_LAT);
    double t2 = distance / KM_IN_ONE_LAT;

    double lonA = lon - t1;
    double lonB = lon + t1;

    double latA = lat - t2;
    double latB = lat + t2;

    Criterion c1 = Restrictions.between("longitude", lonA, lonB);
    Criterion c2 = Restrictions.between("latitude", latA, latB);

    criteria.add(c1);
    criteria.add(c2);

    criteria.setResultTransformer(Criteria.DISTINCT_ROOT_ENTITY);

    return criteria.list();
}

Check this paper for more information: Geo (proximity) Search with MySQL

Answer 5

How accurate do you need to be? A Postgres GIS index or an R-tree index can be a useful starting point: perform a bounding-box query in the database, then compute the radial distance on the client. That way the floating-point math isn't done by the central server, where it would choke scalability. My issue is that GIS and R-tree indexes are the slowest types of indexes (only FTS indexes are worse), so I've typically opted for 1D indexes like geohashes. If you have point data, just store everything at a common GSD (Ground Sample Distance), like 10 meters or 1 meter or what have you. You construct a string (typically base-64 encoded; the standard Geohash scheme uses base-32) from the lat-long, in which the bits alternate between lat and long. The points are stored under a simple string index in the DB (very efficient for indexing and storage). Then, for queries, you have to produce a bounding box from the search point across the range of geohashes you're interested in. Unless you have very, very large radii, this should narrow down the search results. Do the final filtering in the client (or use one of the techniques listed by others for pre-calculated trig values).
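To make the bit interleaving concrete, here is a minimal Java sketch of a geohash encoder. It is an assumption-laden illustration rather than the answerer's own scheme: it follows the widely used public Geohash convention (longitude bit first, base-32 alphabet) instead of a base-64 string, and the class name GeoHash and method encode are invented for the example.

```java
// Hypothetical encoder following the common Geohash convention.
final class GeoHash {
    // Standard Geohash base-32 alphabet (no a, i, l, o).
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    // Encodes a point given in degrees into a geohash of the requested length.
    static String encode(double lat, double lng, int length) {
        double latLo = -90.0, latHi = 90.0;
        double lngLo = -180.0, lngHi = 180.0;
        StringBuilder hash = new StringBuilder(length);
        boolean lngTurn = true; // bits alternate, starting with longitude
        int bitCount = 0;
        int value = 0;
        while (hash.length() < length) {
            if (lngTurn) {
                double mid = (lngLo + lngHi) / 2;
                if (lng >= mid) { value = (value << 1) | 1; lngLo = mid; }
                else            { value = value << 1;       lngHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else            { value = value << 1;       latHi = mid; }
            }
            lngTurn = !lngTurn;
            // Every 5 bits become one base-32 character.
            if (++bitCount == 5) {
                hash.append(BASE32.charAt(value));
                bitCount = 0;
                value = 0;
            }
        }
        return hash.toString();
    }
}
```

Because each extra character only halves the cell further, a shorter hash is always a prefix of a longer one for the same point, which is what lets a plain string index answer "everything in this cell" with a prefix range scan. Points on either side of a cell boundary can still have very different hashes, so a search normally has to cover the neighbouring cells as well.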

The problem, however, is that sifting through 1M points is fast, while making 1,000 random disk accesses is unusable. So even if you have a nice geohash, if it has many random points, this isn't going to work.

What I've typically done is to bin all relevant data blocks on disk, so that a geo search gives you a finite set of disk locations. You then load ALL the data (multiple dozens of MB) in up to 4 disk loads and sift through all the geometry. This can be 1000x faster in the best case (vs. 1,000 random disk accesses), but it obviously puts serious constraints on how you stored that data into grids in the first place (full rewriting or fixed-size bins).

Obviously, if you have enough RAM to cache the entire DB, then start there; the algorithm isn't going to matter as much. First think through disk-access patterns, then CPU access patterns (you can scale CPUs, but it's hard to maintain duplicates of your disk data).

Answer 6

Plan A: Since you have 300K rows, INDEX(lat) is a non-starter, performance-wise, even when restricting to a stripe: AND lat BETWEEN 65 AND 69. INDEX(lat, lng) is no better because the optimizer would not use both columns, even with AND lng BETWEEN...

Plan B: The next choice involves lat and lng, plus a subquery. Version 5.6 would be beneficial. It's something like this (after adding INDEX(lat, lng, id)):

SELECT ... FROM (
    SELECT id FROM tbl
        WHERE lat BETWEEN... 
          AND lng BETWEEN... ) x
    JOIN tbl USING (id)
    WHERE ...;

For various reasons, Plan B is only slightly better than Plan A.

Plan C: If you are going to need millions of rows, you will need my pizza parlor algorithm. This involves a Stored Procedure to repeatedly probe, looking for enough rows. It also involves PARTITIONing to get a crude 2D index.

Plans A and B are O(sqrt(N)); Plan C is O(1). That is, for Plans A and B, if you quadruple the number of rows, you double the time taken. Plan C does not get slower as you increase N.

Answer 7

You can use a raw query to select a list of ids from the Address table in Hibernate.

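public List<Long> getNearByLocations(float latitude, float longitude,
            float distance) {
        Session sess = getSession();
        // Spherical law of cosines; 6371 is the Earth radius in km, so distance is in km.
        // Named parameters are bound instead of concatenating the values into the SQL string.
        String queryString = "SELECT id, (6371 * acos(cos(radians(:lat)) "
                + "* cos(radians(latitude)) * cos(radians(longitude) - radians(:lng)) "
                + "+ sin(radians(:lat)) * sin(radians(latitude)))) AS distance "
                + "FROM Address HAVING distance < :dist ORDER BY distance";
        Query qry = sess.createSQLQuery(queryString);
        qry.setParameter("lat", latitude);
        qry.setParameter("lng", longitude);
        qry.setParameter("dist", distance);

        @SuppressWarnings("unchecked")
        List<Object[]> list = qry.list();
        List<Long> idList = new ArrayList<>();
        for (Object[] obj : list) {
            // The driver may return BigInteger or Integer for the id column, so go through Number.
            idList.add(((Number) obj[0]).longValue());
        }
        return idList;
    }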

Answer 8

It's not efficient or scalable to query the whole database table. Consider using an R-tree index for better performance.

solution1
6 2015-03-26 07:47:51

solution2
3 2015-03-04 06:42:43

solution3
2 2015-03-26 06:47:15

solution4
2 2015-03-26 08:06:09

solution5
1 2015-03-30 00:44:58

solution6
1 2015-03-30 04:00:23

solution7
1 2015-03-30 10:03:39

solution8
0 2017-10-22 14:51:28