MYSQL JOIN and GROUP / DISTINCT

I have 3 tables I'm joining together to find which users are in a specific area. A scaled-back example of the tables:

USER Table (stores all user information) 
ID | Name
----------
 1 | John
 2 | Joe
 3 | Mike

GEO (has all geo location info, including latitude and longitude, which I'm excluding for the example)
ID | CITY 
-------------
 1 | ORLANDO
 2 | MIAMI
 3 | DAYTONA

LOCATIONS (stores each users location; each user has multiple locations)
ID | AREA (id = user.id, area = geo.id)
--------
 1 | 1
 1 | 2
 1 | 3
 2 | 1
 3 | 1
 3 | 3

I've created a function in PHP to pull the results for a given LAT / LONG within a certain radius, like so (excluding the rest of the function as it's not relevant):

select U.ID as USERID, (6371 * acos(cos(radians({$lat})) * cos(radians(G.latitude)) * cos(radians(G.longitude) - radians({$long})) + sin(radians({$lat})) * sin(radians(G.latitude)))) AS distance
            from 
            GEO G 
            join LOCATIONS LOC on LOC.AREA = G.ID
            join USER U on LOC.ID = U.ID
            HAVING distance <= {$radius}

Now the issue. This works and pulls all the info, but it shows the same user multiple times because the user appears in the LOCATIONS table multiple times (i.e. it shows 100 results for 15 different users).

So my thought was to GROUP BY USER.id; however, this only matches the first location for that user, resulting in only 2 results.

I've tried DISTINCT, but the rows are not distinct, as user.id and location.id form a different combination in each row.

I've also tried working backwards with subqueries:

SELECT * from USER where id = (
select id from GEO where area = (
select id, (long trig here) as distance) from GEO)

but that won't work, as I have to select the trig expression as distance, so I can't just select the id from the GEO table.

I'm at my wits' end trying to get unique users while still searching across all of each user's locations. I know I could loop over the results in PHP and rebuild them; however, this query easily returns thousands of rows, since every user location appears in the results, and I'd rather not do that for speed reasons.

Any help in the right direction would be much appreciated.

ADDITION

To elaborate on the result issue a tad: if you ran this query on ORLANDO with a radius that extends to DAYTONA, and a user was also in DAYTONA, you'd get

USER | CITY
-----------
 1  | ORLANDO
 1  | DAYTONA
 2  | ORLANDO
 3  | ORLANDO
 3  | DAYTONA

which results in duplicates of users 1 & 3,

but when you group by user.id you only get

 USER | CITY
-----------
 2  | ORLANDO

which drops users 1 & 3, since once they are grouped their area only shows as DAYTONA.

If you use WHERE instead of HAVING, you can use GROUP BY / DISTINCT and catch them all, like so:

SELECT u.id AS USERID
    FROM `GEO` g
    JOIN `LOCATIONS` l ON l.`AREA` = g.`ID`
    JOIN `USER` u ON l.`ID` = u.`ID`
    WHERE (6371 * ACOS(COS(RADIANS({$lat})) * COS(RADIANS(g.latitude)) * COS(RADIANS(g.longitude) - RADIANS({$long})) + SIN(RADIANS({$lat})) * SIN(RADIANS(g.latitude)))) <= {$radius}
    GROUP BY u.`ID`
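
To see the effect on the sample tables from the question, here is a minimal sketch that runs the same WHERE-then-GROUP BY shape against an in-memory SQLite database from Python. Everything outside the query shape is an assumption made for illustration: the approximate city coordinates, the `great_circle_km` helper (registered as an SQL function because SQLite builds do not always ship trig functions; in MySQL the inline expression above does this job), the hypothetical `users_within` function, and the bound parameters standing in for `{$lat}`, `{$long}` and `{$radius}`.

```python
import math
import sqlite3


def great_circle_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines with a 6371 km Earth radius, as in the query."""
    a = (math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.cos(math.radians(lon2) - math.radians(lon1))
         + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2)))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return 6371 * math.acos(max(-1.0, min(1.0, a)))


conn = sqlite3.connect(":memory:")
conn.create_function("great_circle_km", 4, great_circle_km)
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE geo (id INTEGER PRIMARY KEY, city TEXT, latitude REAL, longitude REAL);
CREATE TABLE locations (id INTEGER, area INTEGER);
INSERT INTO user VALUES (1, 'John'), (2, 'Joe'), (3, 'Mike');
-- Coordinates are approximate and only here for illustration.
INSERT INTO geo VALUES (1, 'ORLANDO', 28.5383, -81.3792),
                       (2, 'MIAMI',   25.7617, -80.1918),
                       (3, 'DAYTONA', 29.2108, -81.0228);
INSERT INTO locations VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 3);
""")

# The distance filter runs in WHERE, i.e. on every location row, before grouping.
WHERE_QUERY = """
SELECT u.id
FROM geo g
JOIN locations l ON l.area = g.id
JOIN user u ON l.id = u.id
WHERE great_circle_km(?, ?, g.latitude, g.longitude) <= ?
GROUP BY u.id
ORDER BY u.id
"""


def users_within(lat, lon, radius_km):
    """Return the ids of users with at least one location inside the radius."""
    return [row[0] for row in conn.execute(WHERE_QUERY, (lat, lon, radius_km))]


# Searching 100 km around Orlando reaches Daytona but not Miami.
print(users_within(28.5383, -81.3792, 100))  # [1, 2, 3]
```

Because the filter is evaluated per location row and the grouping happens afterwards, a user is kept as long as any one of their locations is in range, and each user appears only once.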

This may be optimized by filtering 'early', before aggregation, i.e. by moving the WHERE condition into the ON clause so that it is applied as early as possible. Though this may look 'weird', it can be significantly faster. In your case this would look like this:

SELECT u.id AS USERID
    FROM `GEO` g
    JOIN `LOCATIONS` l ON 
        (6371 * ACOS(COS(RADIANS({$lat})) * COS(RADIANS(g.latitude)) * COS(RADIANS(g.longitude) - RADIANS({$long})) + SIN(RADIANS({$lat})) * SIN(RADIANS(g.latitude)))) <= {$radius}
        AND l.`AREA` = g.`ID`
    JOIN `USER` u ON l.`ID` = u.`ID`        
    GROUP BY u.`ID`

  • Note that if you want to select the distance as well, you can still put it in the select list as you did. However, with DISTINCT each distinct user/distance pair stays a separate row, while with GROUP BY you can aggregate the distances per user, e.g. take MIN(distance) or concatenate them all with GROUP_CONCAT.
  • I'd recommend trying out both GROUP BY and DISTINCT, as performance differences can be quite extreme and unpredictable (see e.g. this question).
  • Just wondering, but it'd be more efficient to precalculate the constant parts, such as COS(RADIANS({$lat})) and SIN(RADIANS({$lat})), instead of computing them on the fly. Any reason to keep it like this?
  • Additionally, you may want to store the long / lat values in radians for further optimization.
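
The first note can be sketched on the question's sample data. With GROUP BY each user collapses to one row, so the distance has to pass through an aggregate; the sketch below uses MIN() for the nearest matching location and COUNT(*) for how many locations matched. It again uses SQLite from Python rather than MySQL, and the approximate coordinates, the `great_circle_km` helper and the `nearest_per_user` function are assumptions for illustration only.

```python
import math
import sqlite3


def great_circle_km(lat1, lon1, lat2, lon2):
    """Spherical law of cosines with a 6371 km Earth radius."""
    a = (math.cos(math.radians(lat1)) * math.cos(math.radians(lat2))
         * math.cos(math.radians(lon2) - math.radians(lon1))
         + math.sin(math.radians(lat1)) * math.sin(math.radians(lat2)))
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return 6371 * math.acos(max(-1.0, min(1.0, a)))


conn = sqlite3.connect(":memory:")
conn.create_function("great_circle_km", 4, great_circle_km)
conn.executescript("""
CREATE TABLE user (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE geo (id INTEGER PRIMARY KEY, city TEXT, latitude REAL, longitude REAL);
CREATE TABLE locations (id INTEGER, area INTEGER);
INSERT INTO user VALUES (1, 'John'), (2, 'Joe'), (3, 'Mike');
-- Coordinates are approximate and only here for illustration.
INSERT INTO geo VALUES (1, 'ORLANDO', 28.5383, -81.3792),
                       (2, 'MIAMI',   25.7617, -80.1918),
                       (3, 'DAYTONA', 29.2108, -81.0228);
INSERT INTO locations VALUES (1, 1), (1, 2), (1, 3), (2, 1), (3, 1), (3, 3);
""")

# One row per user: how many locations matched, and the nearest one's distance.
NEAREST_QUERY = """
SELECT u.id,
       COUNT(*) AS matching_locations,
       MIN(great_circle_km(:lat, :lon, g.latitude, g.longitude)) AS nearest_km
FROM geo g
JOIN locations l ON l.area = g.id
JOIN user u ON l.id = u.id
WHERE great_circle_km(:lat, :lon, g.latitude, g.longitude) <= :radius
GROUP BY u.id
ORDER BY u.id
"""


def nearest_per_user(lat, lon, radius_km):
    """Return (user_id, matching_locations, nearest_km) tuples, one per user."""
    params = {"lat": lat, "lon": lon, "radius": radius_km}
    return conn.execute(NEAREST_QUERY, params).fetchall()


# 100 km around Orlando: users 1 and 3 match twice (Orlando and Daytona).
for user_id, matches, nearest_km in nearest_per_user(28.5383, -81.3792, 100):
    print(user_id, matches, round(nearest_km, 1))
```

Aggregating explicitly avoids relying on whichever row's distance the database happens to pick for a grouped user, which is what made the HAVING version in the question drop users.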
