
SQL - Most frequent value in column of joined tables

I have three tables described below:

Area (Id, Description)

City(Id, Name)

Problem(Id, City, Area, Definition):
 City references City (Id), Area references Area (Id)

For each City (Name), I want to find the Area (Description) that appears most frequently in Problem.

Example:

Area
Id   Description
1      Support
2      Finance  

City
Id      Name
1      Chicago
2      Boston

Problem
Id  City  Area  Definition
1     1     2       A
2     1     2       B
3     1     1       C
4     2     1       D

Desired Output:

 Name         Description
 Chicago        Finance
 Boston         Support
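
To make the example reproducible, here is a minimal sketch that loads the sample rows above into an in-memory database and counts problems per city and area. It uses Python's built-in sqlite3 module purely as a convenient runner; SQLite instead of MySQL, the column types, and the helper names make_sample_db and area_counts are assumptions for illustration, not part of the original question.

```python
import sqlite3


def make_sample_db():
    """Create an in-memory database holding the example rows from the question."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Area (Id INTEGER PRIMARY KEY, Description TEXT);
        CREATE TABLE City (Id INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE Problem (
            Id INTEGER PRIMARY KEY,
            City INTEGER REFERENCES City (Id),
            Area INTEGER REFERENCES Area (Id),
            Definition TEXT
        );
        INSERT INTO Area VALUES (1, 'Support'), (2, 'Finance');
        INSERT INTO City VALUES (1, 'Chicago'), (2, 'Boston');
        INSERT INTO Problem VALUES
            (1, 1, 2, 'A'), (2, 1, 2, 'B'), (3, 1, 1, 'C'), (4, 2, 1, 'D');
    """)
    return conn


def area_counts(conn):
    """Return (city name, area description, count) for each pair found in Problem."""
    return conn.execute("""
        SELECT c.Name, a.Description, COUNT(*) AS Freq
        FROM Problem AS p
        JOIN City AS c ON c.Id = p.City
        JOIN Area AS a ON a.Id = p.Area
        GROUP BY c.Name, a.Description
        ORDER BY c.Name, Freq DESC
    """).fetchall()


if __name__ == "__main__":
    for row in area_counts(make_sample_db()):
        print(row)
```

The desired output is then the row with the highest Freq within each city, which is what the question is asking how to select.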

Here's what I have tried, with no success:

SELECT Name,
       Description
FROM
  (SELECT *
   FROM Problem AS P,
        City AS C,
        Area AS A
   WHERE C.Id = P.City
     AND A.Id = P.Area ) AS T1
WHERE Description =
    (SELECT Description
     FROM
       (SELECT *
        FROM Problem AS P,
             City AS C,
             Area AS A
        WHERE C.Id = P.City
          AND A.Id = P.Area ) AS T2
     WHERE T1.Name = T2.Name
     GROUP BY Description
     ORDER BY Count(Name) DESC LIMIT 1 )
GROUP BY Name,
         Description

Thanks!

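The maximum for each city and area can be found by counting problems per city and area, then keeping the rows whose count equals the highest count in that city:

  select C.Name, A.Description
  from (
    -- number of problems per (city, area)
    select P.City, P.Area, count(*) as Freq
    from Problem as P
    group by P.City, P.Area
  ) t1
  INNER JOIN (
    -- highest per-area count within each city
    select f.City, max(f.Freq) as max_freq
    from (
        select P.City, P.Area, count(*) as Freq
        from Problem as P
        group by P.City, P.Area
    ) f
    group by f.City
  ) t2 ON t2.City = t1.City AND t2.max_freq = t1.Freq
  INNER JOIN City AS C ON t1.City = C.Id
  INNER JOIN Area AS A ON A.Id = t1.Area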

This is probably the shortest way to solve your issue:

select c.Name, a.Description
from City c
cross join Area a
where a.Id = (
    select p.Area
    from Problem p
    where p.City = c.Id
    group by p.Area
    order by count(*) desc, p.Area asc
    limit 1
)

We use a CROSS JOIN to combine every City with every Area, but keep only the Area with the highest count in the Problem table for the given city, which is determined in the correlated subquery. If two areas have the same highest count for a city, the one with the lower Area Id is picked ( order by ... p.Area asc ). Note that this orders by Id, not alphabetically by Description.

Result:

|    Name | Description |
|---------|-------------|
|  Boston |     Support |
| Chicago |     Finance |
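
The tie-breaking behaviour of the correlated subquery can be checked with a small sketch. It runs the same query through Python's built-in sqlite3 module (SQLite rather than MySQL is an assumption made to keep the example self-contained), and the helper name top_area_per_city and the extra tied row are hypothetical.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE Area (Id INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE City (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Problem (Id INTEGER PRIMARY KEY, City INTEGER, Area INTEGER, Definition TEXT);
    INSERT INTO Area VALUES (1, 'Support'), (2, 'Finance');
    INSERT INTO City VALUES (1, 'Chicago'), (2, 'Boston');
"""

# Same query as above, plus an outer ORDER BY so the row order is deterministic.
QUERY = """
    SELECT c.Name, a.Description
    FROM City AS c
    CROSS JOIN Area AS a
    WHERE a.Id = (
        SELECT p.Area
        FROM Problem AS p
        WHERE p.City = c.Id
        GROUP BY p.Area
        ORDER BY COUNT(*) DESC, p.Area ASC
        LIMIT 1
    )
    ORDER BY c.Name
"""

# Rows from the question: (Id, City, Area, Definition).
ORIGINAL = [(1, 1, 2, "A"), (2, 1, 2, "B"), (3, 1, 1, "C"), (4, 2, 1, "D")]
# Hypothetical extra row so Chicago has two problems in each area (a tie).
TIED = ORIGINAL + [(5, 1, 1, "E")]


def top_area_per_city(problem_rows):
    """Load the given Problem rows and return one (Name, Description) per city."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Problem VALUES (?, ?, ?, ?)", problem_rows)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(top_area_per_city(ORIGINAL))
    print(top_area_per_city(TIED))
```

With the tied data, Chicago resolves to Support (Area Id 1) even though Finance sorts first alphabetically, because the tie is broken on p.Area. A city with no rows in Problem does not appear in the result at all, since the subquery yields NULL for it.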

Here's another, more complex solution that also returns the count.

select c.Name, a.Description, city_area_maxcount.mc as problem_count
from (
    select City, max(c) as mc
    from (
        select p.City, p.Area, count(*) as c
        from Problem p
        group by p.City, p.Area
    ) city_area_count
    group by City
) city_area_maxcount
join (
    select p.City, p.Area, count(*) as c
    from Problem p
    group by p.City, p.Area
) city_area_count
    on  city_area_count.City = city_area_maxcount.City
    and city_area_count.c = city_area_maxcount.mc
join City c on c.Id = city_area_count.City
join Area a on a.Id = city_area_count.Area

The subquery aliased as city_area_count is used twice here (I hope MySQL can cache the result). If you think of it as a table, this becomes the common find-the-row-with-top-value-per-group problem. If two areas have the same highest count for a city, both will be selected.

Result:

|    Name | Description | problem_count |
|---------|-------------|---------------|
|  Boston |     Support |             1 |
| Chicago |     Finance |             2 |

Demo: http://sqlfiddle.com/#!9/c66a5/2
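
If the database supports common table expressions (MySQL 8.0 and later do, as does SQLite), the repeated per-city, per-area count can be written once and referenced twice. The following is only a sketch of that variation, not the query from the demo above; it is run through Python's built-in sqlite3 module to stay self-contained, and the helper name top_areas_with_count and the extra tied row are hypothetical.

```python
import sqlite3

SCHEMA = """
    CREATE TABLE Area (Id INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE City (Id INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Problem (Id INTEGER PRIMARY KEY, City INTEGER, Area INTEGER, Definition TEXT);
    INSERT INTO Area VALUES (1, 'Support'), (2, 'Finance');
    INSERT INTO City VALUES (1, 'Chicago'), (2, 'Boston');
"""

QUERY = """
    WITH city_area_count AS (
        -- number of problems per (city, area), defined once
        SELECT p.City, p.Area, COUNT(*) AS c
        FROM Problem AS p
        GROUP BY p.City, p.Area
    ),
    city_area_maxcount AS (
        -- highest per-area count within each city
        SELECT City, MAX(c) AS mc
        FROM city_area_count
        GROUP BY City
    )
    SELECT ci.Name, a.Description, m.mc AS problem_count
    FROM city_area_maxcount AS m
    JOIN city_area_count AS cac ON cac.City = m.City AND cac.c = m.mc
    JOIN City AS ci ON ci.Id = cac.City
    JOIN Area AS a ON a.Id = cac.Area
    ORDER BY ci.Name, a.Description
"""

# Rows from the question: (Id, City, Area, Definition).
ORIGINAL = [(1, 1, 2, "A"), (2, 1, 2, "B"), (3, 1, 1, "C"), (4, 2, 1, "D")]
# Hypothetical extra row so Chicago has two problems in each area (a tie).
TIED = ORIGINAL + [(5, 1, 1, "E")]


def top_areas_with_count(problem_rows):
    """Return (Name, Description, problem_count) for every top area per city."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO Problem VALUES (?, ?, ?, ?)", problem_rows)
    rows = conn.execute(QUERY).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(top_areas_with_count(ORIGINAL))
    print(top_areas_with_count(TIED))
```

As with the join-based query, ties are kept: with the tied data both Chicago rows are returned with a count of 2.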
