简体   繁体   中英

mysql having… > avg() doesn't work as expected

I've created two views to help calculate user_diary_number and then select users of whom diary numbers > average of total user's user_diary_number.

two views are like below:

create view user_diary_number as
(
select user_id,count( distinct diary_id ) as diary_num
from user_diary

group by user_id
);

and second using having and avg :

create view hw_diary as
(
select u.user_id, u.realname, ud.diary_num, school.school_name
from (user as u cross join user_diary_number as ud on u.user_id = ud.user_id )cross join school on u.school_id = school.school_id
having diary_num > avg(diary_num)

);

What the problem is now, the second view only have 1 row of result. and absolutely, we have more than 1 users whose diary number > average diary_num. Indeed, I have 251 diaries in total and 103 users. Some of users have 9, 4, 5 diaries. But the result only comes in only 1 user who have 3 diaries.

my relative tables are:

CREATE TABLE IF NOT EXISTS `school` (
  `school_id` int(11) NOT NULL,
  `school_name` varchar(45) NOT NULL,
  `location` varchar(45) NOT NULL,
  `master` varchar(45) NOT NULL,
  `numbers_of_student` int(11) NOT NULL,
  PRIMARY KEY (`school_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

CREATE TABLE IF NOT EXISTS `user_diary` (
  `diary_id` int(11) NOT NULL AUTO_INCREMENT,
  `user_id` int(11) NOT NULL,
  `title` varchar(45) NOT NULL,
  `content` varchar(255) NOT NULL,
  `addtime` DATETIME NOT NULL,
  PRIMARY KEY (`diary_id`,`user_id`),
  KEY `fk_diary_user_id_idx` (`user_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 AUTO_INCREMENT=1 ;

Is it any problems with cross join? or something else? thanks a lot!

You can't use avg that way. In my personal movie database,

select * from movie having year > avg(year);

produces nothing, and

select * from movie having year > (select avg (year) from movie);

produces the expected result.

You must calculate the average in a separate subquery.

Something like:

select ...
from ...
group by ...
having diary_num > (
    select avg(diary_num)
    from ...) 

You can fill in the blanks with what makes sense

Something like this should return the resultset you are looking for:

 SELECT u.user_id
      , u.realname
      , c.diary_num
      , s.school_name
   -- , a.diary_avg
   FROM ( SELECT d.user_id
               , COUNT(DISTINCT d.diary_id) AS diary_num
            FROM user_diary d
        ) c
   JOIN user u
     ON u.user_id = c.user_id
   JOIN school s
     ON s.school_id = u.school_id
   JOIN ( SELECT AVG(v.diary_num) AS diary_avg
            FROM ( SELECT t.user_id
                        , COUNT(DISTINCT t.diary_id) AS diary_num
                     FROM user_diary t
                 ) v
        ) a
     ON a.diary_avg < c.diary_num
  ORDER BY 1

The inline view aliased as c gets us the diary_num (count) for each user.

The inline view aliased as a gets us the average of all the diary_num for all users. That is getting us an "average" of the counts, which is what it looks like your original query was intending to do.

As an alternative, we could get the "average" number of diaries per user as ... the total count of all diaries divided by the total count of all users. To do that, replace that inline view aliased as a with something like this:

        ( SELECT COUNT(DISTINCT t.diary_id)
                 / NULLIF(COUNT(DISTINCT v.user_id),0) AS diary_avg
            FROM user v
            LEFT
            JOIN user_diary t
              ON t.user_id = v.user_id
        ) a

This yields slightly different results, as its a calculation on total counts, rather than an average of a calculation.


NOTE

The CROSS keyword has no influence on the MySQL optimizer.

We do typically include the CROSS keyword as documentation for future reviewers. It indicates that we have purposefully omitted the usual ON clause. (As a reviwer, when we see a JOIN without an ON clause, our minds race to a "possible unintended Cartesian product"... the author's inclusion of the CROSS keyword alerts us (the reviewer) that the omission of the ON clause was purposeful.

But the MySQL optimizer doesn't care one whit whether the CROSS keyword is included or omitted.


Really old versions (3.x ?) of MySQL did not support subqueries. 确实很老的MySQL版本(3.x?)不支持子查询。 But certainly, MySQL 5.1 and later do support subqueries.

To answer your question, yes, a SELECT statement can be used as an inline view as a rowsource for another query, eg

SELECT v.*
  FROM (
         SELECT 1 AS foo
       ) v

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM