
UNION not working as expected in PostgreSQL

I have a UNION query:

(SELECT
    to_char(createdatutc,'YYYY') as "Yr",
    to_char(createdatutc,'MM') as "Mh",
    count(postid) as Freq
FROM conversations
WHERE type = 'Post'
GROUP BY Yr, Mh
HAVING Yr = '2018')
UNION
(SELECT
    to_char(createdatutc,'YYYY') as "Yr",
    to_char(createdatutc,'MM') as "Mh",
    count(postid) as Freq
FROM conversations
WHERE type <> 'Post'
GROUP BY Yr, Mh having Yr = '2018')
ORDER BY  Yr, Mh

which is throwing the following error upon executing:

org.postgresql.util.PSQLException: ERROR: column "conversations.createdatutc" must appear in the GROUP BY clause or be used in an aggregate function

However, if I run the two queries individually, they run properly. Here createdatutc is a timestamp field.

Do the to_char extraction in a derived table, then group by its result:

select "Yr", "Mh", count(postid), type
from
(
    SELECT
        to_char(createdatutc,'YYYY') as "Yr",
        to_char(createdatutc,'MM') as "Mh",
        postid,
        case when type = 'Post' then 'Post' else 'NotPost' end as type
    FROM conversations
) dt
where "Yr" = '2018'  -- to_char returns text, so compare against a string
group by "Yr", "Mh", type

Remove the alias column names from GROUP BY and HAVING and repeat the expressions instead, as below:

select * from (
    (
    select extract(year from createdatutc::date) as "Yr",
           extract(month from createdatutc::date) as "Mh",
           count(postid) as freq
    from conversations
    where type = 'Post'
    group by extract(year from createdatutc::date), extract(month from createdatutc::date)
    having extract(year from createdatutc::date) = 2018)
    union
    -- the second branch selects the same grouped expressions, so the column types match
    (select extract(year from createdatutc::date) as "Yr",
           extract(month from createdatutc::date) as "Mh",
           count(postid) as freq
    from conversations
    where type <> 'Post'
    group by extract(year from createdatutc::date), extract(month from createdatutc::date)
    having extract(year from createdatutc::date) = 2018)
) as t
order by "Yr", "Mh"  -- quoted, because the aliases were created with quotes

First of all: I'm surprised the individual queries run. You should not be able to use an alias column name in HAVING, because HAVING is evaluated before SELECT.
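
As a rough illustration of that point: PostgreSQL documents that an output column name may be used in GROUP BY and ORDER BY, but not in WHERE or HAVING. The sketch below assumes conversations has no real column called yr (otherwise GROUP BY yr would refer to that column instead of the alias).

```sql
SELECT
  to_char(createdatutc, 'YYYY') AS yr,
  count(postid) AS freq
FROM conversations
GROUP BY yr                                      -- output alias is accepted here
HAVING to_char(createdatutc, 'YYYY') = '2018'    -- alias is not accepted here, so repeat the expression
ORDER BY yr;                                     -- output alias is accepted here too
```

Because the filter does not involve an aggregate, it could equally be written as a WHERE condition, which filters the rows before grouping.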

With UNION you are removing duplicates. So for a month with exactly the same number of posts as non-posts, the two identical rows collapse into one, and that month's count shows up only once. Is this what you are after? Seems strange.
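
A minimal sketch of the difference, using literal rows instead of the conversations table (the values are made up for illustration):

```sql
-- Both branches produce the same (yr, mh, freq) row.
SELECT '2018' AS yr, '03' AS mh, 2 AS freq
UNION
SELECT '2018', '03', 2;
-- returns one row: UNION removes the duplicate

SELECT '2018' AS yr, '03' AS mh, 2 AS freq
UNION ALL
SELECT '2018', '03', 2;
-- returns two rows: UNION ALL keeps both
```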

Anyway, with your query you'll get multiple result rows per month, and you won't be able to tell which are for posts and which are for non-posts.

(And just so you know: if type can be null, that won't be counted in any row, because NULL being unknown is considered neither equal nor unequal to 'Post'.)
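
A small sketch of the NULL behaviour. The last column uses IS DISTINCT FROM, which is one possible way to treat a NULL type as "not a post", if that is what you want:

```sql
SELECT
  NULL::text =  'Post'               AS is_equal,     -- NULL (unknown)
  NULL::text <> 'Post'               AS is_unequal,   -- NULL (unknown)
  NULL::text IS DISTINCT FROM 'Post' AS is_distinct;  -- true
```

A condition such as WHERE type IS DISTINCT FROM 'Post' would therefore include rows with a NULL type, whereas WHERE type <> 'Post' skips them.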

Here are two ways to write the query:

One row per month and type

SELECT yr, mh, tp, COUNT(*)
FROM
(
  SELECT
    TO_CHAR(createdatutc, 'YYYY') AS yr,
    TO_CHAR(createdatutc, 'MM') AS mh,
    CASE WHEN type = 'Post' THEN 'Post' ELSE 'other' END AS tp
  FROM conversations
  WHERE EXTRACT(YEAR FROM createdatutc) = 2018
) yr2018
GROUP BY yr, mh, tp
ORDER BY yr, mh, tp;

One row per month

SELECT
  yr, mh,
  COUNT(CASE WHEN type = 'Post' THEN 1 END) AS count_posts,
  COUNT(CASE WHEN type <> 'Post' THEN 1 END) AS count_nonposts
FROM
(
  SELECT
    TO_CHAR(createdatutc, 'YYYY') AS yr,
    TO_CHAR(createdatutc, 'MM') AS mh,
    type
  FROM conversations
  WHERE EXTRACT(YEAR FROM createdatutc) = 2018
) yr2018
GROUP BY yr, mh
ORDER BY yr, mh;

You can do this without subqueries (derived tables), but then you'll have to repeat the same expressions again and again.
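
For comparison, a sketch of the "one row per month" query without a derived table. It should return the same rows, but each TO_CHAR expression appears three times:

```sql
SELECT
  TO_CHAR(createdatutc, 'YYYY') AS yr,
  TO_CHAR(createdatutc, 'MM') AS mh,
  COUNT(CASE WHEN type = 'Post' THEN 1 END) AS count_posts,
  COUNT(CASE WHEN type <> 'Post' THEN 1 END) AS count_nonposts
FROM conversations
WHERE EXTRACT(YEAR FROM createdatutc) = 2018
GROUP BY TO_CHAR(createdatutc, 'YYYY'), TO_CHAR(createdatutc, 'MM')
ORDER BY TO_CHAR(createdatutc, 'YYYY'), TO_CHAR(createdatutc, 'MM');
```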

With help from @zaynul and @Thorsten I modified my query as below:

select
  yr,
  mh,
  sum(freq)
from
(
  (
    -- unquoted aliases, so the outer query can refer to them as yr and mh
    select
      to_char(createdatutc,'YYYY') as yr,
      to_char(createdatutc,'MM') as mh,
      count(postid) as freq
    from conversations where type = 'Post'
    group by
      to_char(createdatutc,'YYYY'),
      to_char(createdatutc,'MM')
    having to_char(createdatutc,'YYYY') = '2018'
  )
  union
  (
    select
      to_char(createdatutc,'YYYY') as yr,
      to_char(createdatutc,'MM') as mh,
      count(postid) as freq
    from conversations where type <> 'Post'
    group by
      to_char(createdatutc,'YYYY'),
      to_char(createdatutc,'MM')
    having to_char(createdatutc,'YYYY') = '2018'
  )
) as t
group by yr, mh
order by yr, mh

which did the trick for me. Thank you for your help and support.

Sample data:

+--------+--------------+------+
| postid | createdatutc | type |
+--------+--------------+------+
|      1 | 2018-01-01   | Post |
|      2 | 2018-01-02   | Nope |
|      3 | 2018-01-03   | Njet |
|      4 | 2018-01-04   | Nada |
|      5 | 2018-02-01   | Post |
|      6 | 2018-02-02   | Post |
|      7 | 2018-02-03   | Post |
|      8 | 2018-02-04   | Nada |
|      9 | 2018-03-01   | Post |
|     10 | 2018-03-02   | Post |
|     11 | 2018-03-03   | Nope |
|     12 | 2018-03-04   | Nada |
+--------+--------------+------+

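Result of the query above. Note March: the sample data has two posts and two non-posts, so both branches produce the row (2018, 03, 2), UNION merges them, and the sum is 2 rather than 4: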

+------+----+-----------+
| yr   | mh | sum(freq) |
+------+----+-----------+
| 2018 | 01 |         4 |
| 2018 | 02 |         4 |
| 2018 | 03 |         2 |
+------+----+-----------+
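
A sketch of the same approach with UNION ALL, which keeps both branch rows even when they are identical. The WITH clause only inlines the sample data so that the statement runs on its own; against a real conversations table it would be dropped. The year filter is moved to WHERE here, which is an assumption about intent, not part of the original query.

```sql
WITH conversations (postid, createdatutc, type) AS (
  VALUES
    (1,  TIMESTAMP '2018-01-01', 'Post'),
    (2,  TIMESTAMP '2018-01-02', 'Nope'),
    (3,  TIMESTAMP '2018-01-03', 'Njet'),
    (4,  TIMESTAMP '2018-01-04', 'Nada'),
    (5,  TIMESTAMP '2018-02-01', 'Post'),
    (6,  TIMESTAMP '2018-02-02', 'Post'),
    (7,  TIMESTAMP '2018-02-03', 'Post'),
    (8,  TIMESTAMP '2018-02-04', 'Nada'),
    (9,  TIMESTAMP '2018-03-01', 'Post'),
    (10, TIMESTAMP '2018-03-02', 'Post'),
    (11, TIMESTAMP '2018-03-03', 'Nope'),
    (12, TIMESTAMP '2018-03-04', 'Nada')
)
SELECT yr, mh, sum(freq) AS total
FROM (
  SELECT to_char(createdatutc, 'YYYY') AS yr,
         to_char(createdatutc, 'MM')   AS mh,
         count(postid)                 AS freq
  FROM conversations
  WHERE type = 'Post'
    AND to_char(createdatutc, 'YYYY') = '2018'
  GROUP BY to_char(createdatutc, 'YYYY'), to_char(createdatutc, 'MM')
  UNION ALL  -- keeps both rows when the two counts are equal
  SELECT to_char(createdatutc, 'YYYY') AS yr,
         to_char(createdatutc, 'MM')   AS mh,
         count(postid)                 AS freq
  FROM conversations
  WHERE type <> 'Post'
    AND to_char(createdatutc, 'YYYY') = '2018'
  GROUP BY to_char(createdatutc, 'YYYY'), to_char(createdatutc, 'MM')
) AS t
GROUP BY yr, mh
ORDER BY yr, mh;
-- 2018 | 01 | 4
-- 2018 | 02 | 4
-- 2018 | 03 | 4
```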

