Joining two subqueries in Postgres

Question

I'm trying to run an inner join on two subqueries, but I get this error message:

org.postgresql.util.PSQLException: ERROR: syntax error at or near "JOIN"
  Position: 550
  GROUP BY year 
JOIN temp ON temp.year = MN.ye
^
  -- INNER JOIN (

Here is my query:

   WITH temp as(
      SELECT 
        SUM(CASE WHEN rain = 'TRUE' THEN 1 END)*1.0/COUNT(date) * 100 as rain, 
        EXTRACT(YEAR FROM date) as year 
        FROM sample
      GROUP BY year
      )
    SELECT AVG(mind) as avg_min,
        AVG(maxd) as avg_max,
        EXTRACT(YEAR FROM date) as year
      FROM sample MN
      GROUP BY year 
    JOIN temp ON temp.year = MN.year

And here is a sample of my data:

date    prcp    maxd    mind    rain
1948-01-01 00:00:00 0.47    51  42  TRUE
1948-01-02 00:00:00 0.59    45  36  TRUE
1948-01-03 00:00:00 0.42    45  35  TRUE
1948-01-04 00:00:00 0.31    45  34  TRUE
1948-01-05 00:00:00 0.17    45  32  TRUE
1948-01-06 00:00:00 0.44    48  39  TRUE
1948-01-07 00:00:00 0.41    50  40  TRUE
1948-01-08 00:00:00 0.04    48  35  TRUE
1948-01-09 00:00:00 0.12    50  31  TRUE
1948-01-10 00:00:00 0.74    43  34  TRUE
1948-01-11 00:00:00 0.01    42  32  TRUE
1948-01-12 00:00:00 0   41  26  FALSE
1948-01-13 00:00:00 0   45  29  FALSE
1948-01-14 00:00:00 0   38  26  FALSE
1948-01-15 00:00:00 0   34  31  FALSE
1948-01-16 00:00:00 0   34  28  FALSE
1948-01-17 00:00:00 0   35  29  FALSE
1948-01-18 00:00:00 0   33  28  FALSE
1948-01-19 00:00:00 0   34  27  FALSE
1948-01-20 00:00:00 0   36  29  FALSE
1948-01-21 00:00:00 0   48  32  FALSE
1948-01-22 00:00:00 0.21    47  44  TRUE
1948-01-23 00:00:00 0   47  43  FALSE
1948-01-24 00:00:00 0.1 45  34  TRUE
1948-01-25 00:00:00 0   46  30  FALSE
1948-01-26 00:00:00 0   45  32  FALSE
1948-01-27 00:00:00 0   53  33  FALSE
1948-01-28 00:00:00 0   53  25  FALSE
1948-01-29 00:00:00 0.22    42  34  TRUE
1948-01-30 00:00:00 0.03    47  30  TRUE
1948-01-31 00:00:00 0.21    35  27  TRUE

My ideal result would look something like this:

avg_tmin, avg_tmax, avg_rain, year
 x          x         x       1948
 x          x         x       1949
...

That is, the average mind (tmin), maxd (tmax), and rain for each year in my dataset.

Answer 1

I don't understand the logic your query tries to implement. From your sample data and expected results, however, it looks like you just want aggregation:

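select
    avg(mind) as avg_mind,
    avg(maxd) as avg_maxd,
    avg( (rain)::int ) as avg_rain,  -- assumes rain is a boolean column: true -> 1, false -> 0
    extract(year from date) as year  -- "year" is a keyword that needs AS when used as a column label
from sample
group by extract(year from date)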
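
To see why averaging the cast gives the share of rainy days, here is a minimal sketch that uses an inline VALUES list instead of the sample table. The alias t and the four rows are made up for illustration, and it assumes rain is a boolean:

```sql
-- Hypothetical input: two rainy days and two dry days.
SELECT avg(rain::int) AS rain_fraction   -- true -> 1, false -> 0
FROM (VALUES (true), (true), (false), (false)) AS t(rain);
```

This should come out as 0.5, i.e. half of the days had rain. Multiply by 100 if you want a percentage.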

Answer 2

I don't see the need for a JOIN to begin with:

SELECT count(*) filter (where rain = 'TRUE') * 1.0  / count(*) as rain, 
       AVG(mind) as avg_min,
       AVG(maxd) as avg_max,
       EXTRACT(YEAR FROM date) as year 
FROM sample
GROUP BY year

This reads the table only once, which makes it the most efficient way to do what you want.
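
The FILTER clause used here does the same job as the CASE expression in the question. A small sketch to compare the two, again with an inline VALUES list (the alias t and the three rows are made up for illustration):

```sql
-- Hypothetical input: three rows, two of them rainy.
SELECT count(*) FILTER (WHERE rain = 'TRUE')   AS rainy_filter,
       sum(CASE WHEN rain = 'TRUE' THEN 1 END) AS rainy_case
FROM (VALUES ('TRUE'), ('TRUE'), ('FALSE')) AS t(rain);
```

Both columns should come out as 2 for this input. They differ when no row matches: count(*) with FILTER returns 0, while SUM over a CASE without an ELSE branch returns NULL.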

However, to answer your direct question about why your code doesn't work: the GROUP BY clause has to come after the JOIN, and you can't reference the column alias year in the join condition, because it is defined at that same query level ( mn ) and sample has no such column:

WITH temp as (
  SELECT count(*) filter (where rain = 'TRUE') *1.0 / COUNT(date) * 100 as rain, 
         EXTRACT(YEAR FROM date) as year 
  FROM sample
  GROUP BY year
)
SELECT AVG(mn.mind) as avg_min,
       AVG(mn.maxd) as avg_max,
       temp.year 
FROM sample MN
  JOIN temp ON temp.year = EXTRACT(YEAR FROM mn.date)
GROUP BY temp.year

Note that this does not use the rain column from the CTE. If you want to add that, you either need to include it in the group by:

WITH temp as (
  SELECT count(*) filter (where rain = 'TRUE') *1.0 / COUNT(date) * 100 as rain, 
         EXTRACT(YEAR FROM date) as year 
  FROM sample
  GROUP BY year
)
SELECT AVG(mn.mind) as avg_min,
       AVG(mn.maxd) as avg_max,
       temp.year, 
       temp.rain
FROM sample MN
  JOIN temp ON temp.year = EXTRACT(YEAR FROM mn.date)
GROUP BY temp.year, temp.rain

Or split this up into two aggregation queries that are joined:

WITH temp1 as (
  SELECT count(*) filter (where rain = 'TRUE') *1.0 / COUNT(date) * 100 as rain, 
         EXTRACT(YEAR FROM date) as year 
  FROM sample
  GROUP BY year
), temp2 as (
  SELECT AVG(mind) as avg_min,
         AVG(maxd) as avg_max,
         EXTRACT(YEAR FROM date) as year 
  FROM sample MN
  GROUP BY year 
)
select *
from temp1
  join temp2 using (year);

But again: the join is not needed and makes the whole thing less efficient.
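
If you want to check this on your own data, EXPLAIN (ANALYZE) shows the actual plan and timing for each variant. A sketch, assuming the sample table from the question exists. Keep in mind that ANALYZE really executes the query:

```sql
-- Single aggregation: the plan should show one scan of sample.
EXPLAIN (ANALYZE)
SELECT count(*) FILTER (WHERE rain = 'TRUE') * 1.0 / count(*) AS rain,
       avg(mind) AS avg_min,
       avg(maxd) AS avg_max,
       EXTRACT(YEAR FROM date) AS year
FROM sample
GROUP BY year;

-- Two aggregations joined: the plan should show two scans of sample plus a join.
EXPLAIN (ANALYZE)
WITH temp1 AS (
  SELECT count(*) FILTER (WHERE rain = 'TRUE') * 1.0 / count(date) * 100 AS rain,
         EXTRACT(YEAR FROM date) AS year
  FROM sample
  GROUP BY year
), temp2 AS (
  SELECT avg(mind) AS avg_min,
         avg(maxd) AS avg_max,
         EXTRACT(YEAR FROM date) AS year
  FROM sample
  GROUP BY year
)
SELECT *
FROM temp1
  JOIN temp2 USING (year);
```

Compare the scan nodes and the reported execution time of the two plans. On a small table the difference will probably be negligible.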
