I'm trying to run an inner join on two subqueries, but I receive this error message:
org.postgresql.util.PSQLException: ERROR: syntax error at or near "JOIN"
Position: 550
GROUP BY year
JOIN temp ON temp.year = MN.ye
^
-- INNER JOIN (
Here is my query:
WITH temp as(
SELECT
SUM(CASE WHEN rain = 'TRUE' THEN 1 END)*1.0/COUNT(date) * 100 as rain,
EXTRACT(YEAR FROM date) as year
FROM sample
GROUP BY year
)
SELECT AVG(mind) as avg_min,
AVG(maxd) as avg_max,
EXTRACT(YEAR FROM date) as year
FROM sample MN
GROUP BY year
JOIN temp ON temp.year = MN.year
And here is a sample of my data:
date prcp maxd mind rain
1948-01-01 00:00:00 0.47 51 42 TRUE
1948-01-02 00:00:00 0.59 45 36 TRUE
1948-01-03 00:00:00 0.42 45 35 TRUE
1948-01-04 00:00:00 0.31 45 34 TRUE
1948-01-05 00:00:00 0.17 45 32 TRUE
1948-01-06 00:00:00 0.44 48 39 TRUE
1948-01-07 00:00:00 0.41 50 40 TRUE
1948-01-08 00:00:00 0.04 48 35 TRUE
1948-01-09 00:00:00 0.12 50 31 TRUE
1948-01-10 00:00:00 0.74 43 34 TRUE
1948-01-11 00:00:00 0.01 42 32 TRUE
1948-01-12 00:00:00 0 41 26 FALSE
1948-01-13 00:00:00 0 45 29 FALSE
1948-01-14 00:00:00 0 38 26 FALSE
1948-01-15 00:00:00 0 34 31 FALSE
1948-01-16 00:00:00 0 34 28 FALSE
1948-01-17 00:00:00 0 35 29 FALSE
1948-01-18 00:00:00 0 33 28 FALSE
1948-01-19 00:00:00 0 34 27 FALSE
1948-01-20 00:00:00 0 36 29 FALSE
1948-01-21 00:00:00 0 48 32 FALSE
1948-01-22 00:00:00 0.21 47 44 TRUE
1948-01-23 00:00:00 0 47 43 FALSE
1948-01-24 00:00:00 0.1 45 34 TRUE
1948-01-25 00:00:00 0 46 30 FALSE
1948-01-26 00:00:00 0 45 32 FALSE
1948-01-27 00:00:00 0 53 33 FALSE
1948-01-28 00:00:00 0 53 25 FALSE
1948-01-29 00:00:00 0.22 42 34 TRUE
1948-01-30 00:00:00 0.03 47 30 TRUE
1948-01-31 00:00:00 0.21 35 27 TRUE
My ideal result would be something resembling this
avg_tmin, avg_tmax, avg_rain, year
x x x 1948
x x x 1949
...
That is, the average mind (tmin), maxd (tmax), and rain for each year in my dataset.
I don't understand the logic your query tries to implement. From your sample data and expected results, however, it looks like you just want aggregation:
select
avg(mind) avg_mind,
avg(maxd) avg_maxd,
avg( (rain)::int ) avg_rain,
extract(year from date) year
from sample
group by extract(year from date)
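The thread doesn't say whether rain is a boolean or a text column. If it is text holding 'TRUE' / 'FALSE', the direct cast rain::int would likely fail with an invalid-input error. A minimal sketch under that assumption, casting to boolean first and scaling to a percentage (the alias pct_rain is only an illustrative name):

```sql
-- Assumption: rain is text ('TRUE' / 'FALSE'); the cast is harmless if it is already boolean.
SELECT avg(mind)                       AS avg_mind,
       avg(maxd)                       AS avg_maxd,
       avg((rain::boolean)::int) * 100 AS pct_rain,  -- share of rainy days per year, in percent
       extract(year FROM date)         AS year
FROM sample
GROUP BY extract(year FROM date)
ORDER BY year;
```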
I don't see the need for a JOIN to begin with:
SELECT count(*) filter (where rain = 'TRUE') * 1.0 / count(*) as rain,
AVG(mind) as avg_min,
AVG(maxd) as avg_max,
EXTRACT(YEAR FROM date) as year
FROM sample
GROUP BY year
The above is the most efficient way to do what you want.
However, to answer your direct question of why your code doesn't work: you need to move the GROUP BY after the JOIN, and you can't use the column alias year on the same level (mn) where you defined it, because the alias is not a column of the table mn:
WITH temp as (
SELECT count(*) filter (where rain = 'TRUE') *1.0 / COUNT(date) * 100 as rain,
EXTRACT(YEAR FROM date) as year
FROM sample
GROUP BY year
)
SELECT AVG(mn.mind) as avg_min,
AVG(mn.maxd) as avg_max,
temp.year
FROM sample MN
JOIN temp ON temp.year = EXTRACT(YEAR FROM mn.date)
GROUP BY temp.year
Note that this does not use the rain column from the CTE. If you want to add that, you either need to include it in the group by:
WITH temp as (
SELECT count(*) filter (where rain = 'TRUE') *1.0 / COUNT(date) * 100 as rain,
EXTRACT(YEAR FROM date) as year
FROM sample
GROUP BY year
)
SELECT AVG(mn.mind) as avg_min,
AVG(mn.maxd) as avg_max,
temp.year,
temp.rain
FROM sample MN
JOIN temp ON temp.year = EXTRACT(YEAR FROM mn.date)
GROUP BY temp.year, temp.rain
Or split this up into two aggregation queries that are joined:
WITH temp1 as (
SELECT count(*) filter (where rain = 'TRUE') *1.0 / COUNT(date) * 100 as rain,
EXTRACT(YEAR FROM date) as year
FROM sample
GROUP BY year
), temp2 as (
SELECT AVG(mind) as avg_min,
AVG(maxd) as avg_max,
EXTRACT(YEAR FROM date) as year
FROM sample MN
GROUP BY year
)
select *
from temp1
join temp2 using (year);
But again: the join is not needed and makes the whole thing less efficient.
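One way to check the efficiency claim on your own data is to compare execution plans. A rough sketch, assuming the same sample table: run EXPLAIN (ANALYZE) on the single-pass query and on one of the join variants, then compare the reported plans and timings. Note that EXPLAIN (ANALYZE) actually executes the statement.

```sql
-- Single-pass aggregation: sample should be scanned only once.
EXPLAIN (ANALYZE)
SELECT count(*) FILTER (WHERE rain = 'TRUE') * 1.0 / count(*) AS rain,
       avg(mind) AS avg_min,
       avg(maxd) AS avg_max,
       extract(year FROM date) AS year
FROM sample
GROUP BY extract(year FROM date);
```

Prefixing a join-based variant with the same EXPLAIN (ANALYZE) should show sample being read more than once, plus the extra join step.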