简体   繁体   中英

I don't understand why my Athena (Presto) SQL query doesn't work

I have both a specific question and a more general question... for context, I'm using Athena. Here's my specific question:

1) I'm trying to write a query where I calculate the DAUs for 2 successive months broken down by country (I eventually want the DAUs/MAUs broken down by country and day but thought I'd start simple...). My thinking on how to approach this was I'd need two queries, one for each month, that used a group by on country and applied the necessary filters. Finally, I'd be able to join on country to get 1 column that contained countries, and the following columns containing the two months of DAUS. Here's the query I rustled up:

 SELECT count(distinct uuid) m1, country
FROM user u1
WHERE month(date(dt)) = 12 
AND YEAR(date(dt)) = 2018

INNER JOIN (SELECT count(distinct uuid) m2, country
  FROM user
  WHERE month(date(dt)) = 11
        AND YEAR(date(dt)) = 2018 GROUP BY country
) t ON t.country = u1.country

GROUP BY country

What's the issue with this? Error: mismatched input 'join' expecting {<eof>

2) My more general question is, how should I think when constructing more complex queries? I've programmed for years yet I seem to always find myself getting stuck with complicated SQL queries and hitting error message after error message (which also seem almost uninterpretable).

Thanks for all advice :)

The where and group by should be place after the the where join clause

    SELECT count(distinct uuid) m1, country
    FROM user u1
    INNER JOIN (SELECT count(distinct uuid) m2, country
      FROM user
      WHERE month(date(dt)) = 11
            AND YEAR(date(dt)) = 2018 GROUP BY country
    ) t ON t.country = u1.country
    WHERE month(date(dt)) = 12 
    AND YEAR(date(dt)) = 2018
    GROUP BY country

You want to join the result of two Selects:

SELECT t2.*, t.m2
FROM
 (   
   SELECT count(distinct uuid) m1, country
   FROM user u1
   WHERE month(date(dt)) = 12
     AND YEAR(date(dt)) = 2018
   GROUP BY country
 ) as t2
JOIN
 (
   SELECT count(distinct uuid) m2, country
   FROM user
   WHERE month(date(dt)) = 11
     AND YEAR(date(dt)) = 2018
   GROUP BY country
 ) as t
ON t.country = t2.country

But this will remove countries which don't have data for both months and then you have to switch to a FULL OUTER JOIN.

In your case it's much easier to use conditional aggregation :

SELECT country
  ,count(distinct case when month(date(dt)) = 12 AND YEAR(date(dt)) = 2018 then uuid end) as m1
  ,count(distinct case when month(date(dt)) = 11 AND YEAR(date(dt)) = 2018 then uuid end) as m2
FROM user
GROUP BY country

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM