I have a Hive table like this:
Name     Page
Sid      Login
Sid      Buy
Nancy    Home
Nancy    Register
Nancy    Buy
I'd like to extract all rows for every Name that has at least one row with Page=login. So it would return both rows for Sid but no rows for Nancy.
I tried:
select * from table where name in (select name from table where page='login');
However, I get this error:
Error while compiling statement: FAILED: SemanticException [Error 10249]: Line 1:142 Unsupported SubQuery Expression ''login'': SubQuery expression refers to Outer query expressions only.
Can anyone help? This query seems simple enough. Thanks
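To reproduce the problem, the sample rows above could be loaded with something like the HiveQL below. This is a sketch: the table name yourTable and the STRING column types are assumptions (the question gives neither), and INSERT ... VALUES requires Hive 0.14 or later.

```sql
-- Hypothetical table name and column types; the question only shows the data.
CREATE TABLE yourTable (
  Name STRING,
  Page STRING
);

-- The sample rows as listed in the question (note the capital L in 'Login').
INSERT INTO TABLE yourTable VALUES
  ('Sid',   'Login'),
  ('Sid',   'Buy'),
  ('Nancy', 'Home'),
  ('Nancy', 'Register'),
  ('Nancy', 'Buy');
```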
The following query would work anywhere ANSI SQL is supported:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
SELECT Name
FROM yourTable
GROUP BY Name
HAVING SUM(CASE WHEN Page = 'login' THEN 1 ELSE 0 END) > 0
) t2
ON t1.Name = t2.Name
The basic strategy is to aggregate by name, count how many times login appears as a page, keep only the names whose count is greater than zero, and then join those names back to the original table to return all of their rows.
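To see the first step on its own, the inner aggregate can be run by itself. This sketch assumes the sample data is in a table named yourTable. One caveat worth checking: Hive string comparison is case-sensitive, and the sample data shows 'Login' with a capital L, so the sketch compares lower(Page) to 'login'. If the stored values are already lowercase, the lower() call can be dropped.

```sql
-- Step 1: one row per name with the number of 'login' pages.
-- lower() guards against mixed-case values such as 'Login'.
SELECT Name,
       SUM(CASE WHEN lower(Page) = 'login' THEN 1 ELSE 0 END) AS login_count
FROM yourTable
GROUP BY Name;
-- For the sample data: Sid -> 1, Nancy -> 0 (row order is not guaranteed).
```

The HAVING ... > 0 filter then keeps only Sid, and joining back on Name returns both of Sid's rows.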
You can do this using window functions:
select t.*
from (select t.*,
-- use SUM, not COUNT: COUNT would also count the 0s and match every name
sum(case when page = 'login' then 1 else 0 end) over (partition by name) as numlogins
from t
) t
where numlogins > 0;
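The aggregate used inside the window matters here. COUNT counts every non-NULL value, so a CASE that returns 0 for non-login rows is still counted, and every name would pass a numlogins > 0 filter. The sketch below, which assumes the sample rows are in yourTable and uses lower() because of the capitalized 'Login', puts three variants side by side so the per-row values can be compared.

```sql
SELECT Name,
       Page,
       -- Wrong: 0 is non-NULL, so COUNT includes every row in the partition.
       COUNT(CASE WHEN lower(Page) = 'login' THEN 1 ELSE 0 END)
         OVER (PARTITION BY Name) AS count_with_else,
       -- Correct: adds 1 for each login row and 0 otherwise.
       SUM(CASE WHEN lower(Page) = 'login' THEN 1 ELSE 0 END)
         OVER (PARTITION BY Name) AS sum_with_else,
       -- Also correct: non-matching rows yield NULL, which COUNT skips.
       COUNT(CASE WHEN lower(Page) = 'login' THEN 1 END)
         OVER (PARTITION BY Name) AS count_without_else
FROM yourTable;
```

On the sample data, count_with_else is 2 on Sid's rows and 3 on Nancy's rows, while the other two columns are 1 for Sid and 0 for Nancy.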
Have a look at the "Subqueries in the WHERE Clause" section of this tutorial on Hive SQL subqueries: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries
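Among the WHERE-clause forms that page covers is a correlated EXISTS subquery (supported since Hive 0.13). Below is a sketch of how it might look for this question, assuming the table is named yourTable; it has not been tested against the asker's Hive version. The second query uses LEFT SEMI JOIN, the idiom the Hive join documentation describes for IN/EXISTS semantics and the usual choice before subquery support arrived.

```sql
-- Correlated EXISTS: keep every row whose Name has at least one login row.
SELECT t1.*
FROM yourTable t1
WHERE EXISTS (
  SELECT 1
  FROM yourTable t2
  WHERE t2.Name = t1.Name
    AND lower(t2.Page) = 'login'
);

-- LEFT SEMI JOIN: returns each t1 row at most once when a match exists.
-- Columns of the right-hand side may only appear in the ON clause.
SELECT t1.*
FROM yourTable t1
LEFT SEMI JOIN (
  SELECT Name
  FROM yourTable
  WHERE lower(Page) = 'login'
) t2
ON (t1.Name = t2.Name);
```

Unlike an ordinary inner join, neither form duplicates t1 rows when a name has several login rows.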