Hive SQL - select all rows for a name where one of its rows matches a specific value

I have a Hive table like this -

Name     Page
Sid      Login
Sid      Buy
Nancy    Home
Nancy    Register
Nancy    Buy

I'd like to extract all the rows for a Name where at least one of that Name's rows has Page=login. So, it would extract two rows for name=Sid but no rows for name=Nancy.

I tried -

select * from table where name in (select name from table where page='login');

However, I get the error -

Error while compiling statement: FAILED: SemanticException [Error 10249]: Line 1:142 Unsupported SubQuery Expression ''login'': SubQuery expression refers to Outer query expressions only.

Can anyone help? This query seems simple enough. Thanks

The following query would work anywhere ANSI SQL is supported:

SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT Name
    FROM yourTable
    GROUP BY Name
    HAVING SUM(CASE WHEN Page = 'login' THEN 1 ELSE 0 END) > 0
) t2
    ON t1.Name = t2.Name

The basic strategy is to aggregate by name, count how many times login appears as a page for each name, and retain only the names whose count is above zero. Joining back to the original table then returns every row for those names.
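To see what the aggregation step produces before the join, the inner query can be run on its own. This is only a sketch: login_count is an illustrative alias, and the lower() call is an assumption added here because the sample rows spell the page as Login while the filter uses 'login', and Hive compares strings case-sensitively.

```sql
-- Sketch only: yourTable, Name and Page follow the query above;
-- login_count is an illustrative alias, lower() is an added assumption.
SELECT Name,
       SUM(CASE WHEN lower(Page) = 'login' THEN 1 ELSE 0 END) AS login_count
FROM yourTable
GROUP BY Name;
-- For the sample rows in the question: Sid has login_count 1, Nancy has 0.
```

If the stored values really are capitalised as in the sample, the same lower(Page) comparison would be needed in the HAVING clause as well.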

You can do this using window functions:

select t.*
from (select t.*,
             sum(case when page = 'login' then 1 else 0 end) over (partition by name) as numlogins
      from t
     ) t
where numlogins > 0;
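One detail worth checking in a conditional window aggregate like this: COUNT counts every non-NULL value, so a CASE expression that returns 0 for non-matching rows would still be counted. The expression either has to be summed, or written without an ELSE branch so that non-matching rows yield NULL. A sketch of the second form, using the same table and column names as above:

```sql
-- Sketch only: t, name and page follow the query above.
select t.*
from (select t.*,
             -- no ELSE branch: non-matching rows yield NULL and are not counted
             count(case when page = 'login' then 1 end) over (partition by name) as numlogins
      from t
     ) t
where numlogins > 0;
```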

Take a look at this: https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries (the section on subqueries in the WHERE clause). It is the Hive documentation on SQL subqueries.
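For an IN-style filter like the one in the question, Hive also has LEFT SEMI JOIN, which returns rows from the left table that have at least one match on the right. The following is a sketch rather than a tested fix, and it reuses the yourTable placeholder from the answer above. The commented-out variant, which aliases both table references in the original IN subquery, is an assumption about what triggers the error and may behave differently across Hive versions.

```sql
-- Sketch only: yourTable is a placeholder for the real table name.
SELECT t1.*
FROM yourTable t1
LEFT SEMI JOIN (
    -- names that have at least one login row
    SELECT Name
    FROM yourTable
    WHERE Page = 'login'
) t2
    ON (t1.Name = t2.Name);

-- Untested alternative: qualify every column with a table alias so the
-- subquery's columns are not resolved against the outer query.
-- SELECT t1.*
-- FROM yourTable t1
-- WHERE t1.Name IN (SELECT t2.Name FROM yourTable t2 WHERE t2.Page = 'login');
```

With LEFT SEMI JOIN the right-hand side can only be referenced in the ON clause, so the select list is limited to t1 columns.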
