
Hadoop Hive Query: Multi-join

How do I do a sub-select in Hive? I think I may be making a really obvious mistake that just isn't obvious to me...

The error I'm receiving: FAILED: Parse Error: line 4:8 cannot recognize input 'SELECT' in expression specification

Here are my three source tables:

aaa_hit     -> [SESSION_KEY, HIT_KEY, URL]
aaa_event   -> [SESSION_KEY, HIT_KEY, EVENT_ID]
aaa_session -> [SESSION_KEY, REMOTE_ADDRESS]

...and what I'd like to do is insert the results into a result table like this:

result -> [url, num_url, event_id, num_event_id, remote_address, num_remote_address]

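...where column 1 is the URL, column 3 is the top 1 "event" for each URL, and column 5 is the top 1 REMOTE_ADDRESS that visited that URL. (Each even-numbered column is the "count" of the column before it.)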

Soooooo... what am I doing wrong here?

INSERT OVERWRITE TABLE result2
SELECT url, 
       COUNT(url) AS access_url, 
       (SELECT events.event_id as evt, 
               COUNT(events.event_id) as access_evt
        FROM   aaa_event events 
               LEFT OUTER JOIN aaa_hit hits 
                 ON ( events.hit_key = hit_key )
                 ORDER BY access_evt DESC LIMIT 1), 
       (SELECT sessions.remote_address as remote_address, 
               COUNT(sessions.remote_address) as access_addr
        FROM   aaa_session sessions 
               RIGHT OUTER JOIN aaa_hit hits 
                 ON ( sessions.session_key = session_key )
                 ORDER BY access_addr DESC LIMIT 1) 
FROM   aaa_hit
ORDER  BY access_url DESC;

Thanks very much :)

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+SubQueries

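Hive supports subqueries only in the FROM clause. (Hive 0.13 and later also accept some kinds of subqueries in the WHERE clause, but that does not help here.)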

You can't use a subquery as a "column" in Hive.

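To work around this, you need to put that subquery in the FROM clause and JOIN to it. (The following won't work as written, but it shows the idea.)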

SELECT url, 
       COUNT(url) AS access_url, 
       t2.col1, t2.col2 ...
FROM   aaa_hit
JOIN (SELECT events.event_id as evt, 
               COUNT(events.event_id) as access_evt
        FROM   aaa_event events 
               LEFT OUTER JOIN aaa_hit hits 
                 ON ( events.hit_key = hit_key )
                 ORDER BY access_evt DESC LIMIT 1), 
       (SELECT sessions.remote_address as remote_address, 
               COUNT(sessions.remote_address) as access_addr
        FROM   aaa_session sessions 
               RIGHT OUTER JOIN aaa_hit hits 
                 ON ( sessions.session_key = session_key )
                 ORDER BY access_addr DESC LIMIT 1) t2
ON (aaa_hit.THING = t2.THING)

For more on using JOIN in Hive, see https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Joins
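As a rough sketch of the FROM-clause approach, the query below computes hits per URL in one derived table and the most frequent event per URL in another, then joins the two on url. It rests on several assumptions that are not in the original posts: Hive 0.11 or later (for ROW_NUMBER), hit_key alone identifying a hit (as in the question's own join), and the illustrative aliases u, c, ranked, e and rn. It covers only the URL and event columns of the result table.

```sql
-- Sketch only: assumes Hive 0.11+ and the column names given in the question.
SELECT u.url,
       u.num_url,
       e.event_id,
       e.num_event_id
FROM (
    -- Number of hits per URL.
    SELECT url, COUNT(*) AS num_url
    FROM   aaa_hit
    GROUP  BY url
) u
LEFT OUTER JOIN (
    -- Keep only the most frequent event for each URL.
    SELECT ranked.url, ranked.event_id, ranked.num_event_id
    FROM (
        SELECT c.url,
               c.event_id,
               c.num_event_id,
               ROW_NUMBER() OVER (PARTITION BY c.url
                                  ORDER BY c.num_event_id DESC) AS rn
        FROM (
            -- Number of occurrences of each event per URL.
            SELECT h.url, ev.event_id, COUNT(*) AS num_event_id
            FROM   aaa_hit h
                   JOIN aaa_event ev
                     ON ( h.hit_key = ev.hit_key )
            GROUP  BY h.url, ev.event_id
        ) c
    ) ranked
    WHERE ranked.rn = 1
) e
ON ( u.url = e.url )
ORDER BY num_url DESC;
```

If two events tie for first place on a URL, ROW_NUMBER keeps one of them arbitrarily. The remote_address columns could presumably be added with a third derived table built the same way from aaa_session joined to aaa_hit on session_key.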

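You have no GROUP BY, and COUNT is an aggregate. An aggregate works without a GROUP BY clause only when every selected column is aggregated, as with a bare count(*); here url is selected alongside COUNT(url), so the query needs GROUP BY url.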

https://cwiki.apache.org/confluence/display/Hive/LanguageManual+GroupBy
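To illustrate the point about aggregates, here is a minimal sketch against the aaa_hit table from the question (the alias access_url is taken from the question's query; nothing here has been run against the asker's data):

```sql
-- Rejected by Hive: url is selected but is neither grouped nor aggregated.
-- SELECT url, COUNT(url) AS access_url FROM aaa_hit;

-- Accepted: every selected column is an aggregate, so no GROUP BY is needed.
SELECT COUNT(*) AS total_hits
FROM   aaa_hit;

-- Accepted: one output row per URL, with its hit count.
SELECT url,
       COUNT(url) AS access_url
FROM   aaa_hit
GROUP  BY url
ORDER  BY access_url DESC;
```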

