

SQL to HiveQL conversion

I have this SQL query and I am trying to convert it so that it can run on Hive 2.1.1:

SELECT p.id FROM page p, comments c, users u
WHERE c.commentid = p.id
AND u.id = p.creatorid 
AND u.upvotes IN (
    SELECT MAX(upvotes)
    FROM users u WHERE u.date > p.date
)
AND EXISTS (
    SELECT 1 FROM links l WHERE l.relid > p.id
)

This does not work in HiveQL, because the query contains more than one subquery, which Hive does not support.

EXISTS or IN subqueries are converted from standard SQL to HiveQL like this:

WHERE A.aid IN (SELECT bid FROM B...)

can be replaced by:

A LEFT SEMI JOIN B ON aid=bid
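A LEFT SEMI JOIN keeps each row of A at most once, however many rows of B match, which is why it (and not a plain INNER JOIN) stands in for IN. Hive also restricts it: columns of B may only be referenced in the ON clause, not in SELECT or WHERE. The sketch below uses Python's built-in sqlite3 module as a stand-in engine; SQLite has no LEFT SEMI JOIN, so it compares IN with a plain JOIN and with a JOIN against the distinct keys of B, a portable way to emulate a semi-join. The tables A and B and their rows are made up for illustration.

```python
import sqlite3

# Hypothetical tables A(aid) and B(bid); SQLite stands in for Hive here.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (aid INTEGER);
    CREATE TABLE B (bid INTEGER);
    INSERT INTO A VALUES (1), (2), (3);
    INSERT INTO B VALUES (1), (1), (3);  -- bid 1 appears twice
""")

def rows(sql):
    """Run a query and return its first column as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))

# Standard SQL: IN subquery (semi-join semantics, no duplicates).
in_result = rows("SELECT aid FROM A WHERE aid IN (SELECT bid FROM B)")

# A plain INNER JOIN repeats aid = 1 once per matching B row.
join_result = rows("SELECT aid FROM A JOIN B ON aid = bid")

# Joining against DISTINCT keys restores semi-join semantics;
# in Hive the same intent is written as: A LEFT SEMI JOIN B ON aid = bid
semi_result = rows(
    "SELECT aid FROM A JOIN (SELECT DISTINCT bid FROM B) b ON aid = b.bid"
)

print(in_result)    # [1, 3]
print(join_result)  # [1, 1, 3]
print(semi_result)  # [1, 3]
```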

But I can't come up with a way to do this with the additional MAX() function.
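One possible way around the MAX() is to de-correlate it: compute, for every page, the highest upvotes among users dated after that page in a grouped derived table, then join that back with plain equality conditions. The EXISTS on links becomes a comparison with MAX(relid), since a link with relid > p.id exists exactly when the largest relid exceeds p.id. This is a sketch under assumptions, untested on Hive itself: page.id is assumed unique, the data is invented, and sqlite3 is used only to check that the rewrite returns the same rows as the original correlated query. The inequality u2.date > p2.date sits in WHERE after a CROSS JOIN because, as far as I know, Hive releases before 2.2.0 accept only equality conditions in ON.

```python
import sqlite3

# Made-up sample data; SQLite stands in for Hive so both queries can be compared.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER, upvotes INTEGER, date INTEGER);
    CREATE TABLE page (id INTEGER, creatorid INTEGER, date INTEGER);
    CREATE TABLE comments (commentid INTEGER);
    CREATE TABLE links (relid INTEGER);
    INSERT INTO users VALUES (1, 10, 5), (2, 30, 8), (3, 20, 9), (4, 30, 2);
    INSERT INTO page VALUES (100, 2, 1), (101, 3, 8), (102, 1, 6),
                            (103, 4, 7), (104, 2, 9), (200, 2, 1);
    INSERT INTO comments VALUES (100), (101), (102), (103), (104), (200);
    INSERT INTO links VALUES (120), (150);
""")

# The question's query (trailing comma removed): two subqueries in WHERE.
ORIGINAL = """
SELECT p.id FROM page p, comments c, users u
WHERE c.commentid = p.id
  AND u.id = p.creatorid
  AND u.upvotes IN (SELECT MAX(upvotes) FROM users u WHERE u.date > p.date)
  AND EXISTS (SELECT 1 FROM links l WHERE l.relid > p.id)
"""

# De-correlated sketch: no subquery in WHERE, only equality conditions in ON.
REWRITTEN = """
SELECT p.id
FROM page p
JOIN comments c ON c.commentid = p.id
JOIN users u ON u.id = p.creatorid
JOIN (
    -- per page: highest upvotes among users dated after that page
    SELECT p2.id AS page_id, MAX(u2.upvotes) AS max_upvotes
    FROM page p2 CROSS JOIN users u2
    WHERE u2.date > p2.date
    GROUP BY p2.id
) m ON m.page_id = p.id AND u.upvotes = m.max_upvotes
CROSS JOIN (SELECT MAX(relid) AS max_relid FROM links) l
WHERE l.max_relid > p.id
"""

original_ids = sorted(r[0] for r in conn.execute(ORIGINAL))
rewritten_ids = sorted(r[0] for r in conn.execute(REWRITTEN))
print(original_ids)   # [100, 101, 103]
print(rewritten_ids)  # [100, 101, 103]
```

Pages with no later user (page 104 here) produce no group in the derived table, so the inner join drops them, just as IN drops them when MAX returns NULL. The CROSS JOIN of page and users grows as |page| × |users|, so this may be expensive on large tables.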

Use standard JOIN syntax instead of comma-separated tables:

SELECT p.id 
FROM page p INNER JOIN
     comments c
     ON c.commentid= p.id INNER JOIN
     users u
     ON u.id = p.creatorid INNER JOIN
     links l 
     ON l.relid > p.id 
WHERE u.upvotes IN (SELECT MAX(upvotes)
                    FROM users u 
                    WHERE u.date > p.date
                   );
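One thing to watch in the query above: joining links on l.relid > p.id is not equivalent to EXISTS. The join emits one row per matching link, so a page with several later links is returned several times, while EXISTS returns it once. (As far as I know, Hive releases before 2.2.0 also reject non-equality conditions in ON.) A minimal sketch with sqlite3 as a stand-in engine and invented rows:

```python
import sqlite3

# Made-up rows; SQLite stands in for Hive to show the row counts.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page (id INTEGER);
    CREATE TABLE links (relid INTEGER);
    INSERT INTO page VALUES (100), (200);
    INSERT INTO links VALUES (120), (150);
""")

def ids(sql):
    """Return the first column of a query as a sorted list."""
    return sorted(r[0] for r in conn.execute(sql))

# Original semantics: each page at most once.
exists_ids = ids(
    "SELECT p.id FROM page p "
    "WHERE EXISTS (SELECT 1 FROM links l WHERE l.relid > p.id)"
)
# INNER JOIN on the inequality: one row per matching link.
join_ids = ids("SELECT p.id FROM page p JOIN links l ON l.relid > p.id")
# DISTINCT collapses the duplicates again.
distinct_join_ids = ids(
    "SELECT DISTINCT p.id FROM page p JOIN links l ON l.relid > p.id"
)

print(exists_ids)         # [100]
print(join_ids)           # [100, 100]
print(distinct_join_ids)  # [100]
```

Adding DISTINCT is not a full fix in the real query, because it would also collapse the duplicates that the comments join produces, which the original query keeps.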

I am not sure what the upvotes logic is supposed to be doing. The links logic is easy to handle: a link with relid > p.id exists exactly when the largest relid is greater than p.id. Hive may handle this:

SELECT p.id
FROM page p JOIN
     comments c
     ON c.commentid = p.id JOIN
     users u
     ON u.id = p.creatorid CROSS JOIN
     (SELECT MAX(l.relid) as max_relid
      FROM links l
     ) l
WHERE l.max_relid > p.id AND
      u.upvotes IN (SELECT MAX(upvotes)
                    FROM users u
                    WHERE u.date > p.date
                   );
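To check that the CROSS JOIN against MAX(relid) matches the original EXISTS, including an empty links table (MAX then returns NULL, and NULL > p.id filters every row, just as EXISTS finds nothing), here is a small sketch with sqlite3 standing in for Hive; the page and link values are invented.

```python
import sqlite3

EXISTS_SQL = """
SELECT p.id FROM page p
WHERE EXISTS (SELECT 1 FROM links l WHERE l.relid > p.id)
"""

# The answer's approach: compare p.id with the single-row MAX(relid).
MAX_SQL = """
SELECT p.id
FROM page p CROSS JOIN (SELECT MAX(l.relid) AS max_relid FROM links l) l
WHERE l.max_relid > p.id
"""

def compare(link_ids):
    """Return (EXISTS result, MAX result) for the given links.relid values."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE page (id INTEGER);
        CREATE TABLE links (relid INTEGER);
        INSERT INTO page VALUES (100), (150), (200);
    """)
    conn.executemany("INSERT INTO links VALUES (?)", [(x,) for x in link_ids])
    exists_ids = sorted(r[0] for r in conn.execute(EXISTS_SQL))
    max_ids = sorted(r[0] for r in conn.execute(MAX_SQL))
    conn.close()
    return exists_ids, max_ids

for links in ([120, 180], [], [90]):
    print(links, *compare(links))
# [120, 180] [100, 150] [100, 150]
# [] [] []
# [90] [] []
```

The derived table always has exactly one row, so this CROSS JOIN does not multiply the result the way the inequality join does.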
