SQL to HiveQL conversion
I have this SQL query and I am trying to convert it so that it can be run on Hive 2.1.1.
SELECT p.id FROM page p, comments c, users u
WHERE c.commentid= p.id
AND u.id = p.creatorid
AND u.upvotes IN (
SELECT MAX(upvotes)
FROM users u WHERE u.date > p.date
)
AND EXISTS (
SELECT 1 FROM links l WHERE l.relid > p.id
)
This does not work in HiveQL because it contains more than one subquery, which is not supported.
In Hive, an EXISTS or IN subquery is usually replaced like this:
WHERE A.aid IN (SELECT bid FROM B...)
can be replaced by:
A LEFT SEMI JOIN B ON aid=bid
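Written out in full, the replacement looks roughly like the sketch below. The tables A(aid) and B(bid) are placeholders taken from the pattern above, not real tables from my schema.

```sql
-- Standard SQL form: keep the rows of A whose aid appears in B.
SELECT a.aid
FROM A a
WHERE a.aid IN (SELECT b.bid FROM B b);

-- Hive form: LEFT SEMI JOIN returns each matching row of A once.
-- Columns of B may only be referenced inside the ON clause.
SELECT a.aid
FROM A a
LEFT SEMI JOIN B b ON (a.aid = b.bid);
```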
But I can't come up with a way to do this with the additional MAX() function.
Use standard JOIN syntax instead of comma-separated tables:
SELECT p.id
FROM page p INNER JOIN
comments c
ON c.commentid= p.id INNER JOIN
users u
ON u.id = p.creatorid INNER JOIN
links l
ON l.relid > p.id
WHERE u.upvotes IN (SELECT MAX(upvotes)
FROM users u
WHERE u.date > p.date
);
I am not sure what the upvotes logic is supposed to be doing. The links logic is easy to handle: the EXISTS condition holds exactly when the largest relid in links exceeds p.id, so it can be replaced by a comparison against MAX(l.relid). Hive may handle this:
SELECT p.id
FROM page p JOIN
comments c
ON c.commentid = p.id JOIN
users u
ON u.id = p.creatorid CROSS JOIN
(SELECT MAX(l.relid) as max_relid
FROM links l
) l
WHERE l.max_relid > p.id AND
u.upvotes IN (SELECT MAX(upvotes)
FROM users u
WHERE u.date > p.date
);
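If Hive also rejects the remaining correlated MAX() subquery, one possible workaround is to precompute the maximum per page in a derived table and join to it. This is only an untested sketch under a few assumptions: page.id is taken to be unique, the names p2, u2, m, page_id and max_upvotes are made up for illustration, and the backticks around date are a precaution because DATE is a reserved word in Hive.

```sql
SELECT p.id
FROM page p JOIN
     comments c
     ON c.commentid = p.id JOIN
     users u
     ON u.id = p.creatorid JOIN
     -- Assumed helper: highest upvotes among users dated after each page.
     -- The inequality stays in WHERE because older Hive versions
     -- generally accept only equality conditions in ON.
     (SELECT p2.id AS page_id, MAX(u2.upvotes) AS max_upvotes
      FROM page p2 CROSS JOIN
           users u2
      WHERE u2.`date` > p2.`date`
      GROUP BY p2.id
     ) m
     ON m.page_id = p.id CROSS JOIN
     (SELECT MAX(l.relid) AS max_relid
      FROM links l
     ) l
WHERE l.max_relid > p.id AND
      u.upvotes = m.max_upvotes;
```

Pages with no later-dated user produce no row in m, so the inner join drops them, which matches the original IN over a NULL maximum. The cross join between page and users can be expensive on large tables, so it is worth checking the plan before relying on this.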