SQL to HiveQL conversion

I have this SQL query and I am trying to convert it so that it runs on Hive 2.1.1:

SELECT p.id FROM page p, comments c, users u
WHERE c.commentid = p.id 
AND u.id = p.creatorid 
AND u.upvotes IN (
    SELECT MAX(upvotes)
    FROM users u WHERE u.date > p.date
)
AND EXISTS (
    SELECT 1 FROM links l WHERE l.relid > p.id
)

This does not work in HiveQL, because the query has more than one subquery in its WHERE clause, which Hive does not support.

In HiveQL, IN and EXISTS subqueries can be rewritten like this:

WHERE A.aid IN (SELECT bid FROM B...)

can be replaced by:

A LEFT SEMI JOIN B ON aid=bid
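This substitution works because a LEFT SEMI JOIN keeps each row of A at most once when some row of B matches, and exposes none of B's columns. That is what IN does, ignoring NULL handling. A plain inner join would instead repeat A's row once per match. The sketch below models the three operations over lists of dicts. The function names and sample rows are made up for illustration and are not part of any Hive API.

```python
# Toy model of IN, LEFT SEMI JOIN and INNER JOIN over lists of dicts
# (illustration only; names and data are hypothetical, NULLs are not modelled).

def where_in(a_rows, b_rows, a_key, b_key):
    """WHERE a.a_key IN (SELECT b_key FROM B): keep A rows whose key occurs in B."""
    b_values = {b[b_key] for b in b_rows}
    return [a for a in a_rows if a[a_key] in b_values]

def left_semi_join(a_rows, b_rows, a_key, b_key):
    """A LEFT SEMI JOIN B ON a_key = b_key: each A row at most once, no B columns."""
    # Filtering A (rather than pairing A with B) means no A row is ever repeated.
    return [a for a in a_rows if any(a[a_key] == b[b_key] for b in b_rows)]

def inner_join(a_rows, b_rows, a_key, b_key):
    """A JOIN B ON a_key = b_key: one output row per matching (A, B) pair."""
    return [{**a, **b} for a in a_rows for b in b_rows if a[a_key] == b[b_key]]

A = [{"aid": 1}, {"aid": 2}, {"aid": 3}]
B = [{"bid": 1}, {"bid": 1}, {"bid": 3}]  # bid = 1 appears twice

print(where_in(A, B, "aid", "bid"))         # [{'aid': 1}, {'aid': 3}]
print(left_semi_join(A, B, "aid", "bid"))   # [{'aid': 1}, {'aid': 3}]
print(len(inner_join(A, B, "aid", "bid")))  # 3, because aid = 1 is duplicated
```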

But I can't come up with a way to do this for the IN subquery that uses MAX().

Use standard join syntax instead of comma-separated tables:

SELECT p.id 
FROM page p INNER JOIN
     comments c
     ON c.commentid = p.id INNER JOIN
     users u
     ON u.id = p.creatorid INNER JOIN
     links l 
     ON l.relid > p.id 
WHERE u.upvotes IN (SELECT MAX(upvotes)
                    FROM users u 
                    WHERE u.date > p.date
                   );
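The join rewrite above is not exactly equivalent to the original EXISTS. An inner join emits one row per matching link, so a page with several qualifying links comes back several times. SELECT DISTINCT or a semi join would collapse the duplicates. Also, as far as I know, Hive only accepts a non-equality condition such as l.relid > p.id in JOIN ... ON starting with version 2.2.0, so this form may not parse on 2.1.1.

The sketch below shows the duplication on made-up data, using Python's built-in sqlite3 module as a stand-in engine. SQLite is not Hive. It only illustrates the row counts.

```python
import sqlite3

# Hypothetical toy data: only the columns the links predicate needs.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page  (id INTEGER);
    CREATE TABLE links (relid INTEGER);
    INSERT INTO page  VALUES (1), (2), (5);
    INSERT INTO links VALUES (3), (4);
""")

# Original form: EXISTS keeps each page at most once.
exists_sql = """
    SELECT p.id FROM page p
    WHERE EXISTS (SELECT 1 FROM links l WHERE l.relid > p.id)
    ORDER BY p.id
"""

# Join form: one row per (page, matching link) pair.
join_sql = """
    SELECT p.id FROM page p
    INNER JOIN links l ON l.relid > p.id
    ORDER BY p.id
"""

exists_ids = [row[0] for row in conn.execute(exists_sql)]
join_ids = [row[0] for row in conn.execute(join_sql)]
print(exists_ids)  # [1, 2]
print(join_ids)    # [1, 1, 2, 2]
```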

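I am not sure what the upvotes logic is supposed to do. The links logic is easy to handle: a link with relid > p.id exists exactly when the largest relid is greater than p.id. So the EXISTS can become a cross join to a one-row table holding MAX(relid). Hive may handle this: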

SELECT p.id
FROM page p JOIN
     comments c
     ON c.commentid = p.id JOIN
     users u
     ON u.id = p.creatorid CROSS JOIN
     (SELECT MAX(l.relid) as max_relid
      FROM links l
     ) l
WHERE l.max_relid > p.id AND
      u.upvotes IN (SELECT MAX(upvotes)
                    FROM users u
                    WHERE u.date > p.date
                   );
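The correlated MAX() subquery can be removed as well. Here is a sketch that I have not run on Hive itself. For every page, compute the highest upvotes among users dated after that page, using a CROSS JOIN plus GROUP BY, and join the result back on the page id. The links condition stays as the one-row MAX(relid) table from the query above.

The Python harness below runs the question's original query and this rewrite side by side on invented data in SQLite (standard sqlite3 module), only to check that they return the same rows. Assumptions:
- page.id is unique.
- The table and column names come from the question, but the contents are made up.
- The page x users cross join can be very large on real data.

```python
import sqlite3

# Hypothetical toy schema using only the columns the question's query touches.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE page     (id INTEGER, creatorid INTEGER, date TEXT);
    CREATE TABLE comments (commentid INTEGER);
    CREATE TABLE users    (id INTEGER, upvotes INTEGER, date TEXT);
    CREATE TABLE links    (relid INTEGER);
    INSERT INTO page VALUES (1, 10, '2020-01-01'), (2, 20, '2020-06-01'),
                            (3, 10, '2021-01-01'), (4, 30, '2020-01-01'),
                            (9, 10, '2019-01-01');
    INSERT INTO comments VALUES (1), (1), (2), (3), (4), (9);
    INSERT INTO users VALUES (10, 7, '2020-03-01'), (20, 5, '2020-09-01'),
                             (30, 7, '2020-12-01');
    INSERT INTO links VALUES (4), (5);
""")

# The question's query (trailing comma removed), with both correlated subqueries.
original_sql = """
    SELECT p.id FROM page p, comments c, users u
    WHERE c.commentid = p.id
      AND u.id = p.creatorid
      AND u.upvotes IN (SELECT MAX(upvotes) FROM users u WHERE u.date > p.date)
      AND EXISTS (SELECT 1 FROM links l WHERE l.relid > p.id)
"""

# Subquery-free form: both subqueries become derived tables joined to the rest.
rewritten_sql = """
    SELECT p.id
    FROM page p
    JOIN comments c ON c.commentid = p.id
    JOIN users u ON u.id = p.creatorid
    JOIN (
        -- per page: highest upvotes among users dated after the page
        -- (assumes page.id is unique)
        SELECT p2.id AS page_id, MAX(u2.upvotes) AS max_upvotes
        FROM page p2 CROSS JOIN users u2
        WHERE u2.date > p2.date
        GROUP BY p2.id
    ) m ON m.page_id = p.id
    CROSS JOIN (SELECT MAX(relid) AS max_relid FROM links) l
    WHERE u.upvotes = m.max_upvotes
      AND l.max_relid > p.id
"""

original_ids = sorted(row[0] for row in conn.execute(original_sql))
rewritten_ids = sorted(row[0] for row in conn.execute(rewritten_sql))
print(original_ids)   # [1, 1, 4]
print(rewritten_ids)  # [1, 1, 4]
```

The NULL cases line up with the original:
- A page with no later users gets no row in m and is dropped, just as IN over a NULL MAX() filters it out in the original.
- An empty links table makes max_relid NULL, which removes every row, just as a false EXISTS would.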

The technical posts on this site are licensed under CC BY-SA 4.0.

 