Below is the query I am trying to run on Hive, with Tez as the execution engine.
SELECT A.CITY, A.NAME, B.PRICE
,(ROW_NUMBER() OVER (PARTITION BY A.NAME ORDER BY B.PRICE) ) AS RNUM
FROM TABLE1 A
LEFT JOIN TABLE2 B
ON A.NAME = B.NAME
WHERE ( A.COLUMN2 >= B.COLUMN3 AND A.COLUMN2 < B.COLUMN4)
GROUP BY A.CITY, A.NAME, B.PRICE;
I tried changing the data format, increasing the container size, changing the number of reducers, and changing the heap size. Whichever parameter I change, the query still gets stuck.
On further investigation, I noticed that the WHERE condition and the window function are what cause the query to run indefinitely.
Here is my question:
Thanks for your help
I don't think this is caused by memory allocation or the reducer count. It could be caused by data skew, so analyze it from that angle as well. This link should help: https://cwiki.apache.org/confluence/display/Hive/Skewed+Join+Optimization
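As a rough way to check for skew, you could count the rows per join key on each side. The join is on NAME only, so a name that appears n times in TABLE1 and m times in TABLE2 produces n * m rows before the WHERE filter is applied. The table and column names below are taken from the question; the LIMIT of 20 is arbitrary, and the two SET lines are only something to experiment with, since I am not sure they take effect under Tez on every Hive version.

```sql
-- Rows per join key in each table. A handful of NAME values with
-- very large counts on both sides would point to a skewed join.
SELECT NAME, COUNT(*) AS CNT
FROM TABLE1
GROUP BY NAME
ORDER BY CNT DESC
LIMIT 20;

SELECT NAME, COUNT(*) AS CNT
FROM TABLE2
GROUP BY NAME
ORDER BY CNT DESC
LIMIT 20;

-- Runtime skew join settings described on the wiki page above.
-- Assumption: verify on your Hive version that they apply to Tez.
SET hive.optimize.skewjoin=true;
SET hive.skewjoin.key=100000;  -- row count above which a key is treated as skewed
```

If the same few names dominate both result sets, the reducers handling those keys do most of the work, which would explain why changing memory or the reducer count makes no difference.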
The link below gives insight into skewed tables and list bucketing. It is worth reading.
https://cwiki.apache.org/confluence/display/Hive/ListBucketing
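For illustration, a skewed table could be declared roughly like this. This is only a sketch: TABLE2_SKEWED is a hypothetical name, the column types are guesses, and 'name_x' and 'name_y' are placeholders for whichever NAME values turn out to be heavily repeated in your data.

```sql
-- Hypothetical copy of TABLE2 that records its skewed join keys in the metastore.
-- Column types are assumptions; adjust them to match the real table.
CREATE TABLE TABLE2_SKEWED (
  NAME    STRING,
  PRICE   DOUBLE,
  COLUMN3 STRING,
  COLUMN4 STRING
)
SKEWED BY (NAME) ON ('name_x', 'name_y');
-- Appending STORED AS DIRECTORIES to the clause above is what enables
-- list bucketing; see the linked page for the settings it needs.

INSERT OVERWRITE TABLE TABLE2_SKEWED
SELECT NAME, PRICE, COLUMN3, COLUMN4
FROM TABLE2;
```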
Thanks!