
Select a large number of ids from a Hive table

I have a large table with a format similar to:

+-----+------+------+
|ID   |Cat   |date  |
+-----+------+------+
|12   | A    |201602|
|14   | B    |201601|
|19   | A    |201608|
|12   | F    |201605|
|11   | G    |201603|
+-----+------+------+

and I need to select entries based on a list of around 5000 thousand (about 5 million) IDs. The straightforward way would be to put the list in a WHERE clause, but that would perform very badly and probably would not even work. How can I do this selection?
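One commonly used alternative to pasting millions of literals into `WHERE ID IN (...)` is to load the IDs into a table of their own and filter with a join against it. This technique is not part of the answer in this thread. The sketch below uses Python's built-in `sqlite3` as a stand-in for Hive so that it runs anywhere. The table names `events` and `wanted_ids` and the function `select_by_ids` are hypothetical, and the sample rows are the ones shown in the question.

```python
import sqlite3

# Sample rows from the question: (ID, Cat, date)
ROWS = [
    (12, "A", "201602"),
    (14, "B", "201601"),
    (19, "A", "201608"),
    (12, "F", "201605"),
    (11, "G", "201603"),
]


def select_by_ids(rows, wanted_ids):
    """Return rows whose ID is in wanted_ids, filtering through a staging
    table instead of a literal IN (...) list. Names are hypothetical."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE events (id INTEGER, cat TEXT, date TEXT)")
        conn.executemany("INSERT INTO events VALUES (?, ?, ?)", rows)

        # Staging table: one row per wanted ID; duplicates are ignored.
        conn.execute("CREATE TABLE wanted_ids (id INTEGER PRIMARY KEY)")
        conn.executemany(
            "INSERT OR IGNORE INTO wanted_ids VALUES (?)",
            ((i,) for i in wanted_ids),
        )

        # IN (subquery) behaves like a semi join: each event row is kept
        # at most once, even if its ID was listed several times.
        cur = conn.execute(
            "SELECT e.id, e.cat, e.date FROM events AS e "
            "WHERE e.id IN (SELECT id FROM wanted_ids) "
            "ORDER BY e.date"
        )
        return cur.fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in select_by_ids(ROWS, [12, 19]):
        print(row)
```

In HiveQL the same idea is usually written with a `LEFT SEMI JOIN` between the large table and the ID table, which avoids a multi-megabyte query string and lets the engine treat the ID list as ordinary data.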

Using a partitioned table makes things run fast. Once the table is partitioned, add your IDs to the WHERE clause. You can also extract a subtable from the original one by selecting all rows whose IDs fall between the minimum and the maximum of your ID list.
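The min/max subtable suggestion can be illustrated in plain Python: a cheap range check first shrinks the data, and only the surviving rows get the exact membership test. This is a sketch under the assumption that rows are `(ID, Cat, date)` tuples like the question's sample; the function names `range_prefilter` and `select_with_prefilter` are hypothetical.

```python
from typing import Iterable, List, Sequence, Tuple

Row = Tuple[int, str, str]  # (ID, Cat, date)

ROWS: List[Row] = [
    (12, "A", "201602"),
    (14, "B", "201601"),
    (19, "A", "201608"),
    (12, "F", "201605"),
    (11, "G", "201603"),
]


def range_prefilter(rows: Iterable[Row], wanted_ids: Sequence[int]) -> List[Row]:
    """Keep rows whose ID lies between min(wanted_ids) and max(wanted_ids).

    This mirrors building a subtable with a simple range predicate
    (e.g. ID BETWEEN min AND max) before the exact filter.
    """
    if not wanted_ids:
        return []
    lo, hi = min(wanted_ids), max(wanted_ids)
    return [row for row in rows if lo <= row[0] <= hi]


def select_with_prefilter(rows: Iterable[Row], wanted_ids: Sequence[int]) -> List[Row]:
    """Range pre-filter first, then an exact check against a set of IDs."""
    wanted = set(wanted_ids)  # average O(1) lookups instead of scanning a list
    subtable = range_prefilter(rows, wanted_ids)
    return [row for row in subtable if row[0] in wanted]


if __name__ == "__main__":
    print(range_prefilter(ROWS, [12, 14]))
    print(select_with_prefilter(ROWS, [11, 19]))
```

The range step only pays off when the wanted IDs are clustered; if they span almost the whole ID range, the subtable is nearly as large as the original and the exact filter still does most of the work.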

The technical posts on this site follow the CC BY-SA 4.0 license. If you need to reprint, please indicate the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

© 2020-2024 STACKOOM.COM