简体   繁体   中英

HiveQL Where In Clause That Points to a Set of Files

I have a set of ~100 files each with 50k IDs in them. I want to be able to make a query against Hive that has a Where In clause using the IDs from these files. I could also do this directly from Groovy, but I'm thinking the code would be cleaner if I did all of the processing from Hive instead of referencing an external Set. Is this possible?

Create an external table describing the format of your files, and set the location to the HDFS path of a directory containing the files.. ie for tab delimited files

create external table my_ids(
  id bigint,
  other_col string
)
row format delimited fields terminated by "\t"
stored as textfile
location 'hdfs://mydfs/data/myids'

Now you can use Hive to access this data.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM