
How does Hive CLI retrieve huge result files from HDFS?

After I execute a Hive query via the CLI like this:

$ hive -e QUERY > output.txt

my understanding is that:

  1. The Hive client compiles the QUERY and sends it to the Hadoop cluster.
  2. Hadoop runs some jobs and writes the result to a single file on HDFS (assuming only 1 reducer).
  3. The Hive client then retrieves this single file, extracts it, and writes it to local STDOUT.

The flow looks like the graph below:

==============
Hadoop Cluster
==============
  |         |
  |         |
  |     2. output RESULT as a single .gz file at HDFS because of 1 reducer
  |         |
  |         |
1. QUERY    |
  |         |
  |     3. Hive retrieves the RESULT as stream or a whole file ?
  |        If as a whole file, what happens when file size > memory size ?
  |         |
  |         |
  ===========
  Hive Client
  ===========
      |
      |
  4. Client outputs RESULT to stdout which is redirected to a file
      |
      |
 ===========
 Output File
 ===========

My question is: if the single result file on HDFS is very large, even larger than my local physical memory, how does the Hive client handle it?

Does the Hive client retrieve the file

  1. as a stream?
  2. via some temporary swap file?
  3. or something else?

You get the results as a stream, so unless you redirect the output, no temporary files are involved on your side. You can think of it as running hadoop fs -cat /THE/RESULT/FILE/OF/YOUR/HIVE/REQUEST
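As a rough analogy for the streaming behaviour described above, the sketch below pipes rows from a producer into a file without ever holding the whole result in memory. `produce_rows` is a hypothetical stand-in for `hive -e QUERY`, not part of Hive, and the row count of 1000 is arbitrary.

```shell
# Hypothetical stand-in for `hive -e QUERY`: emits rows one at a time on stdout.
produce_rows() {
  i=1
  while [ "$i" -le "$1" ]; do
    printf '%s\trow-%s\n' "$i" "$i"
    i=$((i + 1))
  done
}

out=$(mktemp)

# Same shape as `hive -e QUERY > output.txt`: each row is written to the
# file as it is produced, so memory use does not grow with the result size.
produce_rows 1000 > "$out"

# Count the rows that reached the file (tr strips padding some wc versions add).
row_count=$(wc -l < "$out" | tr -d ' ')
echo "rows written: $row_count"

rm -f "$out"
```

The only thing that grows with the result is the output file on disk.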

If the result is large, you could write it back to an HDFS location instead:

$ hive -e QUERY | hadoop fs -put - /HDFS/LOCATION

But keep an eye on the network here, as it might get saturated: the data streams from the cluster to your client and then back to HDFS.

Another alternative is to store the data directly in another Hive table. That way Hive does all the work for you, and no results are streamed or copied to your local machine.
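A minimal sketch of that alternative, assuming a CREATE TABLE ... AS SELECT statement fits your case. The table names `result_table` and `source_table` are hypothetical placeholders, and the SELECT stands in for your own QUERY.

```shell
# Hypothetical table names; replace the SELECT part with your own QUERY.
SQL='CREATE TABLE result_table AS SELECT * FROM source_table'

if command -v hive >/dev/null 2>&1; then
  # The result rows are written to the new table's files on HDFS,
  # so nothing is streamed to the local stdout.
  hive -e "$SQL"
else
  # Fallback so the sketch can be read and run without a Hive installation.
  echo "hive CLI not found; would run: hive -e \"$SQL\""
fi
```

You can then query `result_table` later, or export only the part you need.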

