
When to set Hive parameters during a session?

I'm new to my role, and part of it requires creating and inserting data into both managed and external Hive tables. We have a few lines of SET parameters that we run at the beginning of a Hive session, but I've run into cases where, for example, the files are merged for some partitions (a small number of files) but not for others (many smaller files), seemingly on random days.

My question is: when is it necessary to enter all of my Hive SET parameters? Does it need to be done for every single insert/command/statement I'm running, or just once at the beginning of the Hive session when I launch Hive?

These are the standard SET parameters we've been using:

SET mapred.job.queue.name=yometrics;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=2000;
SET hive.merge.tezfiles=true;

You can put the configuration at the beginning of the script file; it will apply for the whole session.
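As a minimal sketch of that layout, the script below sets the parameters once and then runs two inserts in the same session. The table and column names (metrics_daily, metrics_daily_summary, staging_events, user_id, metric_value, event_date) are hypothetical and only there for illustration.

```sql
-- Session-level settings: issued once, they stay in effect for every
-- statement that follows until changed or until the session ends.
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.merge.tezfiles=true;

-- SET with a name and no value prints what the session currently uses,
-- which helps confirm a setting is active before an insert runs.
SET hive.merge.tezfiles;

-- Hypothetical tables: both inserts run under the settings above.
INSERT OVERWRITE TABLE metrics_daily PARTITION (event_date)
SELECT user_id, metric_value, event_date
FROM staging_events;

INSERT OVERWRITE TABLE metrics_daily_summary PARTITION (event_date)
SELECT COUNT(*) AS row_count, event_date
FROM staging_events
GROUP BY event_date;
```

If the inserts are run from separate sessions (for example, separate scheduled jobs), each session would need its own SET lines, since a session does not inherit settings from an earlier one.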

Alternatively, you can put the common parameters in a separate file, params.hql, and at the beginning of each script call source /local/path/to/the/file/params.hql.
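A sketch of how the shared file and a calling script might look, using the parameters from the question. The script name daily_load.hql and the tables in it are hypothetical.

```sql
-- params.hql: shared settings, kept in one place
SET mapred.job.queue.name=yometrics;
SET hive.exec.dynamic.partition=true;
SET hive.exec.dynamic.partition.mode=nonstrict;
SET hive.exec.max.dynamic.partitions=2000;
SET hive.exec.max.dynamic.partitions.pernode=2000;
SET hive.merge.tezfiles=true;
```

```sql
-- daily_load.hql (hypothetical script): load the shared settings first
source /local/path/to/the/file/params.hql;

-- Every statement after the source line runs with those settings.
INSERT OVERWRITE TABLE metrics_daily PARTITION (event_date)
SELECT user_id, metric_value, event_date
FROM staging_events;
```

Keeping the settings in one file means a change to a parameter is made once rather than in every script.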

You can also put them in hive-site.xml.

If you are on Qubole/AWS, you can also use a bootstrap script for the same purpose: https://docs.qubole.com/en/latest/user-guide/hive/bootstrap-script.html
