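Adding Hive Partition using Oozie
I am using HDP-2.4.2 and trying to add partitions to an external Hive table with an Oozie coordinator job. I created a coordinator that triggers the following workflow once a day: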
<workflow-app name="addPartition" xmlns="uri:oozie:workflow:0.4">
<start to="hive"/>
<action name="hive">
<hive2 xmlns="uri:oozie:hive2-action:0.1">
<job-tracker>${jobTracker}</job-tracker>
<name-node>${nameNode}</name-node>
<jdbc-url>jdbc:hive2://${jdbcPath}</jdbc-url>
<password>yarn</password>
<script>${appPath}/addPartition.q</script>
<param>nameNode=${nameNode}</param>
<param>dt=${dt}</param>
<param>path=${path}</param>
</hive2>
<ok to="end" />
<error to="fail" />
</action>
<kill name="fail">
<message>
Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]
</message>
</kill>
<end name="end" />
</workflow-app>
The executed script contains:
CREATE EXTERNAL TABLE IF NOT EXISTS visits (sid BIGINT, os STRING, browser STRING, visit_time TIMESTAMP)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;
ALTER TABLE visits ADD PARTITION(dt = '${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';
If I run the job, the table is created, but no partition is added. In the YARN log I found:
Beeline command arguments :
-u
jdbc:hive2://localhost:10000/default
-n
yarn
-p
yarn
-d
org.apache.hive.jdbc.HiveDriver
--hivevar
nameNode=hdfs://bigdata01.local:8020
--hivevar
dt=2016-01-05
--hivevar
path=2016/01/05
-f
addPartition.q
-a
delegationToken
--hiveconf
mapreduce.job.tags=oozie-1b3b2ee664df7ac9ee436379d784955a
Fetching child yarn jobs
tag id : oozie-1b3b2ee664df7ac9ee436379d784955a
Child yarn jobs are found -
=================================================================
>>> Invoking Beeline command line now >>>
[...]
0: jdbc:hive2://localhost:10000/default> ALTER TABLE visits ADD PARTITION(dt = '${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';
It looks as if the parameters in the ALTER TABLE statement are not being replaced. To check this, I tried calling beeline directly from the CLI:
beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar nameNode=hdfs://bigdata01.local:8020 --hivevar dt="2016-01-03" --hivevar path="2016/01/03" -e "ALTER TABLE visits ADD PARTITION(dt='${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';"
which results in an error:
Connecting to jdbc:hive2://localhost:10000/default
Connected to: Apache Hive (version 1.2.1000.2.4.2.0-258)
Driver: Hive JDBC (version 1.2.1000.2.4.2.0-258)
Transaction isolation: TRANSACTION_REPEATABLE_READ
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. partition spec is invalid; field dt does not exist or is empty (state=08S01,code=1)
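One likely reading of this error: in the command above the statement is wrapped in a double-quoted shell string, so the shell itself expands `${dt}`, `${nameNode}` and `${path}` before beeline receives anything. Those shell variables are not set, so beeline would see `dt=''`, which matches "field dt does not exist or is empty". A minimal sketch in plain shell (no beeline involved; the variable names `expanded` and `literal` are only for illustration):

```shell
#!/bin/sh
# dt is deliberately unset, as in a shell session that never defined it.
unset dt

# Double-quoted for the shell: ${dt} is expanded by the shell (to nothing)
# before any program would receive the string.
expanded="ALTER TABLE visits ADD PARTITION(dt='${dt}')"

# Single-quoted for the shell: the text is passed through untouched,
# so the ${dt} placeholder is still present for beeline to resolve.
literal='ALTER TABLE visits ADD PARTITION(dt="${dt}")'

printf '%s\n' "$expanded"
printf '%s\n' "$literal"
```

The first line printed has an empty partition value; the second still contains the placeholder.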
If I run the alter statement without parameters
beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver -e "ALTER TABLE visits ADD PARTITION(dt='2016-01-03') LOCATION 'hdfs://bigdata01.local:8020/data/parquet/visitors/2016/01/03';"
or open a beeline console with the hivevars defined and execute the alter statement there
beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar nameNode=hdfs://bigdata01.local:8020 --hivevar dt="2016-01-03" --hivevar path="2016/01/03"
0: jdbc:hive2://localhost:10000/default> ALTER TABLE visits ADD PARTITION(dt = '${dt}') LOCATION '${nameNode}/data/parquet/visitors/${path}';
the partition is created.
What am I doing wrong?
Update:
The parameter values for the hive2 action are defined in the oozie.properties file and in coordinator.xml:
<property>
<name>nameNode</name>
<value>${nameNode}</value>
</property>
<property>
<name>dt</name>
<value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1,'DAY'),'yyyy-MM-dd')}</value>
</property>
<property>
<name>path</name>
<value>${coord:formatTime(coord:dateOffset(coord:nominalTime(), -1,'DAY'),'yyyy/MM/dd')}</value>
</property>
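To make the two date expressions concrete, here is a rough shell equivalent of "nominal time minus one day" in both formats. It assumes GNU date (the `-d` option below does not work with BSD/macOS date) and a nominal time of 2016-01-06, picked only because it yields the values that appear in the log; the real nominal time is not stated in the post.

```shell
#!/bin/sh
# Assumption: GNU date from coreutils.
# Assumption: nominal time 2016-01-06, chosen to match the logged values.
nominal='2016-01-06'

# Same offset (-1 day), two output patterns, mirroring the two properties.
dt=$(date -u -d "$nominal 1 day ago" '+%Y-%m-%d')
path=$(date -u -d "$nominal 1 day ago" '+%Y/%m/%d')

printf 'dt=%s\npath=%s\n' "$dt" "$path"
```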
In the YARN log you will find
Parameters:
------------------------
nameNode=hdfs://bigdata01.local:8020
dt=2016-01-05
path=2016/01/05
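before they are set as hivevars in the beeline call of the hive2 action.
Thanks for your help, but I gave up. I will use an ssh action instead of the hive2 action to run beeline with a static alter statement.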
<ssh xmlns="uri:oozie:ssh-action:0.1">
<host>${sshUser}@${sshHost}</host>
<command>"beeline"</command>
<args>-u</args>
<args>jdbc:hive2://localhost:10000/default</args>
<args>-n</args>
<args>yarn</args>
<args>-p</args>
<args>yarn</args>
<args>-d</args>
<args>org.apache.hive.jdbc.HiveDriver</args>
<args>-e</args>
<args>"ALTER TABLE visits ADD PARTITION(dt='${dt}') LOCATION '${nameNode}/data/raw/parquet/visitors/${path}';"</args>
<capture-output />
</ssh>
Finally found the problem. You have to put the variable in double quotes instead of single quotes ;-)
beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar foo=bar -e "SELECT '${foo}' as foo;"
+------+--+
| foo |
+------+--+
| |
+------+--+
beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar foo=bar -e 'SELECT "${foo}" as foo;'
+------+--+
| foo |
+------+--+
| bar |
+------+--+
beeline -u jdbc:hive2://localhost:10000/default -n yarn -p yarn -d org.apache.hive.jdbc.HiveDriver --hivevar foo=bar -f selectFoo.q
+------+--+
| foo |
+------+--+
| bar |
+------+--+
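Applied to the original addPartition.q, the change would presumably look like the sketch below. This is an assumption based on the finding above, not a tested fix: only the quote characters around the `${...}` placeholders are changed, and the temporary directory is just for illustration. The quoted heredoc delimiter (`'EOF'`) keeps the shell from expanding the placeholders while writing the file.

```shell
#!/bin/sh
# Hypothetical rewrite of addPartition.q: double quotes around the
# ${...} placeholders, everything else as in the original script.
workdir=$(mktemp -d)
script="$workdir/addPartition.q"

# 'EOF' is quoted, so the shell writes ${dt}, ${nameNode}, ${path} literally.
cat > "$script" <<'EOF'
CREATE EXTERNAL TABLE IF NOT EXISTS visits (sid BIGINT, os STRING, browser STRING, visit_time TIMESTAMP)
PARTITIONED BY (dt STRING)
STORED AS PARQUET;
ALTER TABLE visits ADD PARTITION(dt = "${dt}") LOCATION "${nameNode}/data/parquet/visitors/${path}";
EOF

# Show the statement that carries the placeholders.
grep 'ADD PARTITION' "$script"
```

The workflow itself would stay unchanged, since the `<param>` entries already reach beeline as `--hivevar` arguments according to the log.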