
Running a sqoop job shell script in parallel in Oozie

I have a shell script that executes a sqoop job. The script is below.

#!/bin/bash

table=$1

sqoop job --exec "${table}"

Now when I pass the table name in the workflow, the sqoop job executes successfully.

The workflow is below.

<workflow-app name="Shell_script" xmlns="uri:oozie:workflow:0.5">
<start to="shell_script"/>
<kill name="Kill">
    <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
</kill>
<action name="shell_script">
    <shell xmlns="uri:oozie:shell-action:0.1">
        <job-tracker>${jobTracker}</job-tracker>
        <name-node>${nameNode}</name-node>
        <exec>sqoopjob.sh</exec>
        <argument>test123</argument>
        <file>/user/oozie/sqoop/lib/sqoopjob.sh#sqoopjob.sh</file>
    </shell>
    <ok to="End"/>
    <error to="Kill"/>
</action>
<end name="End"/>
</workflow-app>

The job executes successfully for table test123.

Now I have 300 sqoop jobs like the one above, and I want to execute 10 of them in parallel at a time. All the table names are in a single file.

I want to loop over the file, execute the sqoop jobs for the first 10 tables, then for the next 10, and so on.

How can I do this? Should I prepare 10 workflows? I am quite confused.

As @Samson Scharfrichter mentioned, you can start parallel jobs in the shell script itself. Define a function runJob() in the script and run it in the background for each table. Use this template:

#!/bin/bash

runJob() {
    tableName="$1"
    # add other parameters here

    # call sqoop here or do something else
    # write command logs
    # etc.
    # return 0 on success, return 1 on failure

    return 0
}

#Run parallel processes and wait for their completion

# Add a loop here or add more calls
runJob "$table_name" &
runJob "$table_name2" &
runJob "$table_name3" &
# Note: the trailing ampersand runs each command as a parallel background process

#Now wait for all processes to complete
FAILED=0

for job in $(jobs -p)
do
    echo "job=$job"
    wait "$job" || FAILED=$((FAILED + 1))
done

if [ "$FAILED" -ne 0 ]; then
    echo "Execution FAILED!  ($FAILED)"
    # Do something here: log, send a message, etc.

    exit 1
fi

#All processes are completed successfully!
#Do something here
echo "Done successfully"
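
To cover the "10 at a time from a file" part of the question, the template can be wrapped in a loop that reads the table list and waits after every batch. The following is only a sketch under some assumptions: the function name runBatches, the BATCH_SIZE variable, and the idea that the list file holds one table name per line are illustrative choices, not something given in the question. The only sqoop call used is the `sqoop job --exec` command from the original script.

```shell
#!/bin/bash

# Assumption: how many jobs to run side by side (10 in the question).
BATCH_SIZE=10

# Hypothetical wrapper around the command from the question's script.
runJob() {
    sqoop job --exec "$1"
}

# runBatches FILE
# Reads one table name per line from FILE, starts the jobs in the
# background, and waits whenever BATCH_SIZE jobs have been started.
# Prints "failed=N" and returns non-zero if any job failed.
runBatches() {
    local file="$1" failed=0 table pid
    local pids=()

    while IFS= read -r table || [ -n "$table" ]; do
        # Skip empty lines in the list file.
        [ -z "$table" ] && continue

        # Start the job in the background; detach stdin so it cannot
        # consume the rest of the table list.
        runJob "$table" < /dev/null &
        pids+=("$!")

        # Batch is full: wait for every job in it before continuing.
        if [ "${#pids[@]}" -ge "$BATCH_SIZE" ]; then
            for pid in "${pids[@]}"; do
                wait "$pid" || failed=$((failed + 1))
            done
            pids=()
        fi
    done < "$file"

    # Wait for the last, possibly partial, batch.
    for pid in "${pids[@]}"; do
        wait "$pid" || failed=$((failed + 1))
    done

    echo "failed=$failed"
    [ "$failed" -eq 0 ]
}

# Run only when a list file is passed as the first argument.
if [ "$#" -ge 1 ]; then
    runBatches "$1"
    exit $?
fi
```

In the Oozie workflow this would be a single shell action, with the list file shipped through an additional <file> element and its name passed as the <argument>. Note that each batch waits for its slowest job before the next batch starts, so fewer than 10 jobs may be running at the end of a batch.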

