
AWS EMR - Run a bash script on master node

I am adding a step to an EMR cluster from Airflow using a BashOperator. In the bash command, I want to extract information about a previous Spark step. The problem is that this information is stored only on the master node, so I need to make sure my bash command runs there. Is there a way to ensure that the command runs only on the master node and not on the worker nodes?

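# Airflow 1.x import path (assumed from the use of xcom_push below)
from airflow.operators.bash_operator import BashOperator

# Shell command that submits a command-runner.jar step to the cluster.
# The step downloads userdata.sh from S3, makes it executable, and runs it.
bash_cmd = \
    "steps=`aws emr add-steps  --region ap-southeast-1 --cluster-id xxxxxxxx " + \
    "--steps 'Type=CUSTOM_JAR,Name=bash_test,ActionOnFailure=CONTINUE,Jar=command-runner.jar,Args=[" + \
    "bash, " + \
    "-c, " + \
    " aws s3 cp s3://path_to_bucket_S3/userdata.sh .; chmod +x userdata.sh; ./userdata.sh]'`; "

# `dag` is the DAG object defined elsewhere in the file
step1 = BashOperator(
    task_id='step_1',
    bash_command=bash_cmd,
    xcom_push=True,
    dag=dag
)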

Is there any way to make sure the above step and its bash commands run only on the master node?

See the section of the AWS EMR documentation on running bootstrap commands conditionally: https://docs.aws.amazon.com/emr/latest/ManagementGuide/emr-plan-bootstrap.html#emr-bootstrap-runif

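You can incorporate this check into your Bash command so that it does its work only when the current node is the master node. The check reads /mnt/var/lib/info/instance.json, for example with grep isMaster /mnt/var/lib/info/instance.json, and tests whether the isMaster value is true.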
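
A minimal sketch of that check, which could go at the top of userdata.sh. It assumes instance.json holds an entry of the form "isMaster": true on the master node. The helper name is_master_node and the INSTANCE_INFO override are hypothetical, added only so the check can be tried off-cluster against a test file.

```shell
#!/usr/bin/env bash

# Path named in the answer; INSTANCE_INFO is a hypothetical override for testing.
INSTANCE_INFO="${INSTANCE_INFO:-/mnt/var/lib/info/instance.json}"

# Succeeds only if the file has an "isMaster" entry whose value is true.
# A missing file or "isMaster": false both count as "not the master node".
is_master_node() {
    grep -Eq '"isMaster"[[:space:]]*:[[:space:]]*true' "${1:-$INSTANCE_INFO}" 2>/dev/null
}

if is_master_node; then
    echo "master node: running master-only commands"
    # master-only work goes here, e.g. reading the previous Spark step's output
else
    echo "not the master node: skipping"
fi
```

Matching on the value rather than only on the key matters here, because the isMaster key is assumed to be present on every node and only its value differs.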
