spark-submit command not found in airflow

I am trying to run my Spark job in Airflow. When I execute the command spark-submit --class dataload.dataload_daily /home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar in a terminal, it works fine without any issue.

However, when I do the same thing in Airflow, I keep getting the error

/tmp/airflowtmpKQMdzp/spark-submit-scalaWVer4Z: line 1: spark-submit: command not found

from datetime import datetime
from airflow.operators.bash_operator import BashOperator

# Shell out to spark-submit from a BashOperator task
t1 = BashOperator(task_id='spark-submit-scala',
                  bash_command='spark-submit --class dataload.dataload_daily '
                               '/home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar',
                  dag=dag,
                  retries=0,
                  start_date=datetime(2018, 4, 14))

I have the Spark path set in my bash_profile:

export SPARK_HOME=/opt/spark-2.2.0-bin-hadoop2.7
export PATH="$SPARK_HOME/bin/:$PATH"

I have sourced this file as well. I am not sure how to debug this; can anyone help me with it?

You could start with bash_command = 'echo $PATH' to see if your path is being updated correctly.
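For example, a throwaway task like the following would print the PATH the Airflow worker actually sees (a minimal sketch reusing the dag object from the question; the task_id print-path is made up for illustration):

debug = BashOperator(task_id='print-path',  # hypothetical task, prints the worker's PATH
                     bash_command='echo $PATH',
                     dag=dag)

If $SPARK_HOME/bin does not show up in that output, the shell that runs the task never picked up your bash_profile.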

This is because you mention editing the bash_profile, but as far as I know Airflow is run as another user. Since that other user has no changes in their bash_profile, the path to Spark might be missing.
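If that is the case, one quick workaround is to call spark-submit by its absolute path, so the task does not depend on any shell profile at all (a sketch that simply reuses the SPARK_HOME location quoted in the question):

t1 = BashOperator(task_id='spark-submit-scala',
                  # the absolute path sidesteps PATH lookup entirely
                  bash_command='/opt/spark-2.2.0-bin-hadoop2.7/bin/spark-submit '
                               '--class dataload.dataload_daily '
                               '/home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar',
                  dag=dag)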

As mentioned here ( How do I set an environment variable for airflow to use? ), you could try setting the path in .bashrc instead.
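Alternatively, the BashOperator accepts an env mapping. Note that when env is given, it is used instead of the inherited environment, so you need to extend a copy of it rather than pass PATH alone (a sketch, assuming the Spark location from the question):

import os

t1 = BashOperator(task_id='spark-submit-scala',
                  bash_command='spark-submit --class dataload.dataload_daily '
                               '/home/ubuntu/airflow/dags/scripts/data_to_s3-assembly-0.1.jar',
                  # env replaces, rather than extends, the worker's environment
                  env={**os.environ,
                       'PATH': '/opt/spark-2.2.0-bin-hadoop2.7/bin:' + os.environ['PATH']},
                  dag=dag)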
