EMR - Airflow to run scala jar file airflow.exceptions.AirflowException
I am trying to run a Scala jar file from Airflow using EMR. The jar is designed to read from SQL Server (via mssql-jdbc) and PostgreSQL.
My SPARK_STEPS looks like this:
SPARK_STEPS = [
    {
        'Name': 'Trigger_Source_Target',
        'ActionOnFailure': 'CONTINUE',
        'HadoopJarStep': {
            'Jar': 'command-runner.jar',
            'Args': ['spark-submit',
                     '--master', 'yarn',
                     '--jars', '/mnt/MyScalaImport.jar',
                     '--class', 'org.classname',
                     's3://path/SNAPSHOT.jar',
                     'SQL_Pwd', 'PostgreSQL_PWD', 'loadtype'],
        }
    }
]
After this I define JOB_FLOW_OVERRIDES:
JOB_FLOW_OVERRIDES = {
    "Name": "pfdt-cluster-airflow",
    "LogUri": "s3://path/elasticmapreduce/",
    "ReleaseLabel": "emr-6.4.0",
    "Applications": [
        {"Name": "Spark"},
    ],
    "Instances": {
        "InstanceGroups": [
            {
                "Name": "Master nodes",
                "Market": "ON_DEMAND",
                "InstanceRole": "MASTER",
                "InstanceType": "m5.xlarge",
                "InstanceCount": 1,
            }
        ],
        "KeepJobFlowAliveWhenNoSteps": True,
        "TerminationProtected": False,
        'Ec2KeyName': 'pem_file_name',
        "Ec2SubnetId": "subnet-123"
    },
    'BootstrapActions': [
        {
            'Name': 'import custom Jars',
            'ScriptBootstrapAction': {
                'Path': 's3://path/subpath/copytoolsjar.sh',
                'Args': []
            }
        }
    ],
    'Configurations': [
        {
            'Classification': 'spark-defaults',
            'Properties': {
                'spark.jars': 's3://jar_path/mssql-jdbc-8.4.1.jre8.jar'
            }
        }
    ],
    "VisibleToAllUsers": True,
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
    "Tags": [
        {"Key": "Environment", "Value": "Development"},
    ],
}
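Note that `spark.jars` in the configuration above lists only the SQL Server driver, while the job also reads PostgreSQL. Spark documents `spark.jars` as a comma-separated list, so one possible variant is to list both drivers there. The sketch below builds such a `spark-defaults` entry. The helper name `spark_defaults_with_jars` is hypothetical, the S3 locations are the ones listed in the question, and whether this alone resolves the driver lookup on a given EMR release has not been verified here.

```python
def spark_defaults_with_jars(jar_uris):
    """Build an EMR 'spark-defaults' classification listing every jar URI.

    Hypothetical helper: Spark expects spark.jars as one comma-separated string.
    """
    if not jar_uris:
        raise ValueError("at least one jar URI is required")
    return {
        "Classification": "spark-defaults",
        "Properties": {"spark.jars": ",".join(jar_uris)},
    }


# Driver jar locations as listed in the question.
JDBC_JARS = [
    "s3://path/subpath/mssql-jdbc-8.4.1.jre8.jar",
    "s3://path/subpath/postgresql-42.2.24.jar",
]

# Would take the place of the 'Configurations' value in JOB_FLOW_OVERRIDES.
CONFIGURATIONS = [spark_defaults_with_jars(JDBC_JARS)]
```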
To copy the Scala jar from S3 to the local filesystem, I have a shell script that does the work. Path: s3://path/subpath/copytoolsjar.sh
aws s3 cp s3://path/SNAPSHOT.jar /mnt/MyScalaImport.jar
The error I am getting is (stdout.gz => stderr.gz =>):
22/04/08 13:38:23 INFO CodeGenerator: Code generated in 25.5907 ms
Exception in thread "main" java.sql.SQLException: No suitable driver
    at java.sql.DriverManager.getDriver(DriverManager.java:315)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.$anonfun$driverClass$2(JDBCOptions.scala:108)
    at scala.Option.getOrElse(Option.scala:189)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:108)
    at org.apache.spark.sql.execution.datasources.jdbc.JDBCOptions.<init>(JDBCOptions.scala:38)
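`No suitable driver` comes from `java.sql.DriverManager` when no registered driver accepts the JDBC URL, which usually means the driver jar is not on the classpath or the driver class was never loaded. Besides making the jars available, Spark's JDBC source accepts a `driver` option that names the class explicitly. A minimal sketch of such option sets follows. The helper name `jdbc_options`, the hosts, database names, tables and credentials are all placeholders and not values from the question, and the jars still have to be on the classpath for this to help.

```python
# Driver class names shipped in mssql-jdbc and the PostgreSQL JDBC jar.
DRIVER_CLASSES = {
    "sqlserver": "com.microsoft.sqlserver.jdbc.SQLServerDriver",
    "postgresql": "org.postgresql.Driver",
}


def jdbc_options(url, table, user, password):
    """Return options for Spark's JDBC source with the driver class set explicitly.

    Hypothetical helper: the driver is chosen from the jdbc:<subprotocol>: prefix.
    """
    parts = url.split(":", 2)
    if len(parts) < 3 or parts[0] != "jdbc":
        raise ValueError(f"not a JDBC URL: {url!r}")
    subprotocol = parts[1]
    if subprotocol not in DRIVER_CLASSES:
        raise ValueError(f"no driver class known for {subprotocol!r}")
    return {
        "url": url,
        "dbtable": table,
        "user": user,
        "password": password,
        "driver": DRIVER_CLASSES[subprotocol],
    }


# Placeholder connection details, for illustration only.
mssql_opts = jdbc_options(
    "jdbc:sqlserver://example-host:1433;databaseName=exampledb",
    "dbo.example_table", "example_user", "example_password",
)
pg_opts = jdbc_options(
    "jdbc:postgresql://example-host:5432/exampledb",
    "public.example_table", "example_user", "example_password",
)
```

The same keys (`url`, `dbtable`, `user`, `password`, `driver`) apply when the Scala job calls `spark.read.format("jdbc")` with `.option(...)` or `.options(...)`.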
How can I solve this? I have my jars at:
s3://path/subpath/mssql-jdbc-8.4.1.jre8.jar
s3://path/subpath/postgresql-42.2.24.jar
Upload the jar files (mssql-jdbc-8.4.1.jre8.jar, postgresql-42.2.24.jar) to the local filesystem.
In the bootstrap step:
'BootstrapActions': [ { 'Name': 'import custom Jars', 'ScriptBootstrapAction': { 'Path': 's3://path/subpath/copytoolsjar.sh', 'Args': [] } } ]
In the copytoolsjar.sh file, write the commands:
aws s3 cp s3://path/SNAPSHOT.jar /mnt/MyScalaImport.jar && bash -c "sudo aws s3 cp s3://path/subpath/mssql-jdbc-8.4.1.jre8.jar /usr/lib/spark/jars/" && bash -c "sudo aws s3 cp s3://path/subpath/postgresql-42.2.24.jar /usr/lib/spark/jars/"
With that in place, the job completes.
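The steps above can be sketched as a small generator for the bootstrap script, which may help keep the jar list in one place. The function name `render_bootstrap_script` is hypothetical. The S3 paths and the `/usr/lib/spark/jars/` target are the ones from the answer. The `set -e` line is an assumption standing in for the `&&` chaining, and the code only renders the script text, it does not upload it to S3.

```python
import shlex

# Target directory used in the answer for the JDBC driver jars.
SPARK_JARS_DIR = "/usr/lib/spark/jars/"


def render_bootstrap_script(app_jar_uri, app_jar_dest, driver_jar_uris):
    """Render the text of a bootstrap script like copytoolsjar.sh (hypothetical helper)."""
    lines = [
        "#!/bin/bash",
        # Assumption: stop at the first failing copy, like the && chain in the answer.
        "set -e",
        f"aws s3 cp {shlex.quote(app_jar_uri)} {shlex.quote(app_jar_dest)}",
    ]
    for uri in driver_jar_uris:
        # sudo because the Spark jars directory is normally not user-writable.
        lines.append(f"sudo aws s3 cp {shlex.quote(uri)} {SPARK_JARS_DIR}")
    return "\n".join(lines) + "\n"


script = render_bootstrap_script(
    "s3://path/SNAPSHOT.jar",
    "/mnt/MyScalaImport.jar",
    [
        "s3://path/subpath/mssql-jdbc-8.4.1.jre8.jar",
        "s3://path/subpath/postgresql-42.2.24.jar",
    ],
)
print(script)
```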