How to define multiple main methods in a Jar class (in Scala) and call it from Azure Data Factory?
execute scala jar file in azure data factory
Here is the code I want to execute:
SimpleApp.scala
package test

import java.sql.DriverManager
import org.apache.spark.SparkContext
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val conf = new SparkConf().setAppName("bouh").setMaster("yarn")
    val sc = new SparkContext(conf)

    val jdbcHostname = "servername.database.windows.net"
    val jdbcPort = 1433
    val jdbcDatabase = "database"
    val jdbc_url = s"jdbc:sqlserver://${jdbcHostname}:${jdbcPort};database=${jdbcDatabase};encrypt=true;trustServerCertificate=false;hostNameInCertificate=*.database.windows.net;loginTimeout=60;"
    val jdbcUsername = "user"
    val jdbcPassword = "password"

    val connection = DriverManager.getConnection(jdbc_url, jdbcUsername, jdbcPassword)
    val statement = connection.createStatement

    val rdd = sc.textFile("wasbs://dev@hdinsight.blob.core.windows.net/folder/*.txt")
    rdd.collect().map(
      (Id: String) => {
        statement.execute(s"EXEC delete_item_by_id @Id = '${Id}'")
      }
    )
  }
}
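On the title's question of defining multiple main methods in one jar: in Scala, each top-level `object` can carry its own `main`, and the activity's `className` property picks which one is launched. A minimal sketch, assuming the file begins with `package test` (the object names `JobA` and `JobB` are made up for illustration; their fully-qualified names would then be `test.JobA` and `test.JobB`):

```scala
// Assume this file starts with `package test`, so the entry points
// are addressed as "test.JobA" and "test.JobB" in ADF's "className".

// First entry point.
object JobA {
  def run(): String = "running JobA"
  def main(args: Array[String]): Unit = println(run())
}

// Second entry point, packaged in the same jar.
object JobB {
  def run(): String = "running JobB"
  def main(args: Array[String]): Unit = println(run())
}
```

Both objects end up as separate `.class` files in the same jar, so two ADF Spark activities can point at the same `entryFilePath` with different `className` values and run them independently.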
I compiled it with IntelliJ IDEA (following this guide: https://docs.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-create-standalone-application).
Now I am trying to execute it from Azure Data Factory. I created this job:
{
    "name": "pipeline1",
    "properties": {
        "activities": [
            {
                "name": "Spark1",
                "type": "HDInsightSpark",
                "policy": {
                    "timeout": "7.00:00:00",
                    "retry": 0,
                    "retryIntervalInSeconds": 30,
                    "secureOutput": false
                },
                "typeProperties": {
                    "rootPath": "dev/apps/spikes",
                    "entryFilePath": "test.jar",
                    "className": "SimpleApp",
                    "sparkJobLinkedService": {
                        "referenceName": "linkedServiceStorageBlobHDI",
                        "type": "LinkedServiceReference"
                    }
                },
                "linkedServiceName": {
                    "referenceName": "linkedServiceHDI",
                    "type": "LinkedServiceReference"
                }
            }
        ]
    }
}
But the execution fails with this error:
18/05/28 12:52:53 ERROR ApplicationMaster: Uncaught exception:
java.lang.ClassNotFoundException: SimpleApp
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:621)
at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:379)
at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:245)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:749)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:71)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:70)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1865)
at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:70)
at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:747)
at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
18/05/28 12:52:53 INFO ApplicationMaster: Final app status: FAILED, exitCode: 10, (reason: Uncaught exception: java.lang.ClassNotFoundException: SimpleApp)
18/05/28 12:52:53 INFO ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: Uncaught exception: java.lang.ClassNotFoundException: SimpleApp)
18/05/28 12:52:53 INFO ApplicationMaster: Deleting staging directory adl://home/user/livy/.sparkStaging/application_1527060048715_0507
18/05/28 12:52:53 INFO ShutdownHookManager: Shutdown hook called
I understand that the class cannot be found, but how do I fix this? Is the problem in my Scala code or in the Azure job?
EDIT: if I open test.jar, I see many files/folders. I found SimpleApp.class inside the /test folder (test is the name of my package). In ADF I tried "className": "test.SimpleApp", but I still get the same error: java.lang.ClassNotFoundException: test.SimpleApp
Could you try opening the jar and checking the path of the SimpleApp class?
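That check can be automated: each `.class` entry path inside the jar maps directly to a fully-qualified class name (`test/SimpleApp.class` → `test.SimpleApp`), which is exactly the string `className` expects. A sketch using only the JDK's `JarFile` (the object name `ListClasses` is illustrative, not part of the original project):

```scala
import java.util.jar.JarFile
import scala.collection.JavaConverters._

object ListClasses {
  // Turn every ".class" entry in the jar into a fully-qualified
  // class name, e.g. "test/SimpleApp.class" -> "test.SimpleApp".
  def classNames(jarPath: String): List[String] = {
    val jar = new JarFile(jarPath)
    try
      jar.entries().asScala
        .map(_.getName)
        .filter(_.endsWith(".class"))
        .map(_.stripSuffix(".class").replace('/', '.'))
        .toList
    finally jar.close()
  }

  def main(args: Array[String]): Unit =
    args.headOption.foreach(path => classNames(path).foreach(println))
}
```

Running this against test.jar should list `test.SimpleApp` among the names it prints; whatever it prints is the value to put in `"className"`.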