[英]shark/spark throws NPE when querying a table
shark / spark wiki的開發部分非常簡短,因此我嘗試將代碼組合在一起,以編程方式查詢表。 這里是 ...
object Test extends App {
val master = "spark://localhost.localdomain:8084"
val jobName = "scratch"
val sparkHome = "/home/shengc/Downloads/software/spark-0.6.1"
val executorEnvVars = Map[String, String](
"SPARK_MEM" -> "1g",
"SPARK_CLASSPATH" -> "",
"HADOOP_HOME" -> "/home/shengc/Downloads/software/hadoop-0.20.205.0",
"JAVA_HOME" -> "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64",
"HIVE_HOME" -> "/home/shengc/Downloads/software/hive-0.9.0-bin"
)
val sc = new shark.SharkContext(master, jobName, sparkHome, Nil, executorEnvVars)
sc.sql2console("create table src")
sc.sql2console("load data local inpath '/home/shengc/Downloads/software/hive-0.9.0-bin/examples/files/kv1.txt' into table src")
sc.sql2console("select count(1) from src")
}
我可以創建表src並將數據加載到src中,但最后一個查詢拋出NPE並失敗,這是輸出...
13/01/06 17:33:20 INFO execution.SparkTask: Executing shark.execution.SparkTask
13/01/06 17:33:20 INFO shark.SharkEnv: Initializing SharkEnv
13/01/06 17:33:20 INFO execution.SparkTask: Adding jar file:///home/shengc/workspace/shark/hive/lib/hive-builtins-0.9.0.jar
java.lang.NullPointerException
at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:58)
at shark.execution.SparkTask$$anonfun$execute$5.apply(SparkTask.scala:55)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:34)
at scala.collection.mutable.ArrayOps.foreach(ArrayOps.scala:38)
at shark.execution.SparkTask.execute(SparkTask.scala:55)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:134)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:57)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1326)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1118)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:951)
at shark.SharkContext.sql(SharkContext.scala:58)
at shark.SharkContext.sql2console(SharkContext.scala:84)
at Test$delayedInit$body.apply(Test.scala:20)
at scala.Function0$class.apply$mcV$sp(Function0.scala:34)
at scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:12)
at scala.App$$anonfun$main$1.apply(App.scala:60)
at scala.App$$anonfun$main$1.apply(App.scala:60)
at scala.collection.LinearSeqOptimized$class.foreach(LinearSeqOptimized.scala:59)
at scala.collection.immutable.List.foreach(List.scala:76)
at scala.collection.generic.TraversableForwarder$class.foreach(TraversableForwarder.scala:30)
at scala.App$class.main(App.scala:60)
at Test$.main(Test.scala:4)
at Test.main(Test.scala)
FAILED: Execution Error, return code -101 from shark.execution.SparkTask13/01/06 17:33:20 ERROR ql.Driver: FAILED: Execution Error, return code -101 from shark.execution.SparkTask
13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=Driver.execute start=1357511600030 end=1357511600054 duration=24>
13/01/06 17:33:20 INFO ql.Driver: <PERFLOG method=releaseLocks>
13/01/06 17:33:20 INFO ql.Driver: </PERFLOG method=releaseLocks start=1357511600054 end=1357511600054 duration=0>
但是,我可以通過在bin / shark-withinfo調用的shell中鍵入select * from src來查詢src表
您可能會問我如何在由“bin / shark-shell”觸發的shell中嘗試該sql。 好吧,我無法進入那個殼。 這是我遇到的錯誤......
https://groups.google.com/forum/?fromgroups=#!topic/shark-users/glZzrUfabGc
[編輯1]:這個NPE似乎是由於SharkENV.sc尚未設置,所以我補充道
shark.SharkEnv.sc = sc
就在執行任何sql2console操作之前。 然后它抱怨了scala.tools.nsc的ClassNotFoundException,所以我手動將scala-compiler放在classpath中。 之后,代碼抱怨了另一個ClassNotFoundException,我無法弄清楚如何解決它,因為我確實在類路徑中放了shark jar。
13/01/06 18:09:34 INFO cluster.TaskSetManager: Lost TID 1 (task 1.0:1)
13/01/06 18:09:34 INFO cluster.TaskSetManager: Loss was due to java.lang.ClassNotFoundException: shark.execution.TableScanOperator$$anonfun$preprocessRdd$3
at java.net.URLClassLoader$1.run(URLClassLoader.java:217)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:205)
at java.lang.ClassLoader.loadClass(ClassLoader.java:321)
at java.lang.ClassLoader.loadClass(ClassLoader.java:266)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:264)
[編輯2]:好的,我想出了另一個代碼,它可以通過完全遵循shark的源代碼來完成交互式repl,從而滿足我的需求。
System.setProperty("MASTER", "spark://localhost.localdomain:8084")
System.setProperty("SPARK_MEM", "1g")
System.setProperty("SPARK_CLASSPATH", "")
System.setProperty("HADOOP_HOME", "/home/shengc/Downloads/software/hadoop-0.20.205.0")
System.setProperty("JAVA_HOME", "/usr/lib/jvm/java-1.6.0-openjdk-1.6.0.0.x86_64")
System.setProperty("HIVE_HOME", "/home/shengc/Downloads/software/hive-0.9.0-bin")
System.setProperty("SCALA_HOME", "/home/shengc/Downloads/software/scala-2.9.2")
shark.SharkEnv.initWithSharkContext("scratch")
val sc = shark.SharkEnv.sc.asInstanceOf[shark.SharkContext]
sc.sql2console("select * from src")
這很難看,但至少它有效。 歡迎任何關於如何編寫更強大的代碼的評論!
對於想要以編程方式操作鯊魚的人,請注意所有的hive和shark jar都必須在你的CLASSPATH中,並且scala編譯器也必須在你的類路徑中。 另一個重要的事情是hadoop的conf也應該在classpath中。
我相信問題是你的SharkEnv沒有初始化。 我正在使用shark 0.9.0(但我相信你必須在0.6.1中初始化SharkEnv),並且我的SharkEnv按以下方式初始化:
// SharkContext
val sc = new SharkContext(master,
jobName,
System.getenv("SPARK_HOME"),
Nil,
executorEnvVar)
// Initialize SharkEnv
SharkEnv.sc = sc
// create and populate table
sc.runSql("CREATE TABLE src(key INT, value STRING)")
sc.runSql("LOAD DATA LOCAL INPATH '${env:HIVE_HOME}/examples/files/kv1.txt' INTO TABLE src")
// print result to stdout
println(sc.runSql("select * from src"))
println(sc.runSql("select count(*) from src"))
另外,嘗試從src表中查詢數據(帶有“select count(*)...”的注釋行)而沒有聚合函數,當數據查詢正常時我有類似的問題,但是count(*)拋出異常,通過添加mysql修復在我的例子中, -connector-java.jar到yarn.application.classpath 。
聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.