Locking a Hive table from Spark HiveContext

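We are setting up Hive for our data warehouse and using Spark for processing, with Hive as the storage layer. Our files are very small (<10 KB) but numerous, and the data has to be available in near real time. My approach is to create partitions on the Hive table that mark data as current or past: freshly dumped data is kept as is, and after a certain interval it is aggregated and moved to the PAST partition. While the move is running I need to lock the table, because otherwise queries may return incorrect data.

From the Hive CLI this works without any problem: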

hive> LOCK TABLE t26013_75 exclusive;
OK
Time taken: 0.106 seconds

But when I try the same thing from Spark, it fails:

scala> val hiveContext = new HiveContext(sc)
16/04/07 07:14:55 INFO hive.HiveContext: Initializing execution hive, version 0.13.1
hiveContext: org.apache.spark.sql.hive.HiveContext = org.apache.spark.sql.hive.HiveContext@723fadfe

scala> hiveContext.sql("LOCK TABLE ma.t26013_75 exclusive")
16/04/07 07:15:00 INFO parse.ParseDriver: Parsing command: LOCK TABLE ma.t26013_75 exclusive
16/04/07 07:15:00 INFO parse.ParseDriver: Parse Completed
16/04/07 07:15:00 INFO hive.HiveContext: Initializing HiveMetastoreConnection version 0.13.1 using Spark classes.
16/04/07 07:15:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
16/04/07 07:15:01 INFO hive.metastore: Trying to connect to metastore with URI thrift://localhost:9083
16/04/07 07:15:01 INFO hive.metastore: Connected to metastore.
16/04/07 07:15:02 INFO session.SessionState: No Tez session required at this point. hive.execution.engine=mr.
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=Driver.run from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=TimeToSubmit from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO ql.Driver: Concurrency mode is disabled, not creating a lock manager
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=compile from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=parse from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO parse.ParseDriver: Parsing command: LOCK TABLE ma.t26013_75 exclusive
16/04/07 07:15:02 INFO parse.ParseDriver: Parse Completed
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=parse start=1460027702353 end=1460027702784 duration=431 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=semanticAnalyze from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO ql.Driver: Semantic Analysis Completed
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=semanticAnalyze start=1460027702785 end=1460027702832 duration=47 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO ql.Driver: Returning Hive schema: Schema(fieldSchemas:null, properties:null)
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=compile start=1460027702328 end=1460027702841 duration=513 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=Driver.execute from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO ql.Driver: Starting command: LOCK TABLE ma.t26013_75 exclusive
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=TimeToSubmit start=1460027702325 end=1460027702861 duration=536 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=runTasks from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=task.DDL.Stage-0 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO lockmgr.DummyTxnManager: Concurrency mode is disabled, not creating a lock manager
16/04/07 07:15:02 ERROR exec.DDLTask: org.apache.hadoop.hive.ql.metadata.HiveException: lock Table LockManager not specified
    at org.apache.hadoop.hive.ql.exec.DDLTask.lockTable(DDLTask.java:2880)
    at org.apache.hadoop.hive.ql.exec.DDLTask.execute(DDLTask.java:405)
    at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:153)
    at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:85)
    at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:1503)
    at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1270)
    at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1088)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:911)
    at org.apache.hadoop.hive.ql.Driver.run(Driver.java:901)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:345)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
    at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
    at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
    at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $line68.$read$$iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
    at $line68.$read$$iwC$$iwC$$iwC.<init>(<console>:40)
    at $line68.$read$$iwC$$iwC.<init>(<console>:42)
    at $line68.$read$$iwC.<init>(<console>:44)
    at $line68.$read.<init>(<console>:46)
    at $line68.$read$.<init>(<console>:50)
    at $line68.$read$.<clinit>(<console>)
    at $line68.$eval$.<init>(<console>:7)
    at $line68.$eval$.<clinit>(<console>)
    at $line68.$eval.$print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

16/04/07 07:15:02 ERROR ql.Driver: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. lock Table LockManager not specified
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=Driver.execute start=1460027702841 end=1460027702880 duration=39 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: <PERFLOG method=releaseLocks from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 INFO log.PerfLogger: </PERFLOG method=releaseLocks start=1460027702880 end=1460027702880 duration=0 from=org.apache.hadoop.hive.ql.Driver>
16/04/07 07:15:02 ERROR client.ClientWrapper: 
======================
HIVE FAILURE OUTPUT
======================
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. lock Table LockManager not specified

======================
END HIVE FAILURE OUTPUT
======================

org.apache.spark.sql.execution.QueryExecutionException: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. lock Table LockManager not specified
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:349)
    at org.apache.spark.sql.hive.client.ClientWrapper$$anonfun$runHive$1.apply(ClientWrapper.scala:326)
    at org.apache.spark.sql.hive.client.ClientWrapper.withHiveState(ClientWrapper.scala:155)
    at org.apache.spark.sql.hive.client.ClientWrapper.runHive(ClientWrapper.scala:326)
    at org.apache.spark.sql.hive.client.ClientWrapper.runSqlHive(ClientWrapper.scala:316)
    at org.apache.spark.sql.hive.HiveContext.runSqlHive(HiveContext.scala:473)
    at org.apache.spark.sql.hive.execution.HiveNativeCommand.run(HiveNativeCommand.scala:33)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult$lzycompute(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.sideEffectResult(commands.scala:57)
    at org.apache.spark.sql.execution.ExecutedCommand.doExecute(commands.scala:68)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:88)
    at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:147)
    at org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:87)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd$lzycompute(SQLContext.scala:950)
    at org.apache.spark.sql.SQLContext$QueryExecution.toRdd(SQLContext.scala:950)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:144)
    at org.apache.spark.sql.DataFrame.<init>(DataFrame.scala:128)
    at org.apache.spark.sql.DataFrame$.apply(DataFrame.scala:51)
    at org.apache.spark.sql.SQLContext.sql(SQLContext.scala:755)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:27)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:32)
    at $iwC$$iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:34)
    at $iwC$$iwC$$iwC$$iwC$$iwC.<init>(<console>:36)
    at $iwC$$iwC$$iwC$$iwC.<init>(<console>:38)
    at $iwC$$iwC$$iwC.<init>(<console>:40)
    at $iwC$$iwC.<init>(<console>:42)
    at $iwC.<init>(<console>:44)
    at <init>(<console>:46)
    at .<init>(<console>:50)
    at .<clinit>(<console>)
    at .<init>(<console>:7)
    at .<clinit>(<console>)
    at $print(<console>)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.call(SparkIMain.scala:1065)
    at org.apache.spark.repl.SparkIMain$Request.loadAndRun(SparkIMain.scala:1338)
    at org.apache.spark.repl.SparkIMain.loadAndRunReq$1(SparkIMain.scala:840)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:871)
    at org.apache.spark.repl.SparkIMain.interpret(SparkIMain.scala:819)
    at org.apache.spark.repl.SparkILoop.reallyInterpret$1(SparkILoop.scala:857)
    at org.apache.spark.repl.SparkILoop.interpretStartingWith(SparkILoop.scala:902)
    at org.apache.spark.repl.SparkILoop.command(SparkILoop.scala:814)
    at org.apache.spark.repl.SparkILoop.processLine$1(SparkILoop.scala:657)
    at org.apache.spark.repl.SparkILoop.innerLoop$1(SparkILoop.scala:665)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$loop(SparkILoop.scala:670)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply$mcZ$sp(SparkILoop.scala:997)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop$$anonfun$org$apache$spark$repl$SparkILoop$$process$1.apply(SparkILoop.scala:945)
    at scala.tools.nsc.util.ScalaClassLoader$.savingContextLoader(ScalaClassLoader.scala:135)
    at org.apache.spark.repl.SparkILoop.org$apache$spark$repl$SparkILoop$$process(SparkILoop.scala:945)
    at org.apache.spark.repl.SparkILoop.process(SparkILoop.scala:1059)
    at org.apache.spark.repl.Main$.main(Main.scala:31)
    at org.apache.spark.repl.Main.main(Main.scala)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:665)
    at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:170)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:193)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:112)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

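I looked at the logs more closely (note the line "Concurrency mode is disabled, not creating a lock manager") and got it fixed after adding the following: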

scala> hiveContext.setConf("hive.support.concurrency","true")
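
For the move itself, something like the following could wrap the work so the lock is always released, even if the move fails. This is only a minimal sketch: `withExclusiveLock` and its parameters are hypothetical names of mine, not part of Spark or Hive, and it assumes that the lock taken by `LOCK TABLE` is held until the matching `UNLOCK TABLE` is issued through the same HiveContext.

```scala
// Hypothetical helper (not a Spark or Hive API): runs `body` while holding an
// exclusive Hive lock on `table`, and releases the lock afterwards even if
// `body` throws. `runSql` is whatever executes a statement, for example
// hiveContext.sql(_) once hive.support.concurrency has been set to true.
def withExclusiveLock[T](runSql: String => Any, table: String)(body: => T): T = {
  runSql(s"LOCK TABLE $table EXCLUSIVE")
  try body
  finally runSql(s"UNLOCK TABLE $table")
}
```

In the spark-shell it would be used roughly as `withExclusiveLock(hiveContext.sql(_), "ma.t26013_75") { ... }`, with the aggregation and the move to the PAST partition inside the braces. Taking the SQL runner as a plain function keeps the helper independent of the Spark version.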

I don't know why this is needed. I already have a hive-site.xml in the hive/conf location.

Possibly it is because my hive-site.xml under spark/conf contains only the following entry:

<configuration>
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
</property>
</configuration>

I'll see how it goes; in the future I may add this parameter to spark/conf/hive-site.xml as well.

Thanks for your time.
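
For reference, the spark/conf/hive-site.xml with that parameter added might look like the fragment below. It is untested and assumes that HiveContext picks up `hive.support.concurrency` from this file the same way it does from `setConf`; depending on the setup, the lock manager may need further settings that are not shown here.

```xml
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
  <!-- Assumption: file-based equivalent of hiveContext.setConf("hive.support.concurrency","true") -->
  <property>
    <name>hive.support.concurrency</name>
    <value>true</value>
  </property>
</configuration>
```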
