简体   繁体   中英

Hive LLAP - ORC split generation failed

I am trying to evaluate hive LLAP on a Hortonworks HDP 2.6 cluster.

Unfortunately, I get a java.lang.RuntimeException: ORC split generation failed when trying to execute queries:

ERROR : Status: Failed
ERROR : Vertex failed, vertexName=Map 1, vertexId=vertex_1504166274656_0006_3_00, diagnostics=[Vertex vertex_1504166274656_0006_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gprs_records initializer failed, vertex=vertex_1504166274656_0006_3_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1615)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1701)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:446)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 5
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1609)
    ... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
    at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:73)
    at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:383)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:63)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:89)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1419)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1305)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2600(OrcInputFormat.java:1104)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1285)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1282)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1104)
    ... 4 more
]
ERROR : Vertex killed, vertexName=Reducer 2, vertexId=vertex_1504166274656_0006_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1504166274656_0006_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]
ERROR : DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO  : org.apache.tez.common.counters.DAGCounter:
INFO  :    AM_CPU_MILLISECONDS: 840
INFO  :    AM_GC_TIME_MILLIS: 23
ERROR : FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1504166274656_0006_3_00, diagnostics=[Vertex vertex_1504166274656_0006_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gprs_records initializer failed, vertex=vertex_1504166274656_0006_3_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1615)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1701)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:446)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 5
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1609)
    ... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
    at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:73)
    at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:383)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:63)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:89)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1419)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1305)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2600(OrcInputFormat.java:1104)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1285)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1282)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1104)
    ... 4 more
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1504166274656_0006_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1504166274656_0006_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1
INFO  : Resetting the caller context to HIVE_SSN_ID:2415846c-fb92-480f-b869-240a0b0f30ed
INFO  : Completed executing command(queryId=hive_20170831085623_51baf1df-5823-459e-80cf-76fa1f81789f); Time taken: 0.342 seconds
Error: Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.tez.TezTask. Vertex failed, vertexName=Map 1, vertexId=vertex_1504166274656_0006_3_00, diagnostics=[Vertex vertex_1504166274656_0006_3_00 [Map 1] killed/failed due to:ROOT_INPUT_INIT_FAILURE, Vertex Input: gprs_records initializer failed, vertex=vertex_1504166274656_0006_3_00 [Map 1], java.lang.RuntimeException: ORC split generation failed with exception: java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1615)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.getSplits(OrcInputFormat.java:1701)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.addSplitsForGroup(HiveInputFormat.java:446)
    at org.apache.hadoop.hive.ql.io.HiveInputFormat.getSplits(HiveInputFormat.java:569)
    at org.apache.hadoop.hive.ql.exec.tez.HiveSplitGenerator.initialize(HiveSplitGenerator.java:196)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:278)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable$1.run(RootInputInitializerManager.java:269)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:269)
    at org.apache.tez.dag.app.dag.RootInputInitializerManager$InputInitializerCallable.call(RootInputInitializerManager.java:253)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.concurrent.ExecutionException: java.lang.ArrayIndexOutOfBoundsException: 5
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:192)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat.generateSplitsInfo(OrcInputFormat.java:1609)
    ... 15 more
Caused by: java.lang.ArrayIndexOutOfBoundsException: 5
    at org.apache.orc.OrcFile$WriterVersion.from(OrcFile.java:145)
    at org.apache.orc.impl.OrcTail.getWriterVersion(OrcTail.java:73)
    at org.apache.orc.impl.ReaderImpl.<init>(ReaderImpl.java:383)
    at org.apache.hadoop.hive.ql.io.orc.ReaderImpl.<init>(ReaderImpl.java:63)
    at org.apache.hadoop.hive.ql.io.orc.OrcFile.createReader(OrcFile.java:89)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.populateAndCacheStripeDetails(OrcInputFormat.java:1419)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.callInternal(OrcInputFormat.java:1305)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.access$2600(OrcInputFormat.java:1104)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1285)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator$1.run(OrcInputFormat.java:1282)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1866)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1282)
    at org.apache.hadoop.hive.ql.io.orc.OrcInputFormat$SplitGenerator.call(OrcInputFormat.java:1104)
    ... 4 more
]Vertex killed, vertexName=Reducer 2, vertexId=vertex_1504166274656_0006_3_01, diagnostics=[Vertex received Kill in INITED state., Vertex vertex_1504166274656_0006_3_01 [Reducer 2] killed/failed due to:OTHER_VERTEX_FAILURE]DAG did not succeed due to VERTEX_FAILURE. failedVertices:1 killedVertices:1 (state=08S01,code=2)

Here's some information to complete the picture:

  • I've enabled LLAP using the Ambari interface, installed HiveServer interactive, restarted the services. I am using Beeline version 1.2.1000.2.6.0 to connect to it.

  • The table being queried is in ORC format . ORC files are generated by an ETL pipeline and written by Java code using ORC Core . I have no problem querying it without LLAP using HiveServer2.

Any advice is really appreciated. Thank you all!

The issue seems to be related to several known issues of ORC split generation. Try running the query with the following config:

hive.exec.orc.split.strategy=BI

https://community.hortonworks.com/questions/131609/hive-llap-orc-split-generation-failed.html

Your orc file is written by OrcFileWriter with larger version. Is your hive's version 2.1.1? 1. Fix the problem by using this patch file. https://patch-diff.githubusercontent.com/raw/apache/orc/pull/75.patch 2. Recompile the hive project. 3. Replace the hive-exec-2.1.1.jar and hive-orc-2.1.1.jar in the lib directory.

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM