简体   繁体   English

Hive流和Azure Data Lake Store的问题

[英]Issue with Hive streaming and Azure Data Lake Store

I am writing a Play2 Java web application to ingest data to HDInsight interactive query using the Hive Streaming API( https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest ). 我正在编写一个Play2 Java Web应用程序,以使用Hive Streaming API( https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest )将数据摄取到HDInsight交互式查询。 Hive data is stored on Azure Data Lake Store. 配置单元数据存储在Azure Data Lake Store中。

I loosely based myself on https://github.com/mradamlacey/hive-streaming-azure-hdinsight/blob/master/src/main/java/com/cbre/eim/HiveStreamingExample.java . 我大致基于https://github.com/mradamlacey/hive-streaming-azure-hdinsight/blob/master/src/main/java/com/cbre/eim/HiveStreamingExample.java

When I run the code on one of my headnodes I receive the following error: 当我在一个头节点上运行代码时,收到以下错误:

play.api.UnexpectedException: Unexpected exception[StreamingIOFailure: Failed creating RecordUpdaterS for adl://home/hive/warehouse/data/ingest_date=2018-05-07 txnIds[486,495]]
        at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:251)
        at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:182)
        at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:343)
        at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:341)
        at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:414)
        at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
        at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    Caused by: org.apache.hive.hcatalog.streaming.StreamingIOFailure: Failed creating RecordUpdaterS for adl://home/hive/warehouse/data/ingest_date=2018-05-07 txnIds[486,495]
        at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.newBatch(AbstractRecordWriter.java:166)
        at org.apache.hive.hcatalog.streaming.StrictJsonWriter.newBatch(StrictJsonWriter.java:41)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:559)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:512)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:397)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:377)
        at hive.HiveRepository.createMany(HiveRepository.java:76)
        at controllers.HiveController.create(HiveController.java:40)
        at router.Routes$$anonfun$routes$1.$anonfun$applyOrElse$2(Routes.scala:70)
        at play.core.routing.HandlerInvokerFactory$$anon$4.resultCall(HandlerInvoker.scala:137)
    Caused by: java.io.IOException: No FileSystem for scheme: adl
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:233)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:292)
        at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.createRecordUpdater(AbstractRecordWriter.java:226)

I raised the question on the Microsoft forum as well and on the Hive jira . 我也在Microsoft论坛Hive jira上提出了问题。

I can confirm that the jars described here are present in the classpath: 我可以确认在类路径中存在此处描述的jar:

com.microsoft.azure.azure-data-lake-store-sdk-2.2.5.jar
org.apache.hadoop.hadoop-azure-datalake-3.1.0.jar

No FileSystem for scheme 没有用于方案的文件系统

You get this error when the filesystem is not configured which probably needs to be done at both the HiveServer and your local client's core-site.xml files 如果未配置文件系统,则可能会在HiveServer和本地客户端的core-site.xml文件中进行配置则会出现此错误

Just because the JARs exist doesn't mean they are loaded onto the classpath and configured to read from your Azure account 仅仅因为JAR存在并不意味着它们已加载到类路径上并配置为从您的Azure帐户读取

声明:本站的技术帖子网页,遵循CC BY-SA 4.0协议,如果您需要转载,请注明本站网址或者原文地址。任何问题请咨询:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM