Problem with Hive Streaming and Azure Data Lake Store

Question

I am writing a Play2 Java web application that ingests data into HDInsight Interactive Query using the Hive Streaming API ( https://cwiki.apache.org/confluence/display/Hive/Streaming+Data+Ingest ). The Hive data is stored on Azure Data Lake Store.

I loosely based my code on https://github.com/mradamlacey/hive-streaming-azure-hdinsight/blob/master/src/main/java/com/cbre/eim/HiveStreamingExample.java .

When I run the code on one of my head nodes, I receive the following error:

play.api.UnexpectedException: Unexpected exception[StreamingIOFailure: Failed creating RecordUpdaterS for adl://home/hive/warehouse/data/ingest_date=2018-05-07 txnIds[486,495]]
        at play.api.http.HttpErrorHandlerExceptions$.throwableToUsefulException(HttpErrorHandler.scala:251)
        at play.api.http.DefaultHttpErrorHandler.onServerError(HttpErrorHandler.scala:182)
        at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:343)
        at play.core.server.AkkaHttpServer$$anonfun$2.applyOrElse(AkkaHttpServer.scala:341)
        at scala.concurrent.Future.$anonfun$recoverWith$1(Future.scala:414)
        at scala.concurrent.impl.Promise.$anonfun$transformWith$1(Promise.scala:37)
        at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:60)
        at akka.dispatch.BatchingExecutor$AbstractBatch.processBatch(BatchingExecutor.scala:55)
        at akka.dispatch.BatchingExecutor$BlockableBatch.$anonfun$run$1(BatchingExecutor.scala:91)
        at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:12)
    Caused by: org.apache.hive.hcatalog.streaming.StreamingIOFailure: Failed creating RecordUpdaterS for adl://home/hive/warehouse/data/ingest_date=2018-05-07 txnIds[486,495]
        at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.newBatch(AbstractRecordWriter.java:166)
        at org.apache.hive.hcatalog.streaming.StrictJsonWriter.newBatch(StrictJsonWriter.java:41)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:559)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$TransactionBatchImpl.<init>(HiveEndPoint.java:512)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatchImpl(HiveEndPoint.java:397)
        at org.apache.hive.hcatalog.streaming.HiveEndPoint$ConnectionImpl.fetchTransactionBatch(HiveEndPoint.java:377)
        at hive.HiveRepository.createMany(HiveRepository.java:76)
        at controllers.HiveController.create(HiveController.java:40)
        at router.Routes$$anonfun$routes$1.$anonfun$applyOrElse$2(Routes.scala:70)
        at play.core.routing.HandlerInvokerFactory$$anon$4.resultCall(HandlerInvoker.scala:137)
    Caused by: java.io.IOException: No FileSystem for scheme: adl
        at org.apache.hadoop.fs.FileSystem.getFileSystemClass(FileSystem.java:2644)
        at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:2651)
        at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:92)
        at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:2687)
        at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:2669)
        at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:371)
        at org.apache.hadoop.fs.Path.getFileSystem(Path.java:295)
        at org.apache.hadoop.hive.ql.io.orc.OrcRecordUpdater.<init>(OrcRecordUpdater.java:233)
        at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat.getRecordUpdater(OrcOutputFormat.java:292)
        at org.apache.hive.hcatalog.streaming.AbstractRecordWriter.createRecordUpdater(AbstractRecordWriter.java:226)

I have also raised the question on the Microsoft forum and on the Hive JIRA.

I can confirm that the following JARs are present on the classpath:

com.microsoft.azure.azure-data-lake-store-sdk-2.2.5.jar
org.apache.hadoop.hadoop-azure-datalake-3.1.0.jar

Answer 1

No FileSystem for scheme

You get this error when the filesystem is not configured. The configuration probably needs to be done in the core-site.xml files of both HiveServer and your local client.
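
As a rough illustration: Hadoop resolves a URI scheme such as `adl` by looking up the `fs.<scheme>.impl` property, so a core-site.xml fragment along the following lines is one way to register it. This is a sketch, not a verified fix for this cluster. The class names are assumed to match the `hadoop-azure-datalake` module, and the credential settings for the Azure account are deliberately left out because they depend on how the cluster authenticates.

```xml
<!-- Sketch only: registers the adl:// scheme with Hadoop. -->
<!-- Assumes the hadoop-azure-datalake JAR is on the classpath of the process reading this file. -->
<property>
  <name>fs.adl.impl</name>
  <value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
</property>
<property>
  <name>fs.AbstractFileSystem.adl.impl</name>
  <value>org.apache.hadoop.fs.adl.Adl</value>
</property>
```

The file only takes effect if the directory containing it is on the classpath of the process that opens the Hive Streaming connection, in this case the Play application.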

Just because the JARs exist doesn't mean they are loaded onto the classpath and configured to read from your Azure account.
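
One way to check this from inside the running application is to ask the JVM whether it can load the relevant classes and where they come from. The following is a minimal diagnostic sketch. `ClasspathCheck` and `locate` are hypothetical names introduced here, and the two class names probed in `main` are assumed to be the ones shipped in the JARs listed in the question.

```java
import java.security.CodeSource;

// Hypothetical helper: reports whether a class is loadable and from which JAR.
final class ClasspathCheck {

    /** Returns the location the class was loaded from, or null if it cannot be loaded. */
    static String locate(String className) {
        try {
            // initialize=false: only resolve the class, do not run its static initializers.
            Class<?> cls = Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // JDK classes loaded by the bootstrap loader have no code source.
            return src == null ? "(bootstrap/JDK)" : src.getLocation().toString();
        } catch (ClassNotFoundException | LinkageError e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Assumed class names from hadoop-azure-datalake and azure-data-lake-store-sdk.
        String[] names = {
            "org.apache.hadoop.fs.adl.AdlFileSystem",
            "com.microsoft.azure.datalake.store.ADLStoreClient"
        };
        for (String name : names) {
            String location = locate(name);
            System.out.println(name + " -> " + (location == null ? "NOT LOADABLE" : location));
        }
    }
}
```

Calling `locate` from within the Play application, rather than from a separate shell on the head node, is what matters here, since the application may run with a different classpath or class loader than the one you inspected by hand.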

Solution 1
0 2018-05-12 15:24:06