Azure data factory HDInsight on-demand cluster 'Unable to instantiate SessionHiveMetaStoreClient'

I'm deploying an Azure data factory by deploying an ARM template using Visual Studio, basically following this Azure tutorial exactly, step by step.

The template defines a data factory with an Azure Storage linked service (for reading and writing source and output data), an input dataset and an output dataset, an HDInsight on-demand linked service, and a pipeline that runs an HDInsight Hive activity. The activity runs a Hive script that processes the input dataset into the output dataset.

Everything deploys successfully and the pipeline activity starts. However, I get the following error from the activity:

Exception in thread "main" java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:445)
    at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:675)
    at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:619)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

I have found various posts, such as this one and this one, suggesting the problem is a known bug caused by a dash or hyphen in the Hive metastore database name.

My problem is that when I use an ARM template to deploy an on-demand HDInsight cluster, I have no access to the cluster itself, so I can't make any manual config changes (the idea of on-demand is that the cluster is transient: it is created only to serve a set of demands and then deletes itself).

The issue can be reproduced easily by following the tutorial step by step.

The only glimmer of hope I have found is the hcatalogLinkedServiceName property, as documented here, which is designed to let you use your own Azure SQL database as the Hive metastore. However, this doesn't work either. If I use that property, I get:

'JamesTestTutorialARMDataFactory/HDInsightOnDemandLinkedService' failed with message ' HCatalog integration is not enabled for this subscription. '

My subscription is unrestricted, and should have all the features of Azure available. So now I'm completely stuck. It seems that currently, using Hive with on-demand HDInsight is basically impossible?

If anyone can think of anything to try, I'm all ears!

Thanks

I recently worked through the tutorial and revised it. Here is my version: https://azure.microsoft.com/en-us/documentation/articles/hdinsight-hadoop-create-linux-clusters-adf/ . I didn't see the error with it; the Hive table name doesn't contain a hyphen. I think mine is a bit easier to follow, and I made a few minor changes to the template itself.

I managed to get in contact with the author of the tutorial on GIT, who contacted the Azure product team. This was their response:

...this is a known issue with Linux based HDI clusters when you use them with ADF. The HDI team has fixed the issue and will be deploying the fix in the coming weeks. In the meantime, you have to use Window based HDI clusters with ADF.

Please use windows as osType for now. I have fixed the article in GIT and it will go live some time tomorrow.

The tutorial I link to has indeed been changed to use windows instead of linux. I've tried this and it works.
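For anyone with an older copy of the template, here is a rough sketch of the change in Python. It walks an ARM template and sets osType to windows on every HDInsightOnDemand linked service. The helper names (force_windows_ostype, _walk_resources) and the sample template values (resource names, version, clusterSize, timeToLive) are my own placeholders, not taken from the tutorial, so adjust them to match your actual template.

```python
import copy
import json


def _walk_resources(resources):
    """Yield every resource in an ARM template, including nested child resources."""
    for resource in resources:
        yield resource
        yield from _walk_resources(resource.get("resources", []))


def force_windows_ostype(template: dict) -> dict:
    """Return a copy of the template with osType set to 'windows'
    on every linked service whose properties.type is HDInsightOnDemand."""
    patched = copy.deepcopy(template)
    for resource in _walk_resources(patched.get("resources", [])):
        props = resource.get("properties", {})
        if props.get("type") == "HDInsightOnDemand":
            props.setdefault("typeProperties", {})["osType"] = "windows"
    return patched


# Hypothetical, heavily trimmed template; all values are illustrative only.
sample = {
    "resources": [
        {
            "type": "Microsoft.DataFactory/datafactories",
            "name": "ExampleDataFactory",
            "resources": [
                {
                    "type": "linkedservices",
                    "name": "HDInsightOnDemandLinkedService",
                    "properties": {
                        "type": "HDInsightOnDemand",
                        "typeProperties": {
                            "version": "3.2",
                            "clusterSize": 1,
                            "timeToLive": "00:05:00",
                            "osType": "linux",
                        },
                    },
                }
            ],
        }
    ]
}

patched = force_windows_ostype(sample)
print(json.dumps(patched, indent=2))
```

The function works on a deep copy, so the original template dict is left untouched and you can diff the two before redeploying.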
