
Scala Spark / Shark: How to access existing Hive tables in Hortonworks?

I am trying to find some docs or a description of the approach to the following, please help. I have Hadoop 2.2.0 from Hortonworks installed, with some existing Hive tables I need to query. Hive SQL runs extremely and unreasonably slowly, on a single node and on the cluster alike. I hope Shark will work faster.

From the Spark/Shark docs I cannot figure out how to make Shark work with existing Hive tables. Any ideas how to achieve this? Thanks!

You need to configure the metastore within the Shark-specific Hive conf directory. Details are provided in my answer to a similar question.

In summary, you will need to copy hive-default.xml to hive-site.xml, then ensure the metastore properties are set.
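The copy step can be sketched as below. The real source and target would be something like $HIVE_HOME/conf and $SHARK_HOME/conf, which are assumptions about your layout; this demo substitutes temporary directories so it runs anywhere:

```shell
#!/bin/sh
# Demo of copying Hive's default config into Shark's conf directory as
# hive-site.xml. On a real install the paths would be $HIVE_HOME/conf
# and $SHARK_HOME/conf (placeholders); temp dirs stand in for them here.
hive_conf=$(mktemp -d)    # stand-in for $HIVE_HOME/conf
shark_conf=$(mktemp -d)   # stand-in for $SHARK_HOME/conf

# A real hive-default.xml ships with Hive; this is a minimal stand-in.
printf '<configuration>\n</configuration>\n' > "$hive_conf/hive-default.xml"

# The actual step from the answer: copy it over as hive-site.xml.
cp "$hive_conf/hive-default.xml" "$shark_conf/hive-site.xml"
ls "$shark_conf"
```

After the copy, edit the new hive-site.xml so the metastore connection properties point at the same database your existing Hive install uses.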

Here is the basic metastore configuration in hive-site.xml:

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://myhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>mypassword</value>
</property>
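To sanity-check that all four connection properties actually made it into the file, a quick grep loop works. Everything below is a generic sketch, not part of the original answer: the sample file is generated inline as a stand-in for $SHARK_HOME/conf/hive-site.xml, and the values mirror the snippet above.

```shell
#!/bin/sh
# Verify that hive-site.xml declares the four JDO metastore connection
# properties Shark needs. $conf is a sample file built inline; on a real
# install it would be $SHARK_HOME/conf/hive-site.xml (an assumed path).
conf=$(mktemp)
cat > "$conf" <<'EOF'
<configuration>
  <property><name>javax.jdo.option.ConnectionURL</name>
            <value>jdbc:mysql://myhost/metastore</value></property>
  <property><name>javax.jdo.option.ConnectionDriverName</name>
            <value>com.mysql.jdbc.Driver</value></property>
  <property><name>javax.jdo.option.ConnectionUserName</name>
            <value>hive</value></property>
  <property><name>javax.jdo.option.ConnectionPassword</name>
            <value>mypassword</value></property>
</configuration>
EOF

missing=0
for p in ConnectionURL ConnectionDriverName ConnectionUserName ConnectionPassword; do
  if grep -q "javax.jdo.option.$p" "$conf"; then
    echo "ok: $p"
  else
    echo "missing: $p"
    missing=1
  fi
done
rm -f "$conf"
```

If any property is reported missing, Shark will fall back to a fresh local Derby metastore and none of your existing Hive tables will be visible.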

You can get more details in the Hive documentation on configuring the Hive metastore.
