
Has anyone successfully run Apache Spark and Shark on Cassandra?

I am trying to configure a 5-node Cassandra cluster to run Spark/Shark so that I can test some Hive queries. I have installed Spark, Scala, and Shark, and configured them according to the AMPLab guide "Running Shark on a Cluster": https://github.com/amplab/shark/wiki/Running-Shark-on-a-Cluster

I can get into the Shark CLI, but when I try to create an EXTERNAL TABLE from one of my Cassandra ColumnFamily tables, I keep getting this error:

Failed with exception org.apache.hadoop.hive.ql.metadata.HiveException: Error in loading storage handler.org.apache.hadoop.hive.cassandra.CassandraStorageHandler

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask

I have set HIVE_HOME, HADOOP_HOME, and SCALA_HOME. Perhaps I'm pointing HIVE_HOME and HADOOP_HOME to the wrong paths? HADOOP_HOME is set to my Cassandra Hadoop folder (/etc/dse/cassandra), HIVE_HOME is set to the unpacked AMPLab download of Hadoop1/hive, and HIVE_CONF_DIR is set to my Cassandra Hive path (/etc/dse/hive). Am I missing any steps, or have I configured these locations incorrectly? Any help would be much appreciated. Thanks.

Yes, I have got it working.

Try https://github.com/2013Commons/hive-cassandra

which works with Cassandra 2.0.4, Hive 0.11, and Hadoop 2.0.
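One possible cause of the "Error in loading storage handler" message is that no jar on Hive's classpath contains the handler class. As a rough check, the sketch below scans directories for jars that include org.apache.hadoop.hive.cassandra.CassandraStorageHandler. The function name find_class_in_jars is hypothetical, and which directories to scan is an assumption you would adjust to your own install.

```python
import zipfile
from pathlib import Path

# Class named in the error message from the question.
HANDLER = "org.apache.hadoop.hive.cassandra.CassandraStorageHandler"


def find_class_in_jars(directories, class_name=HANDLER):
    """Return paths of jar files under `directories` that contain `class_name`.

    Hypothetical helper for illustration; it only inspects jar contents
    and does not check what Hive or Shark actually put on the classpath.
    """
    # A class a.b.C is stored inside a jar as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    matches = []
    for directory in directories:
        root = Path(directory)
        if not root.is_dir():
            continue  # skip paths that do not exist
        for jar in sorted(root.rglob("*.jar")):
            try:
                with zipfile.ZipFile(jar) as archive:
                    if entry in archive.namelist():
                        matches.append(str(jar))
            except (zipfile.BadZipFile, OSError):
                continue  # unreadable or corrupt jar, ignore it
    return matches
```

For example, find_class_in_jars(["/etc/dse/hive"]) would list any jars under the HIVE_CONF_DIR path from the question that contain the handler. An empty result for every directory on the classpath would suggest the hive-cassandra jar still needs to be built and added.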
