
Error while connecting Spark and Cassandra

What I'm doing:

  • Trying to connect Spark and Cassandra to retrieve data stored in Cassandra tables from Spark.

What steps have I followed:

  • Downloaded cassandra 2.1.12 and spark 1.4.1.
  • Built spark with sudo build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package and sbt/sbt clean assembly.
  • Stored some data into cassandra.
  • Downloaded these jars into spark/lib:

cassandra-driver-core-2.1.1.jar and spark-cassandra-connector_2.11-1.4.1.jar

Added the jar file paths to conf/spark-defaults.conf like

spark.driver.extraClassPath \
                            ~/path/to/spark-cassandra-connector_2.11-1.4.1.jar:\
                            ~/path/to/cassandra-driver-core-2.1.1.jar

How am I running the shell:

After running ./bin/cassandra, I run Spark like:

sudo ./bin/pyspark

and also tried with sudo ./bin/spark-shell

What query am I making:

sqlContext.read.format("org.apache.spark.sql.cassandra")\
               .options(table="users", keyspace="test")\
               .load()\
               .show()

The problem:

 java.lang.NoSuchMethodError:\
                    scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;

But org.apache.spark.sql.cassandra is present in the spark-cassandra-connector.jar that I downloaded.
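(A quick way to double-check that is to list the jar's contents; this is a sketch only, reusing the placeholder path from the config above:)

jar tf ~/path/to/spark-cassandra-connector_2.11-1.4.1.jar | grep org/apache/spark/sql/cassandra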

Here is the full Log Trace.

What have I tried:

  • I tried running with the --packages, --driver-class-path and --jars options, adding the 2 jars.
  • Tried downgrading scala to 2.1 and tried with the scala shell, but still got the same error.

Questions I've been thinking about:

  1. Are the versions of cassandra, spark and scala that I'm using compatible with each other?
  2. Am I using the correct version of the jar files?
  3. Did I compile spark in the wrong way?
  4. Am I missing something or doing something wrong?

I'm really new to spark and cassandra, so I really need some advice! I've been spending hours on this and it's probably something trivial.

A few notes

One, you are building Spark for 2.10 and using Spark Cassandra Connector libraries for 2.11. To build Spark for 2.11 you need to use the -Dscala-2.11 flag. This is most likely the main cause of your errors.
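For reference, a minimal sketch of what the 2.11 build looks like for Spark 1.4.x, mirroring the Maven command above (this assumes the dev/change-version-to-2.11.sh helper script described in the Spark 1.4 build docs; check the exact script name in your checkout):

# switch the POMs to Scala 2.11, then rebuild with the -Dscala-2.11 flag
dev/change-version-to-2.11.sh
build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Dscala-2.11 -DskipTests clean package

Either way, the Scala suffix on the connector artifact (_2.10 vs _2.11) has to match the Scala version your Spark build uses.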

Next, to actually include the connector in your project, just including the core libs without their dependencies will not be enough. If you got past the first error you would most likely see other class-not-found errors from the missing deps.

This is why it's recommended to use the Spark Packages website and the --packages flag. This will include a "fat jar" which has all the required dependencies. See http://spark-packages.org/package/datastax/spark-cassandra-connector

For Spark 1.4.1 and pyspark this would be:

# Scala 2.10
$SPARK_HOME/bin/pyspark --packages datastax:spark-cassandra-connector:1.4.1-s_2.10
# Scala 2.11
$SPARK_HOME/bin/pyspark --packages datastax:spark-cassandra-connector:1.4.1-s_2.11

When using the --packages method you should never have to manually download jars.

Do not use spark.driver.extraClassPath: it only adds the dependencies to the driver, so remote code will not be able to use them.
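To tie it together, here is a minimal sketch of the whole flow with --packages (assumptions: a Scala 2.10 Spark build, a local Cassandra node pointed at via the connector's spark.cassandra.connection.host setting, and the test/users keyspace and table from the question):

# launch the shell with the connector and its dependencies pulled in automatically:
#   $SPARK_HOME/bin/pyspark --packages datastax:spark-cassandra-connector:1.4.1-s_2.10 \
#                           --conf spark.cassandra.connection.host=127.0.0.1
# inside pyspark, sqlContext is already created for you
df = sqlContext.read.format("org.apache.spark.sql.cassandra") \
               .options(table="users", keyspace="test") \
               .load()
df.show()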
