Error while connecting Spark and Cassandra
What I'm doing:
What steps have I followed:
sudo build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -DskipTests clean package
and sbt/sbt clean assembly
Downloaded cassandra-driver-core2.1.1.jar and spark-cassandra-connector_2.11-1.4.1.jar into spark/lib
Added the jar file paths to conf/spark-defaults.conf like this:
spark.driver.extraClassPath \
~/path/to/spark-cassandra-connector_2.11-1.4.1.jar:\
~/path/to/cassandra-driver-core-2.1.1.jar
How am I running the shell:
After running ./bin/cassandra, I run Spark like this:
sudo ./bin/pyspark
and also tried with
sudo ./bin/spark-shell
What query am I making:
sqlContext.read.format("org.apache.spark.sql.cassandra")\
.options(table="users", keyspace="test")\
.load()\
.show()
The problem:
java.lang.NoSuchMethodError: scala.Predef$.$conforms()Lscala/Predef$$less$colon$less;
But org.apache.spark.sql.cassandra is present in the spark-cassandra-connector jar that I downloaded.
Here is the full Log Trace
What have I tried:
The --packages, --driver-class-path, and --jars options, adding the 2 jars.
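For example, an invocation along these lines (using the same placeholder paths as in my spark-defaults.conf above):
sudo ./bin/pyspark \
  --jars ~/path/to/spark-cassandra-connector_2.11-1.4.1.jar,~/path/to/cassandra-driver-core-2.1.1.jar \
  --driver-class-path ~/path/to/spark-cassandra-connector_2.11-1.4.1.jar:~/path/to/cassandra-driver-core-2.1.1.jar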
Questions I've been thinking about:
I'm really new to Spark and Cassandra so I really need some advice! Been spending hours on this and it's probably something trivial.
A few notes
One, you are building Spark for 2.10 and using Spark Cassandra Connector libraries for 2.11. To build Spark for 2.11 you need to use the -Dscala-2.11 flag. This is most likely the main cause of your errors.
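As a sketch, that means adding the flag to the build command from the question (for Spark builds of this era the Scala-version switch script under dev/ may also need to be run first):
sudo build/mvn -Pyarn -Phadoop-2.4 -Dhadoop.version=2.4.0 -Dscala-2.11 -DskipTests clean package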
Next, to actually include the connector in your project, just including the core libs without their dependencies will not be enough. If you got past the first error you would most likely see other class-not-found errors from the missing deps.
This is why it's recommended to use the Spark Packages website and the --packages flag. This will include a "fat jar" which has all the required dependencies. See http://spark-packages.org/package/datastax/spark-cassandra-connector
For Spark 1.4.1 and pyspark this would be
# Scala 2.10
$SPARK_HOME/bin/pyspark --packages datastax:spark-cassandra-connector:1.4.1-s_2.10
# Scala 2.11
$SPARK_HOME/bin/pyspark --packages datastax:spark-cassandra-connector:1.4.1-s_2.11
You should never have to manually download jars when using the --packages method.
Do not use spark.driver.extraClassPath; it will only add the dependencies to the driver, and remote code (the executors) will not be able to use those dependencies.