
How to connect Cassandra with Spark using Scala on Windows

I am trying to connect Spark and Cassandra using Scala as described here: http://www.planetcassandra.org/blog/kindling-an-introduction-to-spark-with-cassandra/. I am running into errors in the steps under the heading:

"To load the connector into the Spark Shell:" “要将连接器加载到Spark Shell中:”

val test_spark_rdd = sc.cassandraTable("test_spark", "test")

When I then run test_spark_rdd.first, it fails with the error:

Exception in task 0.0 in stage 0.0 (TID 0): java.lang.NullPointerException
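For reference, the full spark-shell sequence from the linked tutorial looks roughly like this (a sketch, not verbatim from the post; the keyspace and table names test_spark/test are taken from the question above, and the Cassandra host is assumed to be localhost):

```scala
// In spark-shell, started with the connector jar on the classpath, e.g.:
//   spark-shell --jars spark-cassandra-connector-assembly.jar

import org.apache.spark.{SparkConf, SparkContext}
import com.datastax.spark.connector._

// Stop the default context so it can be rebuilt with the Cassandra host set
sc.stop

val conf = new SparkConf(true)
  .set("spark.cassandra.connection.host", "localhost") // assumed host

val sc = new SparkContext(conf)

// Read the table as an RDD of CassandraRow and pull the first row
val test_spark_rdd = sc.cassandraTable("test_spark", "test")
test_spark_rdd.first
```

The failure reported here happens on the final test_spark_rdd.first call, i.e. when the first job is actually submitted to an executor.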

I have uploaded the complete stack trace here:

https://docs.google.com/document/d/1UjGXKifD6chq7-WrHd3GT3LoNcw8GawxAPeOtiEjKvM/edit?usp=sharing

Some rpc settings from the cassandra.yaml file:

rpc_address: localhost 
# rpc_interface: eth1 
# rpc_interface_prefer_ipv6: false 
# port for Thrift to listen for clients on 
rpc_port: 9160 
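Note that rpc_port: 9160 is the legacy Thrift port; the DataStax Spark connector talks to Cassandra over the native (CQL) protocol, which is configured separately in cassandra.yaml. A sketch of the relevant settings, shown with their stock defaults (assuming an unmodified configuration):

```yaml
# Native transport (CQL) settings, which the Spark connector uses
start_native_transport: true
native_transport_port: 9042
```

So the connector should be able to reach localhost:9042 regardless of the Thrift settings above.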

My spark-defaults config file:

# Default system properties included when running spark-submit.
# This is useful for setting default environmental settings.

# Example:
# spark.master                     spark://master:7077
# spark.eventLog.enabled           true
# spark.eventLog.dir               hdfs://namenode:8021/directory
#spark.serializer                 org.apache.spark.serializer.KryoSerializer
#spark.driver.memory              5g
#spark.executor.extraJavaOptions  -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
spark.cassandra.connection.host localhost
15/08/04 21:24:50 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
    java.lang.NullPointerException
            at java.lang.ProcessBuilder.start(Unknown Source)
            at org.apache.hadoop.util.Shell.runCommand(Shell.java:445)
            at org.apache.hadoop.util.Shell.run(Shell.java:418)
            at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:650)
            at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
            at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)

It looks like the underlying forked executor process either failed to start or could not perform an operation on the local filesystem. Make sure the default Spark directories are accessible by the executor process.
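On Windows specifically, a NullPointerException thrown from org.apache.hadoop.util.Shell at ProcessBuilder.start (as in the trace above, where FileUtil.chmod shells out to an external command) is often a sign that Hadoop's native helper winutils.exe cannot be found. A possible fix to try before relaunching spark-shell (a sketch; C:\hadoop is an assumed install location, not taken from the question):

```
:: Windows cmd sketch: point Hadoop at a directory containing bin\winutils.exe
set HADOOP_HOME=C:\hadoop
set PATH=%PATH%;%HADOOP_HOME%\bin

:: Verify winutils is found on the PATH
winutils.exe ls \tmp
```

If winutils.exe resolves and the Spark local/temp directories are writable by your user, the chmod call in the stack trace should succeed.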
