
Spark Scala on a Windows machine

I am learning from the class. I have run the code as shown in the class and I get the errors below. Any idea what I should do?

I have Spark 1.6.1 and Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74).

val datadir = "C:/Personal/V2Maestros/Courses/Big Data Analytics with Spark/Scala"

//............................................................................
////   Building and saving the model
//............................................................................

val tweetData = sc.textFile(datadir + "/movietweets.csv")
tweetData.collect()

// Convert one CSV line into a (label, text) tuple:
// tweets whose first field contains "positive" get label 0.0, all others 1.0.
def convertToRDD(inStr: String): (Double, String) = {
    val attList = inStr.split(",")
    val sentiment = if (attList(0).contains("positive")) 0.0 else 1.0
    (sentiment, attList(1))
}
val tweetText = tweetData.map(convertToRDD)
tweetText.collect()

//val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val ttDF = sqlContext.createDataFrame(tweetText).toDF("label", "text")
ttDF.show()

The error is:

scala> ttDF.show()
[Stage 2:>                                                          (0 + 2) / 2]16/03/30 11:40:25 ERROR ExecutorClassLoader: Failed to check existence of class org.apache.spark.sql.catalyst.expressio
REPL class server at http://192.168.56.1:54595
java.net.ConnectException: Connection timed out: connect
        at java.net.TwoStacksPlainSocketImpl.socketConnect(Native Method)

I'm no expert, but the connection IP in the error message looks like a private node or even your router/modem local address.

As stated in the comment, it could be that you're running the context with a wrong configuration that tries to spread the work to a cluster that isn't there, instead of running in your local JVM process.

For further information, you can read here and experiment with something like:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(master = "local[4]", appName = "tweetsClass", conf = new SparkConf)
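Note that in Spark 1.x only one SparkContext can be active per JVM, so if a context was already created for you (as in the shell), call sc.stop() before building a new one.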

Update

Since you're using the interactive shell and the SparkContext provided there, I guess you should pass the equivalent parameters to the shell command, as in

<your-spark-path>/bin/spark-shell --master local[4]

This instructs the driver to run the Spark master on the local machine, using 4 threads.

I think the problem comes from connectivity and not from within the code.

Check if you can actually connect to this address and port (54595).
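If you'd rather probe that from code than with telnet, here is a minimal Scala sketch (the address and port are copied from the error log above; the 5-second timeout is an arbitrary choice):

import java.net.{InetSocketAddress, Socket}

// Probe the REPL class server address reported in the error log.
val socket = new Socket()
try {
  // connect() throws an IOException if the host/port is unreachable.
  socket.connect(new InetSocketAddress("192.168.56.1", 54595), 5000)
  println("Port is reachable")
} catch {
  case e: java.io.IOException => println(s"Cannot connect: ${e.getMessage}")
} finally {
  socket.close()
}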

Probably your Spark master is not accessible at the specified port. Use local[*] to validate with a smaller dataset and a local master. Then, check whether the port is accessible, or change it based on the Spark port configuration ( http://spark.apache.org/docs/latest/configuration.html ).
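If the randomly chosen REPL class server port is the problem, you can pin the addresses and ports at launch, mirroring the spark-shell invocation above. A minimal sketch (spark.driver.host and spark.replClassServer.port are Spark 1.x networking properties documented on the configuration page linked above; 127.0.0.1 and 55555 are example values, and 192.168.56.1 in your log is typically a VirtualBox host-only adapter address, which could explain why the class server is unreachable):

<your-spark-path>/bin/spark-shell --master "local[*]" \
  --conf spark.driver.host=127.0.0.1 \
  --conf spark.replClassServer.port=55555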


 