
Spark Scala on a Windows machine

I am learning from the class. I have run the code as shown in the class and I get the errors below. Any idea what I should do?

I have Spark 1.6.1 and Scala version 2.10.5 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_74).

val datadir = "C:/Personal/V2Maestros/Courses/Big Data Analytics with Spark/Scala"

//............................................................................
////   Building and saving the model
//............................................................................

val tweetData = sc.textFile(datadir + "/movietweets.csv")
tweetData.collect()

// Convert one CSV line into a (label, text) tuple:
// tweets whose first field contains "positive" get label 0.0, all others 1.0.
def convertToRDD(inStr: String): (Double, String) = {
    val attList = inStr.split(",")
    val sentiment = if (attList(0).contains("positive")) 0.0 else 1.0
    (sentiment, attList(1))
}
val tweetText = tweetData.map(convertToRDD)
tweetText.collect()

//val sqlContext = new org.apache.spark.sql.SQLContext(sc)
import sqlContext.implicits._
val ttDF = sqlContext.createDataFrame(tweetText).toDF("label", "text")
ttDF.show()

The error is:

scala> ttDF.show()
[Stage 2:>                                                          (0 + 2) / 2]16/03/30 11:40:25 ERROR ExecutorClassLoader: Failed to check existence of class org.apache.spark.sql.catalyst.expressio
REPL class server at http://192.168.56.1:54595
java.net.ConnectException: Connection timed out: connect
        at java.net.TwoStacksPlainSocketImpl.socketConnect(Native Method)

I'm no expert, but the connection IP in the error message looks like a private node or even your router/modem local address.

As stated in the comment, it could be that you're running the context with a wrong configuration that tries to spread the work to a cluster that isn't there, instead of running in your local JVM process.

For further information, you can read here and experiment with something like:

import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(master = "local[4]", appName = "tweetsClass", conf = new SparkConf)
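Note that in Spark 1.x only one SparkContext can be active per JVM, so if a context was already created for you (as in the shell), call sc.stop() before building a new one.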

Update

Since you're using the interactive shell and the SparkContext provided there, I guess you should pass the equivalent parameters to the shell command, as in

<your-spark-path>/bin/spark-shell --master local[4]

This instructs the driver to run the Spark master on the local machine, using 4 threads.

I think the problem comes from connectivity and not from within the code.

Check if you can actually connect to this address and port (54595).
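If you'd rather probe that from code than with telnet, here is a minimal Scala sketch (the address and port are copied from the error log above; the 5-second timeout is an arbitrary choice):

import java.net.{InetSocketAddress, Socket}

// Probe the REPL class server address reported in the error log.
val socket = new Socket()
try {
  // connect() throws an IOException if the host/port is unreachable.
  socket.connect(new InetSocketAddress("192.168.56.1", 54595), 5000)
  println("Port is reachable")
} catch {
  case e: java.io.IOException => println(s"Cannot connect: ${e.getMessage}")
} finally {
  socket.close()
}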

Probably your Spark master is not accessible at the specified port. Use local[*] to validate with a smaller dataset and a local master. Then, check whether the port is accessible, or change it based on the Spark port configuration ( http://spark.apache.org/docs/latest/configuration.html ).
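If the randomly chosen REPL class server port is the problem, you can pin the addresses and ports at launch, mirroring the spark-shell invocation above. A minimal sketch (spark.driver.host and spark.replClassServer.port are Spark 1.x networking properties documented on the configuration page linked above; 127.0.0.1 and 55555 are example values, and 192.168.56.1 in your log is typically a VirtualBox host-only adapter address, which could explain why the class server is unreachable):

<your-spark-path>/bin/spark-shell --master "local[*]" \
  --conf spark.driver.host=127.0.0.1 \
  --conf spark.replClassServer.port=55555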


 