Scala Spark type mismatch: found Unit, required rdd.RDD

I am reading a table from a MySQL database in a Spark project written in Scala. It's my first week on it, so I am really not up to speed yet. When I try to run

  val clusters = KMeans.train(parsedData, numClusters, numIterations)

I am getting an error for parsedData that says: "type mismatch; found: org.apache.spark.rdd.RDD[Map[String,Any]] required: org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector]"

My parsedData is created above like this:

 val parsedData = dataframe_mysql.map(_.getValuesMap[Any](List("name", "event","execution","info"))).collect().foreach(println)

where dataframe_mysql is whatever is returned from the sqlContext.read.format("jdbc").option(....) call.
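
For context, this kind of DataFrame typically comes from a read along these lines (a minimal sketch using the standard Spark JDBC data source options; the URL, table name, and credentials are placeholders, not values from the question):

  // Read a MySQL table into a DataFrame via the JDBC data source.
  // All option values below are placeholders.
  val dataframe_mysql = sqlContext.read
    .format("jdbc")
    .option("url", "jdbc:mysql://localhost:3306/mydb")
    .option("dbtable", "my_table")
    .option("user", "user")
    .option("password", "password")
    .load()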

How am I supposed to convert my Unit to fit the requirements so that I can pass it into the train function?

According to the documentation, I am supposed to use something like this:

data.map(s => Vectors.dense(s.split(' ').map(_.toDouble))).cache()

Am I supposed to transform my values to Double? Because when I try to run the command above, my project crashes.

Thank you!

Remove the trailing .collect().foreach(println). After calling collect, you no longer have an RDD - it turns into a local collection.

Subsequently, when you call foreach, it returns Unit - foreach is for performing side effects, such as printing each element of a collection.
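
Putting it together, here is a minimal sketch of the fixed pipeline. Which columns are numeric is an assumption: "name" and "event" look like strings, so only "execution" and "info" are used here; adjust the column list to whatever can actually be converted to Double. numClusters and numIterations are placeholder values, and this relies on Spark 1.x, where DataFrame.map yields an RDD.

  import org.apache.spark.mllib.clustering.KMeans
  import org.apache.spark.mllib.linalg.Vectors

  // Keep parsedData as an RDD[Vector]: collect() would turn it into a
  // local Array, and foreach(println) would return Unit.
  // Assumes "execution" and "info" hold values convertible to Double.
  val parsedData = dataframe_mysql
    .select("execution", "info")
    .map(row => Vectors.dense(
      row.getAs[Any]("execution").toString.toDouble,
      row.getAs[Any]("info").toString.toDouble))
    .cache()

  val numClusters = 3    // placeholder
  val numIterations = 20 // placeholder
  val clusters = KMeans.train(parsedData, numClusters, numIterations)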
