
spark scala: convert string column into double

I am new to Scala and Spark. I imported a CSV as below and wanted to use it in Spark ML.

scala> var data = spark.read.format("csv").load("E:\\...\\file.csv")
scala> data.show(4)
+---+---+---+----+---+---+
|_c0|_c1|_c2| _c3|_c4|_c5|
+---+---+---+----+---+---+
|  0| 30|  1| -26|  2|173|
|  3| 31|  2|-100|  3| 31|
|  1| 56|  1| -28|  1|158|
|  2| 12|  3| -49|  1| 66|
+---+---+---+----+---+---+

When assembling the features, I was told that data type string is not supported. How can I convert these columns from string into double? Thanks.

scala> import org.apache.spark.ml.feature.VectorAssembler
scala> import org.apache.spark.sql.DataFrame
scala> val colArray = Array("_c1", "_c2", "_c3", "_c4", "_c5")
scala> val assembler = new VectorAssembler().setInputCols(colArray).setOutputCol("features")
scala> val vecDF: DataFrame = assembler.transform(data)
java.lang.IllegalArgumentException: Data type string of column _c1 is not supported.
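Checking the schema shows why: unless schema inference or an explicit schema is supplied, Spark's CSV reader loads every column as a string (output sketched for this file):

scala> data.printSchema()
root
 |-- _c0: string (nullable = true)
 |-- _c1: string (nullable = true)
 |-- _c2: string (nullable = true)
 |-- _c3: string (nullable = true)
 |-- _c4: string (nullable = true)
 |-- _c5: string (nullable = true)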

I tried the conversion like this, and it worked:

import org.apache.spark.sql.functions.col

val colNames = Array("_c1", "_c2", "_c3", "_c4", "_c5")
// Cast each feature column from string to double in place
for (colName <- colNames) {
  data = data.withColumn(colName, col(colName).cast("Double"))
}
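An alternative that avoids the cast loop entirely is to let the reader infer the column types at load time via the CSV reader's `inferSchema` option (a sketch, using the same placeholder path as above):

scala> var data = spark.read.format("csv").option("inferSchema", "true").load("E:\\...\\file.csv")

With this file, inference would pick integer columns rather than doubles, but VectorAssembler accepts any numeric type, so the explicit cast to Double is only needed when a later stage specifically requires doubles.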
