Incorrect data types in Spark

Question

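When I create a DataFrame in Spark, the columns have the wrong types. I have 100 columns and don't know the best way to change the data type of each one. Fortunately, most of them should be numeric.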

Here is what I am doing:

// Read the tab-separated file as an RDD of lines.
val lines = sc.textFile("user/name/testC.tsv")
// Remove the header line.
val dfLines = lines.filter(x => !x.contains("test_name"))
// Pick the columns I want.
val rowRDD = dfLines.map(x => x.split("\t")).map(x => (x(2), x(4), x(11), x(12)))
// Create a data frame (outside spark-shell, toDF needs the SQL implicits imported).
val df = rowRDD.toDF("cycle", "dut", "metric1", "metric2")

These columns should be numeric, but df contains only strings:

(String, String, String, String, String, String, String, String, String, String, String, String, String) =
  (100,0,255,34,33,25,29,32,26,44,31,0,UP)

Answer 1

You can do the conversion when you select the columns. For example:

val rowRDD = dfLines
  .map(x => x.split("\t"))
  .map(x => (x(2).toInt, x(4), x(11).toDouble, x(12).toDouble))

(This assumes that cycle is an integer, dut is a string, and metric1 and metric2 are real numbers.)
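
With 100 columns, writing one conversion per field gets tedious. A possible alternative, sketched here under assumptions that are not in the original post, is to build the DataFrame with string columns first and then cast every column except the known string ones. The sketch assumes Spark 2.x or later (`SparkSession`); the object name `CastColumns`, the helper `castNumeric`, and the sample row are hypothetical and only for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object CastColumns {
  // Hypothetical helper: cast every column to double,
  // except the columns named in keepAsString.
  def castNumeric(df: DataFrame, keepAsString: Set[String]): DataFrame =
    df.select(df.columns.map { name =>
      if (keepAsString.contains(name)) col(name) else col(name).cast("double")
    }: _*)

  def main(args: Array[String]): Unit = {
    // Assumption: a local session for demonstration purposes only.
    val spark = SparkSession.builder()
      .appName("cast-columns")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Made-up sample row standing in for the parsed TSV data.
    val raw = Seq(("100", "UP", "255", "34"))
      .toDF("cycle", "dut", "metric1", "metric2")

    val typed = castNumeric(raw, Set("dut"))
    typed.printSchema() // cycle, metric1, metric2 are double; dut stays string

    spark.stop()
  }
}
```

Note that this casts cycle to double as well; if it has to be an integer, it can be cast separately with `cast("int")`. Values that cannot be parsed as numbers usually come out as null rather than raising an error, although that depends on the Spark version and its ANSI settings, so it is worth checking the result for unexpected nulls.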

Answer 1 posted 2016-05-02 18:16:47
