Scala/Spark: How to print the contents of a Dataset[Row] when the rows consist of fields of type Double

I have a model class in Scala, for example:

package examples.partnerModels
import com.fasterxml.jackson.annotation.JsonProperty
case class  Temparature (@JsonProperty YEAR: Double,
                         @JsonProperty MONTH: Double,
                         @JsonProperty DAY : Double,
                         @JsonProperty MAX_TEMP: Double,
                         @JsonProperty MIN_TEMP : Double
                        )
{

  def this() = this(0,0,0,0,0)

  def getDataFields(): List[Double] =
  {
    productIterator.asInstanceOf[Iterator[Double]].toList
  }
}

object Temparature {
  def apply() = new Temparature(0,0,0,0,0)
}

I created a DataFrame from this Temparature model, sorted the records, and tried to print the contents of each record in the DataFrame like this:

val dataRecordsTemp = sc.textFile(tempFile).map{rec=>
            val splittedRec = rec.split("\\s+")
            Temparature(
              if(isEmpty(splittedRec(0))) 0 else splittedRec(0).toDouble,
              if(isEmpty(splittedRec(1))) 0 else splittedRec(1).toDouble,
              if(isEmpty(splittedRec(2))) 0 else splittedRec(2).toDouble,
              if(isEmpty(splittedRec(3))) 0 else splittedRec(3).toDouble,
              if(isEmpty(splittedRec(4))) 0 else splittedRec(4).toDouble
            )
        }.map{x => Row.fromSeq(x.getDataFields())}

val headerFieldsForTemp = Seq("YEAR","MONTH","DAY","MAX_TEMP","MIN_TEMP")
val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, StringType, nullable=true)})
val dfTemp = session.createDataFrame(dataRecordsTemp,schemaTemp)
              .orderBy(desc("year"), desc("month"), desc("day"))

println("Printing temparature data ...............................")
dfTemp.show(20)

However, I get an error on the line where I try to print:

java.lang.Double is not a valid external type for schema of string

How can I print the contents of a DataFrame whose rows have fields of type Double?

Instead of splittedRec(i).toDouble, use java.lang.Double.parseDouble(splittedRec(i)).

To print the contents of a DataFrame whose rows have fields of type Double, your StructFields should be DoubleType:

val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, DoubleType, nullable=true)})
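
As a minimal sketch of that change in context, the following builds the DataFrame with a DoubleType schema and prints it. The object name PrintDoubleRows, the helper buildTempDf, the local master setting, and the in-memory sample lines (standing in for the contents of tempFile) are assumptions made for illustration; they are not part of the original question.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.desc
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

object PrintDoubleRows {

  // Hypothetical helper: parses whitespace-separated numeric lines into rows of
  // doubles and pairs them with a schema whose columns are all DoubleType.
  def buildTempDf(session: SparkSession, lines: Seq[String]): DataFrame = {
    val rows = session.sparkContext.parallelize(lines).map { rec =>
      val fields = rec.trim.split("\\s+").map(_.toDouble)
      Row.fromSeq(fields.toSeq)
    }

    val headerFieldsForTemp = Seq("YEAR", "MONTH", "DAY", "MAX_TEMP", "MIN_TEMP")
    // DoubleType matches the java.lang.Double values stored in each Row.
    val schemaTemp = StructType(
      headerFieldsForTemp.map(f => StructField(f, DoubleType, nullable = true)))

    session.createDataFrame(rows, schemaTemp)
      .orderBy(desc("YEAR"), desc("MONTH"), desc("DAY"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and appName are assumptions.
    val session = SparkSession.builder()
      .master("local[1]")
      .appName("print-double-rows")
      .getOrCreate()

    // Assumed sample data in place of the file read with sc.textFile(tempFile).
    val sample = Seq("2017 1 1 30.5 12.0", "2018 6 15 41.2 25.3")

    buildTempDf(session, sample).show(20)
    session.stop()
  }
}
```

With the schema and the row values agreeing on the type, show(20) should no longer hit the "not a valid external type" check.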

Your schema sets the column types to string, but you are supplying nullable double values (i.e. java.lang.Double). Consider changing the definition of schemaTemp to:

val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, DoubleType, nullable=true)})
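
One way to avoid this kind of mismatch altogether is to let Spark derive the schema from a case class instead of writing the StructType by hand. The sketch below is an alternative rather than the approach taken in the question: TempRecord is a hypothetical simplified stand-in for Temparature (no Jackson annotations), and TypedTempExample, toSortedDf, and the sample lines are assumed names and data.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.desc

// Hypothetical simplified model with the same five Double fields as Temparature.
case class TempRecord(YEAR: Double, MONTH: Double, DAY: Double,
                      MAX_TEMP: Double, MIN_TEMP: Double)

object TypedTempExample {

  // Builds a sorted DataFrame whose column types are inferred from TempRecord,
  // so every column comes out as a double without an explicit StructType.
  def toSortedDf(session: SparkSession, lines: Seq[String]): DataFrame = {
    import session.implicits._ // brings toDF() and the case class encoder into scope

    val records = lines.map { rec =>
      val f = rec.trim.split("\\s+").map(_.toDouble)
      TempRecord(f(0), f(1), f(2), f(3), f(4))
    }

    records.toDF().orderBy(desc("YEAR"), desc("MONTH"), desc("DAY"))
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only; master and appName are assumptions.
    val session = SparkSession.builder()
      .master("local[1]")
      .appName("typed-temp-example")
      .getOrCreate()

    val df = toSortedDf(session, Seq("2017 1 1 30.5 12.0", "2018 6 15 41.2 25.3"))
    df.printSchema()
    df.show(20)
    session.stop()
  }
}
```

Because the column types come from the case class fields, there is no separate schema definition that can drift out of sync with the data.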
