Scala/Spark: How to print the contents of a Dataset[Row] when the rows consist of fields of type Double
I have a model class in Scala, for example:
package examples.partnerModels
import com.fasterxml.jackson.annotation.JsonProperty
case class Temparature(@JsonProperty YEAR: Double,
                       @JsonProperty MONTH: Double,
                       @JsonProperty DAY: Double,
                       @JsonProperty MAX_TEMP: Double,
                       @JsonProperty MIN_TEMP: Double)
{
  def this() = this(0, 0, 0, 0, 0)
  def getDataFields(): List[Double] =
  {
    productIterator.asInstanceOf[Iterator[Double]].toList
  }
}
object Temparature {
  def apply() = new Temparature(0, 0, 0, 0, 0)
}
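To make it clear what getDataFields hands to Row.fromSeq, here is a minimal plain-Scala sketch (no Spark, and the Jackson annotations are left out on the assumption that they play no role here). It casts each element rather than the whole iterator, which avoids the unchecked Iterator[Double] cast but returns the same list:

```scala
// Same fields as the question's model, minus the @JsonProperty annotations
case class Temparature(YEAR: Double, MONTH: Double, DAY: Double, MAX_TEMP: Double, MIN_TEMP: Double) {
  // productIterator yields every constructor field as Any (a boxed java.lang.Double here);
  // casting element by element keeps the conversion explicit
  def getDataFields(): List[Double] = productIterator.map(_.asInstanceOf[Double]).toList
}

val t = Temparature(2017, 1, 2, 30.5, 20.1) // Int literals widen to Double
println(t.getDataFields()) // List(2017.0, 1.0, 2.0, 30.5, 20.1)
```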
I created a DataFrame of sorted records from this temperature model and tried to print the contents of each record like this:
val dataRecordsTemp = sc.textFile(tempFile).map{rec=>
  val splittedRec = rec.split("\\s+")
  Temparature(
    if(isEmpty(splittedRec(0))) 0 else splittedRec(0).toDouble,
    if(isEmpty(splittedRec(1))) 0 else splittedRec(1).toDouble,
    if(isEmpty(splittedRec(2))) 0 else splittedRec(2).toDouble,
    if(isEmpty(splittedRec(3))) 0 else splittedRec(3).toDouble,
    if(isEmpty(splittedRec(4))) 0 else splittedRec(4).toDouble
  )
}.map{x => Row.fromSeq(x.getDataFields())}
val headerFieldsForTemp = Seq("YEAR","MONTH","DAY","MAX_TEMP","MIN_TEMP")
val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, StringType, nullable=true)})
val dfTemp = session.createDataFrame(dataRecordsTemp,schemaTemp)
  .orderBy(desc("year"), desc("month"), desc("day"))
println("Printing temparature data ...............................")
dfTemp.show(20)
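The isEmpty helper used above is not shown in the question. As a plain-Scala sketch of what that parsing step presumably does, the hypothetical helper parseOrZero below turns a blank field into 0.0 and also returns 0.0 when a line has fewer fields than expected, where a direct splittedRec(4) would throw an ArrayIndexOutOfBoundsException:

```scala
// Hypothetical stand-in for the question's unshown isEmpty check:
// a missing or blank field becomes 0.0, anything else is parsed as a Double
def parseOrZero(fields: Array[String], i: Int): Double =
  if (i < fields.length && fields(i).trim.nonEmpty) fields(i).trim.toDouble else 0.0

// A line with only four of the five expected fields
val splittedRec = "2017 1 2 30.5".split("\\s+")
val values = (0 until 5).map(i => parseOrZero(splittedRec, i))
println(values) // Vector(2017.0, 1.0, 2.0, 30.5, 0.0)
```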
However, I get an error on the line where I try to print:
java.lang.Double is not a valid external type for schema of string
How can I print the contents of a DataFrame whose rows contain fields of type Double?
Instead of splittedRec(i).toDouble, use java.lang.Double.parseDouble(splittedRec(i)).
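A quick plain-Scala check of the two parsing calls: Scala's String.toDouble delegates to java.lang.Double.parseDouble, so both return the same primitive Double, and the value placed in the Row is a boxed java.lang.Double either way. The sample string is an arbitrary illustration:

```scala
val s = "30.5"
// StringOps.toDouble calls java.lang.Double.parseDouble under the hood
val a: Double = s.toDouble
val b: Double = java.lang.Double.parseDouble(s)
println(a == b) // true
```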
To print the contents of a DataFrame whose rows contain Double fields, your StructFields should use DoubleType:
val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, DoubleType, nullable=true)})
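A minimal end-to-end sketch of the fix, assuming Spark 2.x or later on the classpath (for example pasted into spark-shell, where getOrCreate() returns the existing session). The sample lines, the app name, and the omission of the question's isEmpty checks are assumptions made to keep it short; the only substantive change from the question is DoubleType in the schema:

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.functions.desc
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

// Assumption: a local session; in spark-shell this returns the already running one
val session = SparkSession.builder()
  .appName("print-double-rows") // hypothetical app name
  .master("local[*]")
  .getOrCreate()

// Hypothetical sample lines standing in for sc.textFile(tempFile)
val lines = Seq("2017 1 2 30.5 20.1", "2017 1 3 31.0 19.4")

// Each Row holds Double values, which Spark receives as java.lang.Double
val dataRecordsTemp = session.sparkContext.parallelize(lines).map { rec =>
  Row.fromSeq(rec.trim.split("\\s+").map(_.toDouble).toSeq)
}

val headerFieldsForTemp = Seq("YEAR", "MONTH", "DAY", "MAX_TEMP", "MIN_TEMP")
// DoubleType matches the external type of the Row values
val schemaTemp = StructType(headerFieldsForTemp.map(f => StructField(f, DoubleType, nullable = true)))

val dfTemp = session.createDataFrame(dataRecordsTemp, schemaTemp)
  .orderBy(desc("YEAR"), desc("MONTH"), desc("DAY"))

dfTemp.show(20)
```

Calling dfTemp.printSchema() before show is a quick way to confirm that every column is now a double.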
Your schema declares the column types as String, but you are supplying nullable double values (i.e. java.lang.Double). Consider changing the definition of schemaTemp to:
val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, DoubleType, nullable=true)})
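To see the boxing this answer refers to without starting Spark: Row.fromSeq accepts a Seq[Any], and viewing a List[Double] as a Seq[Any] exposes each element as a java.lang.Double object, which a StringType column rejects. A plain-Scala sketch with illustrative values:

```scala
// Row.fromSeq takes Seq[Any]; a List[Double] seen as Seq[Any] holds boxed values
val fields: List[Double] = List(2017.0, 1.0, 2.0, 30.5, 20.1)
val asAny: Seq[Any] = fields
println(asAny.head.getClass.getName) // java.lang.Double
```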