
Scala/Spark: How to print the content of a Dataset[Row] when the row consists of fields of type Double

I have a model class in Scala like this:

package examples.partnerModels
import com.fasterxml.jackson.annotation.JsonProperty
case class  Temparature (@JsonProperty YEAR: Double,
                         @JsonProperty MONTH: Double,
                         @JsonProperty DAY : Double,
                         @JsonProperty MAX_TEMP: Double,
                         @JsonProperty MIN_TEMP : Double
                        )
{

  def this() = this(0,0,0,0,0)

  def getDataFields(): List[Double] =
  {
    productIterator.asInstanceOf[Iterator[Double]].toList
  }
}

object Temparature {
  def apply() = new Temparature(0,0,0,0,0)
}
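A quick way to check what getDataFields() produces at runtime is to run a standalone copy of the model. In this sketch the @JsonProperty annotations are dropped (an assumption made only so it needs no Jackson dependency; they do not affect productIterator), and FieldTypesDemo is a hypothetical name.

```scala
// Standalone copy of the model; the @JsonProperty annotations are omitted here
// (assumption) so the sketch needs no Jackson dependency.
case class Temparature(YEAR: Double, MONTH: Double, DAY: Double,
                       MAX_TEMP: Double, MIN_TEMP: Double) {
  def this() = this(0, 0, 0, 0, 0)

  def getDataFields(): List[Double] =
    productIterator.asInstanceOf[Iterator[Double]].toList
}

object FieldTypesDemo {
  def main(args: Array[String]): Unit = {
    val t = Temparature(2020, 1, 15, 12.5, 3.0)
    println(t.getDataFields())
    // productIterator yields Any, so each Double field is seen in boxed form
    // and its runtime class is java.lang.Double.
    t.productIterator.foreach(f => println(f.getClass.getName))
  }
}
```

Because Row.fromSeq receives these boxed java.lang.Double objects, whatever schema is paired with the resulting RDD has to declare matching numeric columns.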

I have created a DataFrame from this Temparature model, sorted the records, and am trying to print the content of each record this way:

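import org.apache.spark.sql.Row
import org.apache.spark.sql.functions.desc
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// isEmpty(...) is a helper that is not shown in the post.
val dataRecordsTemp = sc.textFile(tempFile).map{rec=>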
            val splittedRec = rec.split("\\s+")
            Temparature(
              if(isEmpty(splittedRec(0))) 0 else splittedRec(0).toDouble,
              if(isEmpty(splittedRec(1))) 0 else splittedRec(1).toDouble,
              if(isEmpty(splittedRec(2))) 0 else splittedRec(2).toDouble,
              if(isEmpty(splittedRec(3))) 0 else splittedRec(3).toDouble,
              if(isEmpty(splittedRec(4))) 0 else splittedRec(4).toDouble
            )
        }.map{x => Row.fromSeq(x.getDataFields())}

val headerFieldsForTemp = Seq("YEAR","MONTH","DAY","MAX_TEMP","MIN_TEMP")
val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, StringType, nullable=true)})
val dfTemp = session.createDataFrame(dataRecordsTemp,schemaTemp)
              .orderBy(desc("year"), desc("month"), desc("day"))

println("Printing temparature data ...............................")
dfTemp.show(20)

However, I am getting an error on the line where I am trying to print:

java.lang.Double is not a valid external type for schema of string

How can I print the content of a DataFrame whose rows consist of fields of type Double?

Instead of splittedRec(i).toDouble, use java.lang.Double.parseDouble(splittedRec(i))
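A sketch of a record parser written with java.lang.Double.parseDouble, as this answer suggests. The helper name parseRecord, the fixed count of five fields, and the rule that a missing or empty field becomes 0.0 are assumptions standing in for the post's isEmpty check. Note that parseDouble returns a primitive double, just as toDouble does, so once the values are placed in a Row they are still java.lang.Double and the schema's column types must agree with them.

```scala
object ParseDemo {
  // Hypothetical helper: parse one whitespace-separated record into five Doubles,
  // treating a missing or empty field as 0.0.
  def parseRecord(rec: String): Seq[Double] = {
    val parts = rec.trim.split("\\s+")
    (0 until 5).map { i =>
      if (i < parts.length && parts(i).nonEmpty) java.lang.Double.parseDouble(parts(i))
      else 0.0
    }
  }

  def main(args: Array[String]): Unit = {
    println(parseRecord("2020 1 15 12.5 3.0"))
    println(parseRecord("2020 1"))
  }
}
```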

To print the content of a DataFrame whose rows hold fields of type Double, your StructFields should be of DoubleType:

val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, DoubleType, nullable=true)})
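A minimal end-to-end sketch of the corrected pipeline, assuming a local SparkSession and a few inline sample lines in place of tempFile (the object name TemperatureDoubleSchema, the buildDf method, and the sample data are hypothetical). Apart from the simplified parsing, the substantive change from the question is that the StructFields use DoubleType.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.functions.desc
import org.apache.spark.sql.types.{DoubleType, StructField, StructType}

object TemperatureDoubleSchema {
  val headerFieldsForTemp = Seq("YEAR", "MONTH", "DAY", "MAX_TEMP", "MIN_TEMP")

  def buildDf(session: SparkSession, lines: Seq[String]): DataFrame = {
    // Split each line on whitespace and parse every field to a Double.
    val rows = session.sparkContext.parallelize(lines).map { rec =>
      Row.fromSeq(rec.trim.split("\\s+").map(_.toDouble).toList)
    }
    // DoubleType matches the java.lang.Double values stored in each Row.
    val schemaTemp = StructType(
      headerFieldsForTemp.map(f => StructField(f, DoubleType, nullable = true)))
    session.createDataFrame(rows, schemaTemp)
      .orderBy(desc("YEAR"), desc("MONTH"), desc("DAY"))
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder()
      .appName("temperature-double-schema")
      .master("local[*]")
      .getOrCreate()
    val dfTemp = buildDf(session, Seq("2019 12 31 4.0 -2.5", "2020 1 15 12.5 3.0"))
    dfTemp.show(20)
    session.stop()
  }
}
```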

Your schema sets the type of the columns to string, but you are giving it values of nullable double (i.e. java.lang.Double). Consider changing the definition of schemaTemp to:

val schemaTemp = StructType(headerFieldsForTemp.map{f => StructField(f, DoubleType, nullable=true)})
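The explanation above works in both directions: the schema and the values only have to agree. As a hedged alternative that is not taken from either answer, you could keep StringType and convert each Double to a String before building the Row. TemperatureStringSchema, buildDf, and the sample values are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

object TemperatureStringSchema {
  // Alternative sketch: keep StringType columns but store String values,
  // so the external type of each value matches the schema.
  def buildDf(session: SparkSession, values: Seq[Seq[Double]]): DataFrame = {
    val header = Seq("YEAR", "MONTH", "DAY", "MAX_TEMP", "MIN_TEMP")
    val schema = StructType(header.map(f => StructField(f, StringType, nullable = true)))
    val rows = session.sparkContext.parallelize(values)
      .map(v => Row.fromSeq(v.map(_.toString)))
    session.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val session = SparkSession.builder()
      .appName("temperature-string-schema")
      .master("local[*]")
      .getOrCreate()
    buildDf(session, Seq(Seq(2020.0, 1.0, 15.0, 12.5, 3.0))).show()
    session.stop()
  }
}
```

One trade-off: with StringType columns, orderBy(desc(...)) compares text, so for example "9.0" sorts above "12.0". Declaring DoubleType, as the answers suggest, keeps numeric ordering.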
