Convert DataType of all columns of certain DataType to another DataType in Spark DataFrame using Scala
I have a Spark DataFrame with more than 100 columns. In this DataFrame, I would like to convert all the DoubleType columns to DecimalType(18,5). I am able to convert one specific datatype to another in the following way:
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, DecimalType, DoubleType}

// Cast every column of `sourceType` to its mapped target type.
def castAllTypedColumnsTo(inputDF: DataFrame, sourceType: DataType): DataFrame = {
  val targetType = sourceType match {
    case DoubleType => DecimalType(18, 5)
    case _          => sourceType
  }
  // Fold over the matching columns, replacing each with its cast version.
  inputDF.schema.filter(_.dataType == sourceType).foldLeft(inputDF) {
    case (acc, col) => acc.withColumn(col.name, inputDF(col.name).cast(targetType))
  }
}
val inputDF = Seq((1,1.0),(2,2.0)).toDF("id","amount")
inputDF.printSchema()
root
|-- id: integer (nullable = true)
|-- amount: double (nullable = true)
val finalDF : DataFrame = castAllTypedColumnsTo(inputDF, DoubleType)
finalDF.printSchema()
root
|-- id: integer (nullable = true)
|-- amount: decimal(18,5) (nullable = true)
Here I'm filtering for the DoubleType columns and converting them to DecimalType(18,5). Now, say I want to convert another DataType: how can I implement that scenario without passing the datatype as an input parameter?
I was expecting something like the following:
def convertDataType(inputDF: DataFrame): DataFrame = {
  inputDF.dtypes.map {
    case (colName, colType) => (colName, colType match {
      case "DoubleType" => DecimalType(18, 5).toString
      case _            => colType
    })
  }
  // finalDF to be created with the new DataType.
}
val finalDF = convertDataType(inputDF)
Can someone help me handle this scenario?
Try the code below. Instead of casting column by column with withColumn, it maps every StructField in the schema to a Column expression, casting where the type matches, and builds the result with a single select.
scala> :paste
// Entering paste mode (ctrl-D to finish)
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructField

// Map each field to a Column, casting doubles to decimal(18,5).
def castAllTypedColumnsTo(field: StructField) = field.dataType.typeName match {
  case "double"  => col(field.name).cast("decimal(18,5)")
  case "integer" => col(field.name).cast("integer")
  case _         => col(field.name)
}
inputDF
  .select(inputDF.schema.map(castAllTypedColumnsTo): _*)
  .show(false)
// Exiting paste mode, now interpreting.
+---+-------+
|id |amount |
+---+-------+
|1 |1.00000|
|2 |2.00000|
+---+-------+
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.StructField
castAllTypedColumnsTo: (field: org.apache.spark.sql.types.StructField)org.apache.spark.sql.Column
scala>
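If more than one source type needs remapping, the same select-based idea generalizes to a conversion map. Below is a minimal sketch using only the public DataFrame API; the remapColumnTypes name and the FloatType entry are hypothetical illustrations, not part of the original answer:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, DecimalType, DoubleType, FloatType}

// Hypothetical helper: cast every column whose current type appears in
// `conversions`; all other columns pass through unchanged.
def remapColumnTypes(df: DataFrame, conversions: Map[DataType, DataType]): DataFrame =
  df.select(df.schema.map { field =>
    conversions.get(field.dataType) match {
      case Some(target) => col(field.name).cast(target)
      case None         => col(field.name)
    }
  }: _*)

// Usage: remap DoubleType and FloatType to decimal(18,5) in one pass.
val converted = remapColumnTypes(inputDF,
  Map(DoubleType -> DecimalType(18, 5), FloatType -> DecimalType(18, 5)))
converted.printSchema()

Building the whole projection in a single select also avoids growing the query plan with one projection per column, which the fold-over-withColumn version from the question would do for 100+ columns.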