

Convert DataType of all columns of certain DataType to another DataType in Spark DataFrame using Scala

I have a Spark DataFrame with more than 100 columns. In this DataFrame, I would like to convert all the DoubleType columns to DecimalType(18,5). I am able to convert one specific datatype to another in the following way:

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.{DataType, DecimalType, DoubleType}

def castAllTypedColumnsTo(inputDF: DataFrame, sourceType: DataType): DataFrame = {

    // Map the source type to the desired target type
    val targetType = sourceType match {
      case DoubleType => DecimalType(18,5)
      case _ => sourceType
    }

    // Cast every column whose type matches sourceType, leaving the rest unchanged
    inputDF.schema.filter(_.dataType == sourceType).foldLeft(inputDF) {
      case (acc, col) => acc.withColumn(col.name, inputDF(col.name).cast(targetType))
    }
  }

val inputDF = Seq((1,1.0),(2,2.0)).toDF("id","amount")

inputDF.printSchema()

root
 |-- id: integer (nullable = true)
 |-- amount: double (nullable = true)

val finalDF : DataFrame = castAllTypedColumnsTo(inputDF, DoubleType)

finalDF.printSchema()

root
 |-- id: integer (nullable = true)
 |-- amount: decimal(18,5) (nullable = true)

Here I'm filtering the DoubleType columns and converting them to DecimalType(18,5). Let's say I want to convert another DataType as well; how can I implement that scenario without passing the datatype as an input parameter?

I was expecting something like the following:

def convertDataType(inputDF: DataFrame): DataFrame = {

   inputDF.dtypes.map {
       case (colName, colType) => (colName, colType match {
           case "DoubleType" => DecimalType(18,5).toString
           case _ => colType
       })
   }
   //finalDF to be created with new DataType.
}

val finalDF = convertDataType(inputDF)

Can someone help me handle this scenario?

Try the code below.

scala> :paste
// Entering paste mode (ctrl-D to finish)

import org.apache.spark.sql.types.StructField

// Build one cast expression per field, keyed on the field's type name;
// col comes from org.apache.spark.sql.functions, which spark-shell imports automatically
def castAllTypedColumnsTo(field: StructField) = field.dataType.typeName match {
      case "double" => col(field.name).cast("decimal(18,5)")
      case "integer" => col(field.name).cast("integer")  // no-op cast, kept as an example
      case _ => col(field.name)                          // pass other columns through unchanged
}

// Apply the per-field cast to the whole schema in a single select
inputDF
.select(inputDF.schema.map(castAllTypedColumnsTo):_*)
.show(false)

// Exiting paste mode, now interpreting.

+---+-------+
|id |amount |
+---+-------+
|1  |1.00000|
|2  |2.00000|
+---+-------+

import org.apache.spark.sql.types.StructField
castAllTypedColumnsTo: (field: org.apache.spark.sql.types.StructField)org.apache.spark.sql.Column

scala>
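To wrap this up as a DataFrame-in/DataFrame-out function that does not take the datatype as a parameter, as the question asked, here is a minimal sketch along the same lines. The typeMapping table of source-to-target types is a hypothetical name introduced for illustration; extend it with whatever conversions you need.

import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{DataType, DecimalType, DoubleType, FloatType}

// Hypothetical source-to-target type table; add more entries as needed
val typeMapping: Map[DataType, DataType] = Map(
  DoubleType -> DecimalType(18,5),
  FloatType  -> DecimalType(18,5)
)

// Cast every column whose type appears in the mapping; pass the rest through unchanged
def convertDataType(inputDF: DataFrame): DataFrame = {
  val casted = inputDF.schema.map { field =>
    typeMapping.get(field.dataType) match {
      case Some(targetType) => col(field.name).cast(targetType)
      case None             => col(field.name)
    }
  }
  inputDF.select(casted: _*)
}

val finalDF = convertDataType(inputDF)
finalDF.printSchema()

All casts happen in a single select, so column order and names are preserved, and adding a new conversion is just one more entry in the map.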
