
How to change all columns data types in StructType or ArrayType columns?

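I have a DataFrame that includes some columns of type StructType or ArrayType. I want to cast every IntegerType column to DoubleType. I found some solutions to this problem; for example, this answer does something close to what I want. The problem is that it does not change the data types of the fields nested inside StructType or ArrayType columns.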

For example, I have a DataFrame with the following schema:

 |-- carCategories: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- payerId: integer (nullable = true)
 |-- percentage: integer (nullable = true)
 |-- plateNumberStatus: string (nullable = true)
 |-- ratio: struct (nullable = true)
 |    |-- max: integer (nullable = true)
 |    |-- min: integer (nullable = true)

After running the following script:

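import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Cast every top-level IntegerType column to DoubleType; keep all other columns as they are.
val doubleSchema = df.schema.fields.map {
  case StructField(name, IntegerType, _, _) => col(name).cast(DoubleType)
  case f => col(f.name)
}

df.select(doubleSchema: _*).printSchema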

the result is:

 |-- carCategories: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- payerId: double (nullable = true)
 |-- percentage: double (nullable = true)
 |-- plateNumberStatus: string (nullable = true)
 |-- ratio: struct (nullable = true)
 |    |-- max: integer (nullable = true)
 |    |-- min: integer (nullable = true)

As you can see, the top-level integer columns were converted to DoubleType, but the fields inside the ArrayType and StructType columns were not.

I would like the final schema to look like this:

 |-- carCategories: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- payerId: double (nullable = true)
 |-- percentage: double (nullable = true)
 |-- plateNumberStatus: string (nullable = true)
 |-- ratio: struct (nullable = true)
 |    |-- max: double (nullable = true)
 |    |-- min: double (nullable = true)

How can I do this?

Thanks in advance.

You can add case clauses to handle ArrayType and StructType, like this:

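import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

def castIntToDouble(schema: StructType): Seq[Column] = {
  schema.fields.map { f =>
    f.dataType match {
      // Top-level integer column: cast directly.
      case IntegerType => col(f.name).cast(DoubleType)
      // Struct column: rewrite its DDL string (":int" -> ":double") and cast to that.
      case StructType(_) =>
        col(f.name).cast(
          f.dataType.simpleString.replace(s":${IntegerType.simpleString}", s":${DoubleType.simpleString}")
        )
      case dt: ArrayType =>
        dt.elementType match {
          // Array of integers: cast to an array of doubles.
          case IntegerType => col(f.name).cast(ArrayType(DoubleType))
          // Array of structs: same DDL string rewrite as for a struct column.
          case StructType(_) =>
            col(f.name).cast(
              f.dataType.simpleString.replace(s":${IntegerType.simpleString}", s":${DoubleType.simpleString}")
            )
          case _ => col(f.name)
        }
      // Anything else is left unchanged.
      case _ => col(f.name)
    }
  }.toSeq
}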

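When the column type is StructType or an array of nested structs, the function casts using the DDL string format. For example, if you need to cast the struct column ratio, which has type struct<max:int,min:int>, you do not have to recreate the whole struct; you can simply do: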

df.withColumn("ratio", col("ratio").cast("struct<max:double,min:double>"))
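The plain `replace` on `:int` used in castIntToDouble also matches any type name that merely starts with `int` (a field of type `interval`, for instance), and it misses an `array<int>` nested inside a struct. Below is a minimal sketch of a stricter rewrite of the DDL string, using only the Scala standard library. The names `DdlRewrite` and `intToDoubleDdl` are hypothetical, and the sketch assumes field names contain none of the characters `:`, `<`, `>` or `,`.

```scala
object DdlRewrite {
  // Replace the type token `int` only where it stands as a complete type:
  // preceded by ':' (struct field), '<' (array element or map key) or ',' (map value),
  // and followed by ',' or '>'. Field names are followed by ':' and are never matched.
  def intToDoubleDdl(ddl: String): String =
    ddl.replaceAll("(?<=[:<,])int(?=[,>])", "double")
}
```

The resulting string can then be passed to `cast`, for example `col("ratio").cast(DdlRewrite.intToDoubleDdl(df.schema("ratio").dataType.simpleString))`.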

Now apply it to your input example:

// toDF needs `import spark.implicits._` (already in scope in spark-shell).
val df = (
   Seq((Seq(1, 2, 3), 34, 87, "pending", (65, 22)))
  .toDF("carCategories","payerId","percentage","plateNumberStatus","ratio")
  .withColumn("ratio", col("ratio").cast("struct<max:int,min:int>"))
)

df.select(castIntToDouble(df.schema):_*).printSchema
//root
// |-- carCategories: array (nullable = true)
// |    |-- element: double (containsNull = true)
// |-- payerId: double (nullable = false)
// |-- percentage: double (nullable = false)
// |-- plateNumberStatus: string (nullable = true)
// |-- ratio: struct (nullable = true)
// |    |-- max: double (nullable = true)
// |    |-- min: double (nullable = true)
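If the nesting can go deeper than one level (a struct inside a struct, or an array inside a struct), an alternative to editing the DDL string is to rebuild the target type recursively and cast each column to it. The following is only a sketch under that assumption, not part of the answer above; `NestedCast`, `intToDouble` and `castAllIntsToDouble` are hypothetical names, and map columns are deliberately left unchanged.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

object NestedCast {
  // Return a copy of the data type with every IntegerType replaced by DoubleType, at any depth.
  def intToDouble(dt: DataType): DataType = dt match {
    case IntegerType => DoubleType
    case ArrayType(elementType, containsNull) =>
      ArrayType(intToDouble(elementType), containsNull)
    case StructType(fields) =>
      StructType(fields.map(f => f.copy(dataType = intToDouble(f.dataType))))
    case other => other // strings, maps, etc. are kept as they are
  }

  // Cast every top-level column to its rewritten type; nested fields follow along with the cast.
  def castAllIntsToDouble(df: DataFrame): DataFrame =
    df.select(df.schema.fields.map(f => col(f.name).cast(intToDouble(f.dataType))): _*)
}
```

Calling `NestedCast.castAllIntsToDouble(df).printSchema` on the example DataFrame should then show double in place of integer for the nested fields as well. Unlike `ArrayType(DoubleType)`, this version keeps each array's original `containsNull` flag.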
