
How to change all columns data types in StructType or ArrayType columns?

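I have a DataFrame that includes some columns of StructType and ArrayType. I want to cast all IntegerType columns to DoubleType. I found some solutions to this problem; for example, this answer does something similar to what I want. The problem is that it does not change the data types of fields nested inside StructType and ArrayType columns.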

For example, I have a DataFrame with the following schema:

 |-- carCategories: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- payerId: integer (nullable = true)
 |-- percentage: integer (nullable = true)
 |-- plateNumberStatus: string (nullable = true)
 |-- ratio: struct (nullable = true)
 |    |-- max: integer (nullable = true)
 |    |-- min: integer (nullable = true)

After running the following script:

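import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Cast top-level IntegerType columns to DoubleType; keep every other column as-is
val doubleSchema = df.schema.fields.map { f =>
  f match {
    case StructField(name: String, _: IntegerType, _, _) => col(name).cast(DoubleType)
    case _ => col(f.name)
  }
}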

df.select(doubleSchema:_*).printSchema

The result looks like this:

 |-- carCategories: array (nullable = true)
 |    |-- element: integer (containsNull = true)
 |-- payerId: double (nullable = true)
 |-- percentage: double (nullable = true)
 |-- plateNumberStatus: string (nullable = true)
 |-- ratio: struct (nullable = true)
 |    |-- max: integer (nullable = true)
 |    |-- min: integer (nullable = true)

As you can see, some columns were cast to DoubleType, but the fields inside the ArrayType and StructType columns were not.
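The reason is that the pattern in the script only inspects each field's own top-level type. The sketch below rebuilds the example schema by hand (an assumption for illustration; no SparkSession is needed, only the `org.apache.spark.sql.types` classes) and applies the same kind of pattern. `carCategories` has type `ArrayType(IntegerType)` and `ratio` has a `StructType`, so neither matches `IntegerType`. The value name `matched` is hypothetical.

```scala
import org.apache.spark.sql.types._

// The example schema from the question, built by hand.
val schema = StructType(Seq(
  StructField("carCategories", ArrayType(IntegerType, containsNull = true)),
  StructField("payerId", IntegerType),
  StructField("percentage", IntegerType),
  StructField("plateNumberStatus", StringType),
  StructField("ratio", StructType(Seq(
    StructField("max", IntegerType),
    StructField("min", IntegerType)
  )))
))

// Same idea as the question's script: only fields whose own type is IntegerType match.
val matched: Seq[String] = schema.fields.toSeq.collect {
  case StructField(name, IntegerType, _, _) => name
}

// Print the top-level type of every field.
schema.fields.foreach(f => println(s"${f.name}: ${f.dataType.simpleString}"))
// carCategories: array<int>
// payerId: int
// percentage: int
// plateNumberStatus: string
// ratio: struct<max:int,min:int>
```

Only `payerId` and `percentage` end up in `matched`, which is exactly the set of columns the script converted.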

I want the final schema to look like this:

 |-- carCategories: array (nullable = true)
 |    |-- element: double (containsNull = true)
 |-- payerId: double (nullable = true)
 |-- percentage: double (nullable = true)
 |-- plateNumberStatus: string (nullable = true)
 |-- ratio: struct (nullable = true)
 |    |-- max: double (nullable = true)
 |    |-- min: double (nullable = true)

How can I achieve this?

Thanks in advance.

You can add case clauses to handle ArrayType and StructType, as follows:

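import org.apache.spark.sql.Column
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

def castIntToDouble(schema: StructType): Seq[Column] = {
  schema.fields.toSeq.map { f =>
    f.dataType match {
      // Top-level integer column: cast directly
      case IntegerType => col(f.name).cast(DoubleType)
      // Struct column: rewrite its DDL type string, replacing ":int" with ":double".
      // catalogString is used because simpleString can be truncated for wide structs.
      case StructType(_) =>
        col(f.name).cast(
          f.dataType.catalogString.replace(s":${IntegerType.catalogString}", s":${DoubleType.catalogString}")
        )
      case dt: ArrayType =>
        dt.elementType match {
          // Array of integers: cast to an array of doubles
          case IntegerType => col(f.name).cast(ArrayType(DoubleType))
          // Array of structs: same DDL string rewrite as above
          case StructType(_) =>
            col(f.name).cast(
              f.dataType.catalogString.replace(s":${IntegerType.catalogString}", s":${DoubleType.catalogString}")
            )
          case _ => col(f.name)
        }
      case _ => col(f.name)
    }
  }
}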
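The function above handles one level of nesting. If arrays sit inside structs, or maps are involved, a different technique (not the one in this answer) is to rewrite the `DataType` object itself recursively and cast each column to the rewritten type. The following is a minimal sketch under that assumption; `intToDouble` and `castAllIntToDouble` are hypothetical names. Because the rewritten struct keeps the same field names and order, Spark's positional struct-to-struct cast lines up field by field.

```scala
import org.apache.spark.sql.{Column, DataFrame}
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types._

// Recursively replace IntegerType with DoubleType at any nesting depth.
def intToDouble(dt: DataType): DataType = dt match {
  case IntegerType => DoubleType
  case StructType(fields) =>
    // copy keeps each field's name, nullability and metadata
    StructType(fields.map(f => f.copy(dataType = intToDouble(f.dataType))))
  case ArrayType(elementType, containsNull) =>
    ArrayType(intToDouble(elementType), containsNull)
  case MapType(keyType, valueType, valueContainsNull) =>
    MapType(intToDouble(keyType), intToDouble(valueType), valueContainsNull)
  case other => other
}

// Cast every top-level column to its rewritten type; unchanged columns are kept as-is.
def castAllIntToDouble(df: DataFrame): DataFrame = {
  val columns: Seq[Column] = df.schema.fields.toSeq.map { f =>
    val target = intToDouble(f.dataType)
    if (target == f.dataType) col(f.name)
    else col(f.name).cast(target).as(f.name)
  }
  df.select(columns: _*)
}
```

Usage would be `castAllIntToDouble(df).printSchema()`. Working on `DataType` objects avoids string replacement entirely, so field names or other type names that happen to contain `int` cannot be affected.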

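When the column type is StructType or an array of structs, the function casts using a DDL-formatted type string. For example, to cast the struct column ratio of type struct<max:int,min:int> without having to rebuild the whole struct, you would do: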

df.withColumn("ratio", col("ratio").cast("struct<max:double,min:double>"))
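To make the string rewrite concrete, the sketch below applies the answer's replacement to the DDL form of the `ratio` type and compares it with the same target type built as a `DataType` object, which `Column.cast` also accepts (e.g. `col("ratio").cast(target)`). The last two values illustrate an assumption-free limitation of a plain `":int"` replace: an int array nested inside a struct is written as `array<int>`, so it is left untouched. All value names are hypothetical.

```scala
import org.apache.spark.sql.types._

// Type of the `ratio` column in the example.
val ratioType = StructType(Seq(
  StructField("max", IntegerType),
  StructField("min", IntegerType)
))

// The answer's string rewrite, applied to the DDL form of the type.
val ddl = ratioType.catalogString // "struct<max:int,min:int>"
val rewritten = ddl.replace(s":${IntegerType.catalogString}", s":${DoubleType.catalogString}")

// The same target type as a DataType object instead of a string.
val target = StructType(Seq(
  StructField("max", DoubleType),
  StructField("min", DoubleType)
))

// Limitation: "struct<a:array<int>>" contains no ":int", so nothing is replaced.
val nested = StructType(Seq(StructField("a", ArrayType(IntegerType)))).catalogString
val nestedRewritten = nested.replace(":int", ":double")
```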

Now apply it to your input example:

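import org.apache.spark.sql.functions.col
import spark.implicits._ // `spark` is the active SparkSession, e.g. in spark-shell

val df = (
  Seq((Seq(1, 2, 3), 34, 87, "pending", (65, 22)))
    .toDF("carCategories", "payerId", "percentage", "plateNumberStatus", "ratio")
    // the tuple column is struct<_1:int,_2:int>; this positional cast renames its fields
    .withColumn("ratio", col("ratio").cast("struct<max:int,min:int>"))
)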

df.select(castIntToDouble(df.schema):_*).printSchema
//root
// |-- carCategories: array (nullable = true)
// |    |-- element: double (containsNull = true)
// |-- payerId: double (nullable = false)
// |-- percentage: double (nullable = false)
// |-- plateNumberStatus: string (nullable = true)
// |-- ratio: struct (nullable = true)
// |    |-- max: double (nullable = true)
// |    |-- min: double (nullable = true)
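To check the result programmatically rather than by reading `printSchema` output, a small recursive helper can report whether `IntegerType` still appears anywhere in a schema. This is a sketch; `containsInt` is a hypothetical name, and it relies only on the fact that a `StructType` schema is itself a `DataType`.

```scala
import org.apache.spark.sql.types._

// Returns true if IntegerType appears anywhere inside `dt`, at any nesting depth.
def containsInt(dt: DataType): Boolean = dt match {
  case IntegerType => true
  case StructType(fields) => fields.exists(f => containsInt(f.dataType))
  case ArrayType(elementType, _) => containsInt(elementType)
  case MapType(keyType, valueType, _) => containsInt(keyType) || containsInt(valueType)
  case _ => false
}
```

For the example above, `containsInt(df.select(castIntToDouble(df.schema): _*).schema)` would be expected to return `false`.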


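Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.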

 
Guangdong ICP Registration No. 18138465  © 2020-2024 STACKOOM.COM