Convert/cast StructType, ArrayType to StringType (single value) using pyspark

Question

One of my DataFrames (spark.sql) has this schema.

root
 |-- ValueA: string (nullable = true)
 |-- ValueB: struct (nullable = true)
 |    |-- abc: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- a0: string (nullable = true)
 |    |    |    |-- a1: string (nullable = true)
 |    |    |    |-- a2: string (nullable = true)
 |    |    |    |-- a3: string (nullable = true)
 |-- ValueC: struct (nullable = true)
 |    |-- pqr: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- info1: string (nullable = true)
 |    |    |    |-- info2: struct (nullable = true)
 |    |    |    |    |-- x1: long (nullable = true)
 |    |    |    |    |-- x2: long (nullable = true)
 |    |    |    |    |-- x3: string (nullable = true)
 |    |    |    |-- info3: string (nullable = true)
 |    |    |    |-- info4: string (nullable = true)
 |-- Value4: struct (nullable = true)
 |    |-- xyz: array (nullable = true)
 |    |    |-- element: struct (containsNull = true)
 |    |    |    |-- b0: string (nullable = true)
 |    |    |    |-- b2: string (nullable = true)
 |    |    |    |-- b3: string (nullable = true)
 |-- Value5: string (nullable = true)

I need to save it to a CSV file without any flattening or exploding, in the following format.

 |-- ValueA: string (nullable = true)
 |-- ValueB: struct (nullable = true)
 |-- ValueC: struct (nullable = true)
 |-- ValueD: struct (nullable = true)
 |-- ValueE: string (nullable = true)

I used the command [df.toPandas().to_csv("output.csv")] directly, which serves my purpose, but I need a better approach. I am using pyspark.

Answer 1

Writing CSV in Spark does not yet support complex types such as struct, array, etc.

Write as Parquet file:

A better approach in Spark is to write in Parquet format, since Parquet supports all nested data types and gives better performance when reading and writing.

df.write.parquet("<path>")

Write as JSON file:

If JSON output is acceptable, write it directly:

df.write.json("path")
#or
df.toJSON().saveAsTextFile("path")
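
Spark's JSON writer produces a directory of part files with one JSON object per line, so the nested structure survives and the output can be read back without Spark. The following is a minimal sketch of that read-back step using only the standard library. The helper name read_json_lines, the part-*.json glob pattern, and the hand-written sample record are assumptions for illustration, not taken from the original answer.

```python
import json
import tempfile
from pathlib import Path


def read_json_lines(directory):
    """Yield one dict per record from every part-*.json file in `directory`."""
    for part in sorted(Path(directory).glob("part-*.json")):
        with part.open(encoding="utf-8") as handle:
            for line in handle:
                if line.strip():  # skip blank lines
                    yield json.loads(line)


# Hand-written stand-in for a Spark output directory (hypothetical data).
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "part-00000.json"
    sample.write_text(
        '{"ValueA":"k1","ValueB":{"abc":[{"a0":"x","a1":"y"}]}}\n',
        encoding="utf-8",
    )
    records = list(read_json_lines(tmp))

# Nested fields are still addressable after the round trip.
print(records[0]["ValueB"]["abc"][0]["a0"])
```

Compressed output (for example files ending in .json.gz) would not match this pattern and would need its own handling.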

Write as CSV file:

Use the to_json function to convert each struct/array column to a JSON string, then store the result in CSV format.

# list the remaining columns the same way, wrapping each struct/array column in to_json
df.selectExpr("ValueA", "to_json(ValueB) AS ValueB").write.csv("path")
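
Typing one to_json expression per column by hand gets tedious on a wide schema. As a rough sketch, the expression list can be generated from the (name, type) pairs that a PySpark DataFrame exposes through its dtypes attribute. The helper name json_exprs and the abbreviated type strings in the sample are hypothetical and not part of the original answer. The helper itself is plain Python, so it runs without a Spark session.

```python
def json_exprs(dtypes):
    """Build selectExpr() strings from (column_name, type_string) pairs.

    Complex columns (struct, array, map) are wrapped in to_json and keep
    their original name; every other column is passed through unchanged.
    """
    complex_prefixes = ("struct<", "array<", "map<")
    exprs = []
    for name, type_string in dtypes:
        quoted = f"`{name}`"  # backticks guard names containing spaces or dots
        if type_string.startswith(complex_prefixes):
            exprs.append(f"to_json({quoted}) AS {quoted}")
        else:
            exprs.append(quoted)
    return exprs


# Type strings abbreviated by hand for illustration; with a real DataFrame,
# pass df.dtypes instead.
sample_dtypes = [
    ("ValueA", "string"),
    ("ValueB", "struct<abc:array<struct<a0:string,a1:string>>>"),
    ("Value5", "string"),
]
exprs = json_exprs(sample_dtypes)
print(exprs)

# With a live DataFrame (requires pyspark, not run here):
# df.selectExpr(*json_exprs(df.dtypes)).write.csv("path", header=True)
```

Keeping the original column names through the AS alias means the CSV header matches the target layout in the question, with each nested column collapsed into a single string value.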

Answer 1 posted 2020-07-09 20:07:38
