
PySpark how to iterate over Dataframe columns and change data type?

What is the best way to iterate over a Spark DataFrame (using PySpark), find every column with data type Decimal(38,10), change it to BigInt, and save the result back to the same DataFrame?

I already have the part that changes a single column's data type, e.g.:

df = df.withColumn("COLUMN_X", df["COLUMN_X"].cast(IntegerType()))

but I am trying to figure out how to combine this with iteration over all columns.

Thanks.

You can loop through df.dtypes and cast to bigint when the type is equal to decimal(38,10):

from pyspark.sql.functions import col

select_expr = [
    col(c).cast("bigint") if t == "decimal(38,10)" else col(c) for c, t in df.dtypes
]

df = df.select(*select_expr)
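
For illustration, here is a minimal, self-contained sketch of the same approach on a toy DataFrame (the column names amount and label and the values are invented for the example):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame with one decimal(38,10) column and one string column.
df = spark.sql("SELECT CAST(1.5 AS DECIMAL(38,10)) AS amount, 'a' AS label")
print(df.dtypes)  # [('amount', 'decimal(38,10)'), ('label', 'string')]

# Cast only the decimal(38,10) columns to bigint, keep the rest unchanged.
select_expr = [
    col(c).cast("bigint") if t == "decimal(38,10)" else col(c) for c, t in df.dtypes
]
df = df.select(*select_expr)
print(df.dtypes)  # [('amount', 'bigint'), ('label', 'string')]

Since df.select returns a new DataFrame, reassigning it to df "resaves" the result to the same variable, as the question asks.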

I found this post https://stackoverflow.com/a/54399474/11268096, where you can loop through all columns and cast them to your desired data type.

from pyspark.sql import functions as F

for col in df.columns:
  df = df.withColumn(
    col,
    F.col(col).cast("double")
  )
