
PySpark how to iterate over Dataframe columns and change data type?

What is the best way to iterate over a Spark DataFrame (using PySpark), find every column with data type Decimal(38,10), change it to BigInt, and save the result back to the same DataFrame?

I already have the part that changes a single column's data type, e.g.:

df = df.withColumn("COLUMN_X", df["COLUMN_X"].cast(IntegerType()))

but I am trying to figure out how to combine this with iteration over all columns.

Thanks.

You can loop through df.dtypes and cast to bigint when the type is equal to decimal(38,10):

from pyspark.sql.functions import col

select_expr = [
    col(c).cast("bigint") if t == "decimal(38,10)" else col(c) for c, t in df.dtypes
]

df = df.select(*select_expr)
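
For illustration, here is a minimal, self-contained sketch of the same approach on a toy DataFrame (the column names amount and label and the values are invented for the example):

from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.getOrCreate()

# Toy DataFrame with one decimal(38,10) column and one string column.
df = spark.sql("SELECT CAST(1.5 AS DECIMAL(38,10)) AS amount, 'a' AS label")
print(df.dtypes)  # [('amount', 'decimal(38,10)'), ('label', 'string')]

# Cast only the decimal(38,10) columns to bigint, keep the rest unchanged.
select_expr = [
    col(c).cast("bigint") if t == "decimal(38,10)" else col(c) for c, t in df.dtypes
]
df = df.select(*select_expr)
print(df.dtypes)  # [('amount', 'bigint'), ('label', 'string')]

Since df.select returns a new DataFrame, reassigning it to df "resaves" the result to the same variable, as the question asks.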

I found this post https://stackoverflow.com/a/54399474/11268096, where you can loop through all columns and cast them to your desired data type.

from pyspark.sql import functions as F

for col in df.columns:
  df = df.withColumn(
    col,
    F.col(col).cast("double")
  )
