PySpark: how to iterate over DataFrame columns and change the data type?
What is the best way to iterate over a Spark DataFrame (using PySpark) and, whenever a column's data type is Decimal(38,10), change it to Bigint (saving everything back to the same DataFrame)?
I have the part for changing a single column's data type, e.g.:
from pyspark.sql.types import IntegerType

df = df.withColumn("COLUMN_X", df["COLUMN_X"].cast(IntegerType()))
but I am trying to integrate this with iteration over all columns. Thanks.
You can loop through df.dtypes and cast to bigint when the type is equal to decimal(38,10):
from pyspark.sql.functions import col

select_expr = [
    col(c).cast("bigint") if t == "decimal(38,10)" else col(c) for c, t in df.dtypes
]
df = df.select(*select_expr)
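To see which columns the comprehension rewrites, the same selection logic can be checked without a SparkSession: df.dtypes is just a list of (column_name, type_string) tuples, so a hand-written sample list (hypothetical column names here) stands in for it. This sketch builds SQL expression strings instead of Column objects so the result is easy to inspect; with a real DataFrame you would pass them to df.selectExpr(*select_expr).

```python
# Simulated df.dtypes: a list of (column_name, type_string) pairs.
# The column names below are made up for illustration.
sample_dtypes = [("id", "bigint"), ("amount", "decimal(38,10)"), ("name", "string")]

# Rewrite only the decimal(38,10) columns as a CAST expression,
# leaving every other column untouched.
select_expr = [
    f"CAST({c} AS bigint) AS {c}" if t == "decimal(38,10)" else c
    for c, t in sample_dtypes
]

print(select_expr)
# ['id', 'CAST(amount AS bigint) AS amount', 'name']
```

With Spark available, `df.selectExpr(*select_expr)` produces the same result as the Column-based version above.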
I found this post https://stackoverflow.com/a/54399474/11268096 , where you can loop through all columns and cast them to your desired data type.
from pyspark.sql import functions as F

for col in df.columns:
    df = df.withColumn(
        col,
        F.col(col).cast("double")
    )
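Note that the loop above casts every column unconditionally. To restrict it to the decimal(38,10) columns only, as the question asks, you can filter the (name, type) pairs from df.dtypes first. A minimal sketch of that filtering step, with df.dtypes simulated by a hand-written list (hypothetical column names) so it runs without a SparkSession:

```python
# Simulated df.dtypes; in real code use: dtypes = df.dtypes
dtypes = [
    ("id", "bigint"),
    ("price", "decimal(38,10)"),
    ("qty", "decimal(38,10)"),
    ("label", "string"),
]

# Keep only the columns whose type matches exactly.
decimal_cols = [c for c, t in dtypes if t == "decimal(38,10)"]

print(decimal_cols)
# ['price', 'qty']

# With a real DataFrame the cast loop then becomes:
# for c in decimal_cols:
#     df = df.withColumn(c, F.col(c).cast("bigint"))
```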