How can I dynamically change the data type of only the Boolean columns to string in a DataFrame that is read in with a varying number of columns?

Question

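In my notebook I read a DataFrame whose number of columns changes every time the notebook runs. How can I dynamically change the data type of only the Boolean columns to string?

I ran into this problem myself, so I am posting my answer in case it helps someone else.

The DataFrame is named "df".

Here, every Boolean column in the incoming dataset is dynamically cast to the string data type: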

from pyspark.sql import functions as F
from pyspark.sql.types import StringType


def bool_col_DataTypes(DataFrame):
    """This function accepts a Spark DataFrame as an argument. It returns a list of all Boolean columns in the DataFrame."""
    # DataFrame.dtypes is a list of (column_name, type_string) pairs.
    dtypes = dict(DataFrame.dtypes)
    list_of_bool_cols_for_conversion = [x for x, y in dtypes.items() if y == 'boolean']
    return list_of_bool_cols_for_conversion


list_of_bool_columns = bool_col_DataTypes(df)

# Cast each Boolean column to string; all other columns are left unchanged.
for i in list_of_bool_columns:
    df = df.withColumn(i, F.col(i).cast(StringType()))

new_df = df
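The column selection in `bool_col_DataTypes` depends only on `df.dtypes`, which PySpark returns as a list of `(column_name, type_string)` pairs. A minimal sketch of just that filtering step in plain Python, assuming a hand-written `dtypes` list in place of a live DataFrame (the helper name `boolean_columns` and the sample list are hypothetical), lets you check the logic without starting a Spark session:

```python
from typing import List, Tuple


def boolean_columns(dtypes: List[Tuple[str, str]]) -> List[str]:
    """Return the names of columns whose type string is 'boolean'.

    `dtypes` is assumed to have the same shape as PySpark's
    `DataFrame.dtypes`: a list of (column_name, type_string) pairs.
    """
    # Iterating the pairs directly (instead of building a dict first)
    # avoids silently collapsing duplicate column names.
    return [name for name, dtype in dtypes if dtype == "boolean"]


# Hypothetical sample mirroring the schema used in Answer 1.
sample_dtypes = [("Answer", "boolean"), ("Entity", "string"), ("ID", "int")]
print(boolean_columns(sample_dtypes))  # ['Answer']
```

With a real DataFrame, the equivalent call would be `boolean_columns(df.dtypes)`.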

Answer 1

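from pyspark.sql import SparkSession
from pyspark.sql.types import BooleanType, IntegerType, StringType, StructField, StructType

# Reuses the notebook's existing session if there is one.
spark = SparkSession.builder.getOrCreate()

data = [(True, 'Lion', 1),
        (False, 'fridge', 2),
        (True, 'Bat', 23)]

schema = StructType([StructField('Answer', BooleanType(), True),
                     StructField('Entity', StringType(), True),
                     StructField('ID', IntegerType(), True)])

df = spark.createDataFrame(data, schema)
df.printSchema()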

Schema:

root
 |-- Answer: boolean (nullable = true)
 |-- Entity: string (nullable = true)
 |-- ID: integer (nullable = true)

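Transformation:

from pyspark.sql.functions import col

# Cast Boolean columns to string; pass every other column through unchanged.
df1 = df.select(*[col(x).cast('string').alias(x) if y == 'boolean' else col(x) for x, y in df.dtypes])

df1.printSchema()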

root
 |-- Answer: string (nullable = true)
 |-- Entity: string (nullable = true)
 |-- ID: integer (nullable = true)
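The one-liner in Answer 1 builds one column expression per entry in `df.dtypes`, so all columns keep their order and the conversion happens in a single `select` rather than one `withColumn` call per column as in the question. A hedged alternative sketch, assuming you would rather produce Spark SQL expression strings that can be logged or inspected before running them, builds the same projection for `DataFrame.selectExpr`. The helper name `cast_boolean_exprs` is hypothetical, and the sample `dtypes` list stands in for a real DataFrame:

```python
from typing import List, Tuple


def cast_boolean_exprs(dtypes: List[Tuple[str, str]]) -> List[str]:
    """Build Spark SQL expressions that cast boolean columns to string.

    Every column stays in its original position; only columns whose type
    string is 'boolean' are wrapped in CAST(... AS STRING).
    """
    exprs = []
    for name, dtype in dtypes:
        # Backtick-quote the identifier (doubling any embedded backticks)
        # so names with spaces, dots, or backticks still parse.
        quoted = "`" + name.replace("`", "``") + "`"
        if dtype == "boolean":
            exprs.append(f"CAST({quoted} AS STRING) AS {quoted}")
        else:
            exprs.append(quoted)
    return exprs


# Hypothetical sample mirroring the schema used in Answer 1.
sample_dtypes = [("Answer", "boolean"), ("Entity", "string"), ("ID", "int")]
print(cast_boolean_exprs(sample_dtypes))
```

With a live DataFrame, the call would be `df1 = df.selectExpr(*cast_boolean_exprs(df.dtypes))`, which should produce the same schema as `df1` above.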

Answer 1 was accepted on 2022-09-10 00:05:42.