
Data Frames being read in with a varying number of columns: how do I dynamically change the data type of only the Boolean columns to String?

In my notebook, I have Data Frames being read in that will have a variable number of columns every time the notebook is run. How do I dynamically change the data types of only the columns that are Boolean to the String data type?

This is a problem I faced, so I am posting the answer in case it helps someone else.

The name of the data frame is "df".

Here we dynamically convert every column in the incoming dataset that is a Boolean data type to a String data type:

from pyspark.sql import functions as F
from pyspark.sql.types import StringType


def bool_col_DataTypes(data_frame):
    """Accept a Spark Data Frame and return a list of the names of all its Boolean columns."""
    # data_frame.dtypes is a list of (column name, type name) pairs
    return [name for name, dtype in data_frame.dtypes if dtype == 'boolean']


list_of_bool_columns = bool_col_DataTypes(df)

# Cast each Boolean column to string; all other columns are left untouched
for i in list_of_bool_columns:
    df = df.withColumn(i, F.col(i).cast(StringType()))

new_df = df
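
The helper above works because df.dtypes is a list of (column name, type name) pairs, in which Boolean columns carry the type name 'boolean'. As a minimal sketch of just that filtering step, the snippet below runs on a hand-written list standing in for df.dtypes, so it needs no Spark session. The name boolean_columns and the sample list are illustrative assumptions, not part of the original answer.

```python
def boolean_columns(dtypes):
    """Return the names of columns whose Spark type name is 'boolean'.

    `dtypes` is expected to look like `df.dtypes`: a list of
    (column name, type name) tuples.
    """
    return [name for name, dtype in dtypes if dtype == "boolean"]


# Hand-written stand-in for df.dtypes (hypothetical sample data)
sample_dtypes = [("Answer", "boolean"), ("Entity", "string"), ("ID", "int")]

bool_cols = boolean_columns(sample_dtypes)
print(bool_cols)
```

Because the filter only looks at type names, it keeps working however many columns the incoming Data Frame happens to have.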
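
Alternatively, starting from a sample Data Frame (this assumes an active SparkSession named spark):

from pyspark.sql.types import StructType, StructField, BooleanType, StringType, IntegerType

data = [(True, 'Lion', 1),
        (False, 'fridge', 2),
        (True, 'Bat', 23)]

schema = StructType([
    StructField('Answer', BooleanType(), True),
    StructField('Entity', StringType(), True),
    StructField('ID', IntegerType(), True),
])

df = spark.createDataFrame(data, schema)
df.printSchema()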

Schema

root
 |-- Answer: boolean (nullable = true)
 |-- Entity: string (nullable = true)
 |-- ID: integer (nullable = true)

Transformation

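from pyspark.sql.functions import col

# Cast Boolean columns to string and pass every other column through unchanged
df1 = df.select(*[col(x).cast('string').alias(x) if y == 'boolean' else col(x) for x, y in df.dtypes])

df1.printSchema()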

root
 |-- Answer: string (nullable = true)
 |-- Entity: string (nullable = true)
 |-- ID: integer (nullable = true)
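
The same single-pass cast can also be written as SQL expression strings and handed to DataFrame.selectExpr. The sketch below only builds those strings from a dtypes-style list, so it runs without Spark. The helper name cast_exprs and the sample list are assumptions for illustration, and this is a variation on the select shown above rather than the answer's own code. It does not escape backticks inside column names.

```python
def cast_exprs(dtypes, from_type="boolean", to_type="string"):
    """Build one Spark SQL expression string per column.

    Columns whose type name equals `from_type` are wrapped in a CAST and
    aliased back to their original name; all others are passed through.
    Column names are quoted with backticks.
    """
    exprs = []
    for name, dtype in dtypes:
        quoted = f"`{name}`"
        if dtype == from_type:
            exprs.append(f"CAST({quoted} AS {to_type.upper()}) AS {quoted}")
        else:
            exprs.append(quoted)
    return exprs


# Hand-written stand-in for df.dtypes (hypothetical sample data)
sample_dtypes = [("Answer", "boolean"), ("Entity", "string"), ("ID", "int")]

exprs = cast_exprs(sample_dtypes)
print(exprs)

# With a real Data Frame this would be applied as (not run here):
#     df1 = df.selectExpr(*cast_exprs(df.dtypes))
```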
