Data frames are read in with a varying number of columns: how do I dynamically change only the Boolean columns to the String data type?
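In my notebook I read in data frames that have a variable number of columns each time the notebook runs. How can I dynamically change the data type of only the Boolean columns to String?
This is a problem I ran into myself, so I am posting the answer in case it helps someone else.
The data frame is named "df".
Here, every Boolean column in the incoming dataset is dynamically cast to the String data type: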
from pyspark.sql import functions as F
from pyspark.sql.types import StringType

def bool_col_DataTypes(DataFrame):
    """Accept a Spark DataFrame and return a list of the names of all its Boolean columns."""
    # DataFrame.dtypes is a list of (column name, type name) tuples
    dtypes_by_column = dict(DataFrame.dtypes)
    list_of_bool_cols_for_conversion = [x for x, y in dtypes_by_column.items() if y == 'boolean']
    return list_of_bool_cols_for_conversion

list_of_bool_columns = bool_col_DataTypes(df)
# Cast each Boolean column to string, replacing the column under the same name
for i in list_of_bool_columns:
    df = df.withColumn(i, F.col(i).cast(StringType()))
new_df = df
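The loop above depends on df.dtypes, which PySpark exposes as a list of (column name, type name) tuples. Below is a minimal sketch of the same idea, split so that the column-picking step is plain Python and can be checked without a Spark session. The function names boolean_columns and cast_booleans_to_string are hypothetical and not part of the original answer, and the sample dtypes list is made up for illustration.

```python
def boolean_columns(dtypes):
    """Return the names of columns whose Spark type name is 'boolean'.

    `dtypes` is expected to have the shape of `df.dtypes`:
    a list of (column_name, type_name) tuples.
    """
    return [name for name, type_name in dtypes if type_name == "boolean"]


def cast_booleans_to_string(df):
    """Return a new DataFrame with every boolean column cast to string."""
    # Imported here so the helper above stays usable without pyspark installed.
    from pyspark.sql import functions as F

    for name in boolean_columns(df.dtypes):
        df = df.withColumn(name, F.col(name).cast("string"))
    return df


# Illustrative dtypes list shaped like the one Spark would report.
sample_dtypes = [("Answer", "boolean"), ("Entity", "string"), ("ID", "int")]
print(boolean_columns(sample_dtypes))
```

With a live DataFrame, new_df = cast_booleans_to_string(df) would play the role of the loop above. Because the column list is recomputed from df.dtypes on every call, the function does not care how many columns arrive on a given run.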
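from pyspark.sql.types import StructType, StructField, BooleanType, StringType, IntegerType

# Assumes an active SparkSession named spark
data = [(True, 'Lion', 1),
        (False, 'fridge', 2),
        (True, 'Bat', 23)]
schema = StructType([StructField('Answer', BooleanType(), True),
                     StructField('Entity', StringType(), True),
                     StructField('ID', IntegerType(), True)])
df = spark.createDataFrame(data, schema)
df.printSchema()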
Schema
root
|-- Answer: boolean (nullable = true)
|-- Entity: string (nullable = true)
|-- ID: integer (nullable = true)
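Transformation
from pyspark.sql.functions import col
# Cast Boolean columns to string in a single select; all other columns pass through unchanged
df1 = df.select(*[col(x).cast('string').alias(x) if y == 'boolean' else col(x) for x, y in df.dtypes])
df1.printSchema()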
root
|-- Answer: string (nullable = true)
|-- Entity: string (nullable = true)
|-- ID: integer (nullable = true)
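The same single-pass projection can also be written with SQL expression strings passed to DataFrame.selectExpr. This is an alternative to the col(...).cast(...) form, not something the original answer uses. The helper names quote and build_cast_exprs are hypothetical, and the sketch assumes only top-level columns.

```python
def quote(name):
    """Backtick-quote a column name for Spark SQL, doubling embedded backticks."""
    return "`" + name.replace("`", "``") + "`"


def build_cast_exprs(dtypes):
    """Build one SQL expression per column, casting boolean columns to string.

    `dtypes` has the shape of `df.dtypes`: (column_name, type_name) tuples.
    """
    exprs = []
    for name, type_name in dtypes:
        quoted = quote(name)
        if type_name == "boolean":
            # Keep the original column name via the alias.
            exprs.append(f"CAST({quoted} AS STRING) AS {quoted}")
        else:
            exprs.append(quoted)
    return exprs


exprs = build_cast_exprs([("Answer", "boolean"), ("Entity", "string"), ("ID", "int")])
print(exprs)
# With a live DataFrame `df`, the expressions would be applied as:
# df1 = df.selectExpr(*build_cast_exprs(df.dtypes))
```

Building the expressions as plain strings keeps the selection logic testable without Spark, at the cost of having to quote column names by hand.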