
Using alias to rename pyspark columns

I'm trying to import a Parquet file in Databricks (PySpark) and keep getting this error:

df = spark.read.parquet(inputFilePath)

AnalysisException:  Column name "('my data (beta)', "Meas'd Qty")" contains invalid character(s). Please use alias to rename it. 

I tried the suggestions in this post, used .withColumnRenamed as in this other post, and also tried alias like

(spark.read.parquet(inputFilePath)).select(col("""('my data (beta)', "Meas'd Qty")""").alias("col")).show()

but I always get the same error. How do I go through each column and replace any invalid characters with an underscore _, or even just delete all invalid characters?

How was the original file generated? It was saved with column names that Spark does not allow.

It is better to fix this issue at the source, where the file is generated.

A few approaches you can try in Spark to resolve it:

  1. In the select statement, wrap the column name in backticks, like
(spark.read.parquet(inputFilePath)).select(col("""`('my data (beta)', "Meas'd Qty")`""").alias("col")).show()
  2. Rename the columns with toDF, which takes the new names as separate arguments (not a list):
(spark.read.parquet(inputFilePath)).toDF("col_a", "col_b", ...).show()
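To answer the "replace any invalid characters with underscore" part of the question, a minimal sketch of a renaming helper is below. The character class is an assumption based on the characters Spark typically rejects in Parquet column names (spaces, commas, semicolons, braces, parentheses, newlines, tabs, equals signs), plus the quote characters that appear in this particular column name; adjust it to whatever your Spark version actually complains about.

```python
import re

def sanitize(name: str) -> str:
    # Replace characters Spark commonly rejects in Parquet column names
    # with an underscore. The exact forbidden set is an assumption here;
    # quotes are included because they appear in the question's column name.
    return re.sub(r'[ ,;{}()\n\t=\'"]', '_', name)
```

Usage with Spark (assuming the `spark` session and `inputFilePath` from the question, and that the file can be read before renaming):

    df = spark.read.parquet(inputFilePath)
    df = df.toDF(*[sanitize(c) for c in df.columns])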
