
Using alias to rename pyspark columns

I'm trying to import a Parquet file in Databricks (PySpark) and keep getting this error:

df = spark.read.parquet(inputFilePath)

AnalysisException:  Column name "('my data (beta)', "Meas'd Qty")" contains invalid character(s). Please use alias to rename it. 

I tried the suggestions in this post, used .withColumnRenamed as in this other post, and also tried alias like

(spark.read.parquet(inputFilePath)).select(col("""('my data (beta)', "Meas'd Qty")""").alias("col")).show()

but I always get the same error. How do I go through each column and replace any invalid characters with an underscore _, or even just delete all invalid characters?

How was the original file generated? It was saved with column names that Spark does not allow.

It is better to fix this issue at the source, where the file is generated.

A few approaches you can try in Spark to resolve it:

  1. In the select statement, wrap the column name in backticks, like
(spark.read.parquet(inputFilePath)).select(col("""`('my data (beta)', "Meas'd Qty")`""").alias("col")).show()
  2. Rename the columns with toDF, which takes the new names as separate arguments (not a list):
(spark.read.parquet(inputFilePath)).toDF("col_a", "col_b", ...).show()
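To answer the "replace any invalid characters with underscore" part of the question, a minimal sketch of a renaming helper is below. The character class is an assumption based on the characters Spark typically rejects in Parquet column names (spaces, commas, semicolons, braces, parentheses, newlines, tabs, equals signs), plus the quote characters that appear in this particular column name; adjust it to whatever your Spark version actually complains about.

```python
import re

def sanitize(name: str) -> str:
    # Replace characters Spark commonly rejects in Parquet column names
    # with an underscore. The exact forbidden set is an assumption here;
    # quotes are included because they appear in the question's column name.
    return re.sub(r'[ ,;{}()\n\t=\'"]', '_', name)
```

Usage with Spark (assuming the `spark` session and `inputFilePath` from the question, and that the file can be read before renaming):

    df = spark.read.parquet(inputFilePath)
    df = df.toDF(*[sanitize(c) for c in df.columns])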
