I'm trying to import a parquet file in Databricks (pyspark) and keep getting the error
df = spark.read.parquet(inputFilePath)
AnalysisException: Column name "('my data (beta)', "Meas'd Qty")" contains invalid character(s). Please use alias to rename it.
I tried the suggestions in this post, using .withColumnRenamed as in this post, and also using alias, like
(spark.read.parquet(inputFilePath)).select(col("('my data (beta)', \"Meas'd Qty\")").alias("col")).show()
but I always get the same error. How do I go through each column and replace any invalid characters with an underscore (_), or even just delete all invalid characters?
How was the original file generated? It was saved with column names that Spark does not allow.
It is better to fix this issue at the source, when the file is generated.
A few approaches you can try in Spark to work around it:
alias with backticks
(spark.read.parquet(inputFilePath)).select(col("`('my data (beta)', \"Meas'd Qty\")`").alias("col")).show()
toDF
(spark.read.parquet(inputFilePath)).toDF("col_a", "col_b", ...).show()
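To rename every column automatically rather than listing names by hand, you can sanitize each name with a regex and pass the results to toDF. A minimal sketch, assuming the set of characters Spark rejects is roughly spaces, commas, semicolons, braces, parentheses, equals signs, and quotes (the sanitize helper is hypothetical, not a Spark API; the Spark lines are shown as comments since they need a live SparkSession):

```python
import re

def sanitize(name):
    # Replace characters commonly rejected in Parquet column names
    # (space , ; { } ( ) \n \t = and quote characters) with underscores.
    return re.sub(r"[ ,;{}()\n\t='\"]", "_", name)

# Applying it to the DataFrame from the question (requires Spark):
# df = spark.read.parquet(inputFilePath)
# df = df.toDF(*[sanitize(c) for c in df.columns])

print(sanitize("col a"))  # → col_a
```

toDF takes the new names as positional arguments, hence the * to unpack the sanitized list.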