
How to handle white spaces in DataFrame column names in Spark

I registered a temp table from a DataFrame that has white space in a column header. How can I select that column in a SQL query via sqlContext? I tried to use back-ticks, but it is not working:

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score as Z_Score` from tmp1 """)

You have to place only the column name within back-ticks, not its alias:

Without alias:

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1""")

With alias:

df1 =  sqlContext.sql("""select t1.Company, t1.Sector, t1.Industry, t1.`Altman Z-score` as Z_Score from tmp1 t1""")
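If you build such queries programmatically, the quoting rule is easy to capture in a small helper. The sketch below assumes `quote_ident` is a hypothetical function (not a Spark API); it relies on the Spark SQL convention that a literal back-tick inside a back-tick-quoted identifier is escaped by doubling it:

```python
def quote_ident(name):
    """Back-tick-quote a Spark SQL identifier (hypothetical helper).

    A literal back-tick inside the name is escaped by doubling it,
    per Spark SQL's quoted-identifier rules.
    """
    return "`" + name.replace("`", "``") + "`"

# The alias stays outside the back-ticks:
query = "select {col} as Z_Score from tmp1".format(
    col=quote_ident("Altman Z-score"))
# query == "select `Altman Z-score` as Z_Score from tmp1"
```

The resulting string can then be passed to `sqlContext.sql(query)` as in the examples above.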

There is a problem in the query. The corrected query is below (only the column name `Altman Z-score` is wrapped in back-ticks; the alias `Z_Score` stays outside them):

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1 """)

One more alternative:

import pyspark.sql.functions as F
df1 =  sqlContext.sql("""select * from tmp1 """)
df1.select(F.col("Altman Z-score").alias("Z_Score")).show()
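If you would rather avoid back-ticks entirely, you can rename every column up front so that none contains spaces. A minimal sketch, where `sanitize` is a hypothetical helper (not part of the Spark API):

```python
import re

def sanitize(name):
    # Replace each run of whitespace or other non-identifier characters
    # with a single underscore, and trim stray leading/trailing ones.
    return re.sub(r"\W+", "_", name).strip("_")

# With a real DataFrame you could then rename all columns at once:
# df_clean = df1.toDF(*[sanitize(c) for c in df1.columns])

print(sanitize("Altman Z-score"))  # Altman_Z_score
```

After such a rename, plain `select Altman_Z_score from ...` works without any quoting.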

https://www.tutorialspoint.com/how-to-select-a-column-name-with-spaces-in-mysql

Refer to the link above: use the back-tick character ` (it shares a key with the tilde ~ on most keyboards) to refer to a column with spaces. I have tried the code below and it works:

data = spark.read.options(header='True',inferschema='True',delimiter=',').csv(r'C:\Users\user\OneDrive\Desktop\diabetes.csv')
data.createOrReplaceTempView("DIABETICDATA")
spark.sql("""SELECT `Number of times pregnant` FROM DIABETICDATA WHERE `Number of times pregnant` > 10 """).show()
