How to handle spaces in DataFrame column names in Spark

Question

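I registered a temp table from a DataFrame, and one of its header columns contains a space. How can I select that column when querying with SQL through sqlContext? I tried using backticks, but it does not work: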

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score as Z_Score` from tmp1 """)

Answer 1

You only need to put the column name in backticks, not the alias as well:

Without a table alias:

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1""")

With a table alias:

df1 =  sqlContext.sql("""select t1.Company, t1.Sector, t1.Industry, t1.`Altman Z-score` as Z_Score from tmp1 t1""")
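If the query text is assembled in Python, the same rule (backticks around the column name only, alias outside) can be applied by a small helper. This is only a sketch: `quote_identifier` and `build_select` are hypothetical names, not part of Spark, and the doubling of an embedded backtick follows Spark SQL's quoted-identifier escaping.

```python
def quote_identifier(name: str) -> str:
    """Wrap a name in backticks for use in Spark SQL.

    A backtick inside the name is doubled, which is how it is escaped
    within a quoted identifier.
    """
    return "`" + name.replace("`", "``") + "`"


def build_select(table: str, columns: dict) -> str:
    """Build a SELECT statement (hypothetical helper).

    `columns` maps each source column name to its alias, or to None
    when no alias is wanted.
    """
    parts = []
    for source, alias in columns.items():
        expr = quote_identifier(source)
        if alias:
            # The alias is added after the closing backtick, never inside it.
            expr += " AS " + quote_identifier(alias)
        parts.append(expr)
    return "SELECT " + ", ".join(parts) + " FROM " + quote_identifier(table)


query = build_select(
    "tmp1",
    {"Company": None, "Sector": None, "Industry": None, "Altman Z-score": "Z_Score"},
)
print(query)
```

The resulting string can then be passed to `sqlContext.sql(query)`.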

Answer 2

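The query has a problem: `as Z_Score` was wrapped inside the backticks. The corrected query, with the backticks closed right after the column name, is: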

df1 =  sqlContext.sql("""select Company, Sector, Industry, `Altman Z-score` as Z_Score from tmp1 """)

There is also an alternative using the DataFrame API:

import pyspark.sql.functions as F
df1 =  sqlContext.sql("""select * from tmp1 """)
df1.select(F.col("Altman Z-score").alias("Z_Score")).show()
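If several columns contain spaces, another option along the same lines is to rename them all up front so that no quoting is needed later. The sketch below only computes the new names; `normalized_names` is a hypothetical helper, and replacing every run of characters other than letters, digits, and underscores with a single underscore is an assumption about what a "safe" name should look like. Two different source names can collapse to the same result, so check for duplicates before relying on it.

```python
import re


def normalized_names(columns):
    """Return column names with unsafe characters replaced by underscores.

    Each run of characters other than ASCII letters, digits, and
    underscores becomes a single underscore.
    """
    return [re.sub(r"[^0-9A-Za-z_]+", "_", name.strip()) for name in columns]


print(normalized_names(["Company", "Altman Z-score", "Number of times pregnant"]))
```

With a DataFrame `df`, the result could be applied with `df.toDF(*normalized_names(df.columns))` before registering the temp table.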

Answer 3

https://www.tutorialspoint.com/how-to-select-a-column-name-with-spaces-in-mysql

See the link above: use the ` character (the key it shares with the tilde ~) to quote column names that contain spaces. I tried the code below and it works:

data = spark.read.options(header='True',inferschema='True',delimiter=',').csv(r'C:\Users\user\OneDrive\Desktop\diabetes.csv')
data.createOrReplaceTempView("DIABETICDATA")
spark.sql("""SELECT `Number of times pregnant` FROM DIABETICDATA WHERE `Number of times pregnant` > 10 """).show()

3 answers

Answer 1
5 Accepted 2017-03-30 04:39:15

Answer 2
3 2017-03-30 04:43:43

Answer 3
1 2022-05-10 05:35:11
