Hive queries in spark sql
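I can use any kind of SQL query in spark.sql, but I get an error when I run the query below through spark.sql. The (cstone_feed_key|cstone_last_updatetm|rte_call_key_seq_no)?+.+
syntax is mainly used in Hive to exclude fields from a select. Please suggest a way to do the same thing. My table has about 1000 fields.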
select rte_call_key_seq_no as T_rte_call_key_seq_no, (cstone_feed_key|cstone_last_updatetm|rte_call_key_seq_no)?+.+
from table
update-1
spark.sql("SET spark.sql.parser.quotedRegexColumnNames=true")
df.createOrReplaceTempView("table")
spark.sql("select `(account_id|credit_card_limit)?+.+` from table")
.printSchema()
/**
* root
* |-- credit_card_Number: long (nullable = true)
* |-- first_name: string (nullable = true)
* |-- last_name: string (nullable = true)
* |-- phone_number: integer (nullable = true)
* |-- amount: integer (nullable = true)
* |-- date: string (nullable = true)
* |-- shop: string (nullable = true)
* |-- transaction_code: string (nullable = true)
*/
original answer
Alternative -
SELECT `(ds|hr)?+.+` FROM sales
python
df = spark.createDataFrame([("a", 1), ("b", 2), ("c", 3)], ["Col1", "Col2"])
df.select(df.colRegex("`(Col1)?+.+`")).show()
+----+
|Col2|
+----+
|   1|
|   2|
|   3|
+----+
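The pattern works because `?+` is a possessive quantifier in Java regex: once the optional group has consumed a complete column name it does not give anything back, so the trailing `.+` has nothing left to match and that name is rejected. The sketch below reproduces the same filtering in plain Python, purely for illustration. It swaps the possessive group for a negative lookahead (Python's re module only supports possessive quantifiers from version 3.11), and the helper name columns_except is hypothetical, not part of Spark.

```python
import re


def columns_except(columns, excluded):
    """Keep every column name that is not exactly one of `excluded`."""
    alternation = "|".join(re.escape(name) for name in excluded)
    # (?!(?:a|b)$) rejects a name that is exactly a or b; .+ accepts the rest
    pattern = re.compile(rf"(?!(?:{alternation})$).+")
    return [c for c in columns if pattern.fullmatch(c)]


# Col10 is kept: only an exact match on an excluded name is dropped
print(columns_except(["Col1", "Col2", "Col10"], ["Col1"]))
```

As with the Hive pattern, a column whose name merely starts with an excluded name (Col10 here) is still selected.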
scala
df.select(df.colRegex("`(account_id|credit_card_limit)?+.+`"))
.printSchema()
/**
* root
* |-- credit_card_Number: long (nullable = true)
* |-- first_name: string (nullable = true)
* |-- last_name: string (nullable = true)
* |-- phone_number: integer (nullable = true)
* |-- amount: integer (nullable = true)
* |-- date: string (nullable = true)
* |-- shop: string (nullable = true)
* |-- transaction_code: string (nullable = true)
*/
Another approach
df.printSchema()
/**
* root
* |-- account_id: integer (nullable = true)
* |-- credit_card_Number: long (nullable = true)
* |-- credit_card_limit: integer (nullable = true)
* |-- first_name: string (nullable = true)
* |-- last_name: string (nullable = true)
* |-- phone_number: integer (nullable = true)
* |-- amount: integer (nullable = true)
* |-- date: string (nullable = true)
* |-- shop: string (nullable = true)
* |-- transaction_code: string (nullable = true)
*/
// hive syntax
// The following query selects all columns except ds and hr.
// SELECT `(ds|hr)?+.+` FROM sales
// Java regex syntax
df.selectExpr(df.columns.filter(_.matches("(account_id|credit_card_limit)?+.+")): _*)
.printSchema()
/**
* root
* |-- credit_card_Number: long (nullable = true)
* |-- first_name: string (nullable = true)
* |-- last_name: string (nullable = true)
* |-- phone_number: integer (nullable = true)
* |-- amount: integer (nullable = true)
* |-- date: string (nullable = true)
* |-- shop: string (nullable = true)
* |-- transaction_code: string (nullable = true)
*/
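If the regex form cannot be used at all, one more option is to build the column list explicitly and hand the resulting string to spark.sql. The following is a minimal Python sketch under that assumption. The helper build_select, its parameters, and the two trailing column names in the sample list are hypothetical; with a real DataFrame the list would come from df.columns.

```python
def build_select(columns, exclude, table, aliases=None):
    """Return a SELECT statement listing every column not in `exclude`.

    `aliases` maps a column name to the alias it should be selected under;
    aliased columns are emitted first, mirroring the query in the question.
    """
    aliases = aliases or {}
    excluded = set(exclude)
    # Aliased columns come first: `col` AS `alias`
    parts = [f"`{col}` AS `{alias}`" for col, alias in aliases.items()]
    # Then every remaining column, in its original order, quoted with backticks
    parts += [f"`{c}`" for c in columns if c not in excluded]
    return f"SELECT {', '.join(parts)} FROM {table}"


# Hypothetical column list standing in for df.columns
columns = ["cstone_feed_key", "cstone_last_updatetm", "rte_call_key_seq_no",
           "call_type", "agent_id"]
query = build_select(
    columns,
    exclude=["cstone_feed_key", "cstone_last_updatetm", "rte_call_key_seq_no"],
    table="table",
    aliases={"rte_call_key_seq_no": "T_rte_call_key_seq_no"},
)
print(query)
```

The generated string can then be passed to spark.sql(query). With around 1000 fields the statement gets long, but it does not depend on any parser setting.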
ref - Hive Language Manual