簡體   English   中英

Spark sql 中的 Hive 查詢

[英]Hive queries in spark sql

我可以在 spark.sql 中使用任何類型的 sql 查詢,但是在下面的查詢中應用 spark.sql 時會出現錯誤。 (cstone_feed_key|cstone_last_updatetm|rte_call_key_seq_no)?+.+ sysntax 主要用於從 hive 中的選擇中排除字段。 請提出一些方法來做同樣的事情。 我在表中有大約 1000 個字段。

select rte_call_key_seq_no as T_rte_call_key_seq_no, (cstone_feed_key|cstone_last_updatetm|rte_call_key_seq_no)?+.+ from table

update-1

spark.sql("SET spark.sql.parser.quotedRegexColumnNames=true")
    df.createOrReplaceTempView("table")
    spark.sql("select `(account_id|credit_card_limit)?+.+` from table")
      .printSchema()

    /**
      * root
      * |-- credit_card_Number: long (nullable = true)
      * |-- first_name: string (nullable = true)
      * |-- last_name: string (nullable = true)
      * |-- phone_number: integer (nullable = true)
      * |-- amount: integer (nullable = true)
      * |-- date: string (nullable = true)
      * |-- shop: string (nullable = true)
      * |-- transaction_code: string (nullable = true)
      */

original answer替代-

hive 語法

  1. 以下查詢選擇除 ds 和 hr 之外的所有列。
  2. Java 正則表達式語法
SELECT `(ds|hr)?+.+` FROM sales

火花語法

python

df = spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)], ["Col1", "Col2"])
df.select(df.colRegex("`(Col1)?+.+`")).show()
+----+
|Col2|
+----+
|   1|
|   2|
|   3|
+----+

scala

df.select(df.colRegex("`(account_id|credit_card_limit)?+.+`"))
      .printSchema()

    /**
      * root
      * |-- credit_card_Number: long (nullable = true)
      * |-- first_name: string (nullable = true)
      * |-- last_name: string (nullable = true)
      * |-- phone_number: integer (nullable = true)
      * |-- amount: integer (nullable = true)
      * |-- date: string (nullable = true)
      * |-- shop: string (nullable = true)
      * |-- transaction_code: string (nullable = true)
      */

另一種方法

df.printSchema()

    /**
      * root
      * |-- account_id: integer (nullable = true)
      * |-- credit_card_Number: long (nullable = true)
      * |-- credit_card_limit: integer (nullable = true)
      * |-- first_name: string (nullable = true)
      * |-- last_name: string (nullable = true)
      * |-- phone_number: integer (nullable = true)
      * |-- amount: integer (nullable = true)
      * |-- date: string (nullable = true)
      * |-- shop: string (nullable = true)
      * |-- transaction_code: string (nullable = true)
      */

    // hive syntax
    // The following query selects all columns except ds and hr.
    // SELECT `(ds|hr)?+.+` FROM sales
    // Java regex syntax

    df.selectExpr(df.columns.filter(_.matches("(account_id|credit_card_limit)?+.+")): _*)
      .printSchema()

    /**
      * root
      * |-- credit_card_Number: long (nullable = true)
      * |-- first_name: string (nullable = true)
      * |-- last_name: string (nullable = true)
      * |-- phone_number: integer (nullable = true)
      * |-- amount: integer (nullable = true)
      * |-- date: string (nullable = true)
      * |-- shop: string (nullable = true)
      * |-- transaction_code: string (nullable = true)
      */

ref - hive 語言手冊

暫無
暫無

聲明:本站的技術帖子網頁,遵循CC BY-SA 4.0協議,如果您需要轉載,請注明本站網址或者原文地址。任何問題請咨詢:yoyou2525@163.com.

 
粵ICP備18138465號  © 2020-2024 STACKOOM.COM