简体   繁体   中英

Hive queries in spark sql

I can use any type of sql queries in spark.sql but while applying spark.sql in below queries its getting error. (cstone_feed_key|cstone_last_updatetm|rte_call_key_seq_no)?+.+ sysntax mainly used to exclude the fields from the selection in hive. kindly suggest some way to do the same. I have around 1000 fields in the table.

select rte_call_key_seq_no as T_rte_call_key_seq_no, (cstone_feed_key|cstone_last_updatetm|rte_call_key_seq_no)?+.+ from table

update-1

spark.sql("SET spark.sql.parser.quotedRegexColumnNames=true")
    df.createOrReplaceTempView("table")
    spark.sql("select `(account_id|credit_card_limit)?+.+` from table")
      .printSchema()

    /**
      * root
      * |-- credit_card_Number: long (nullable = true)
      * |-- first_name: string (nullable = true)
      * |-- last_name: string (nullable = true)
      * |-- phone_number: integer (nullable = true)
      * |-- amount: integer (nullable = true)
      * |-- date: string (nullable = true)
      * |-- shop: string (nullable = true)
      * |-- transaction_code: string (nullable = true)
      */

original answer alternative-

hive syntax

  1. The following query selects all columns except ds and hr.
  2. Java regex syntax
SELECT `(ds|hr)?+.+` FROM sales

Spark Syntax

python

df = spark.createDataFrame([("a", 1), ("b", 2), ("c",  3)], ["Col1", "Col2"])
df.select(df.colRegex("`(Col1)?+.+`")).show()
+----+
|Col2|
+----+
|   1|
|   2|
|   3|
+----+

scala

df.select(df.colRegex("`(account_id|credit_card_limit)?+.+`"))
      .printSchema()

    /**
      * root
      * |-- credit_card_Number: long (nullable = true)
      * |-- first_name: string (nullable = true)
      * |-- last_name: string (nullable = true)
      * |-- phone_number: integer (nullable = true)
      * |-- amount: integer (nullable = true)
      * |-- date: string (nullable = true)
      * |-- shop: string (nullable = true)
      * |-- transaction_code: string (nullable = true)
      */

Another Approach

df.printSchema()

    /**
      * root
      * |-- account_id: integer (nullable = true)
      * |-- credit_card_Number: long (nullable = true)
      * |-- credit_card_limit: integer (nullable = true)
      * |-- first_name: string (nullable = true)
      * |-- last_name: string (nullable = true)
      * |-- phone_number: integer (nullable = true)
      * |-- amount: integer (nullable = true)
      * |-- date: string (nullable = true)
      * |-- shop: string (nullable = true)
      * |-- transaction_code: string (nullable = true)
      */

    // hive syntax
    // The following query selects all columns except ds and hr.
    // SELECT `(ds|hr)?+.+` FROM sales
    // Java regex syntax

    df.selectExpr(df.columns.filter(_.matches("(account_id|credit_card_limit)?+.+")): _*)
      .printSchema()

    /**
      * root
      * |-- credit_card_Number: long (nullable = true)
      * |-- first_name: string (nullable = true)
      * |-- last_name: string (nullable = true)
      * |-- phone_number: integer (nullable = true)
      * |-- amount: integer (nullable = true)
      * |-- date: string (nullable = true)
      * |-- shop: string (nullable = true)
      * |-- transaction_code: string (nullable = true)
      */

ref - hive LanguageManual

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM