Spark 無法識別 SQL 查詢中的列名，而 output 可以將其識別到數據集

Question

我正在應用這樣的 SQL 查詢：

s"SELECT *  FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null and acceptTimestamp is not null)"

我收到這樣的錯誤消息。

warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
org.postgresql.util.PSQLException: ERROR: column "accepttimestamp" does not exist
  Hint: Perhaps you meant to reference the column "mf_joined.acceptTimestamp".
  Position: 103
  at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2497)
  at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2233)
  at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)
  at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446)
  at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370)
  at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149)
  at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
  at $$$e76229fa87b6865de321c5274e52c2f9$$$$w$getDFFromJdbcSource(<console>:1133)
  ... 326 elided

如果我像這樣省略acceptTimestamp ：

s"SELECT *  FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null)"

我得到的數據如下：

+-------------------+----------+----+------------------+-----------------+---+-----+------+----------+---------------+-------+-----------------------+----------+---------+-------------+------------+---------------+---------+-----+-------------------+-----------------------+---------------+--------------+-------------+-------------------+-------------------+---+---+------------------+-----+----+----+------------------+---+
|timestamp          |flags     |type|lon               |lat              |alt|speed|course|satellites|digital_twin_id|unit_id|unit_ts                |name      |unit_type|measure_units|access_level|uid            |placement|stale|start              |writetime              |acceptTimestamp|delayWindowEnd|DiffInSeconds|time               |hour               |max|min|mean              |count|max2|min2|mean2             |rnb|
+-------------------+----------+----+------------------+-----------------+---+-----+------+----------+---------------+-------+-----------------------+----------+---------+-------------+------------+---------------+---------+-----+-------------------+-----------------------+---------------+--------------+-------------+-------------------+-------------------+---+---+------------------+-----+----+----+------------------+---+

請注意acceptTimestamp在這里！

那么我應該如何處理我的查詢中的這一列以使其考慮在內？

Answer 1

從異常來看，這似乎與 Postgres 而不是 Spark 有關。 如果您查看收到的錯誤消息，列名將折疊為小寫的accepttimestamp ，而在您的查詢中T為大寫的acceptTimestamp 。

要使 Postgres 的列名區分大小寫，您需要使用雙引號。 嘗試這個：

val query = s"""SELECT * FROM my_table_joined 
    WHERE   timestamp > '2022-01-23' 
    and     writetime is not null 
    and     "acceptTimestamp" is not null"""

Spark 無法識別 SQL 查詢中的列名，而 output 可以將其識別到數據集

問題描述

1 個解決方案

解決方案1
0 2022-01-23 12:02:02

Spark 無法識別 SQL 查詢中的列名，而 output 可以將其識別到數據集

問題描述

1 個解決方案

解決方案1 0 2022-01-23 12:02:02

解決方案1
0 2022-01-23 12:02:02