
Spark doesn't recognize a column name in a SQL query, yet can output that column to a dataset

I'm applying a SQL query like this:

s"SELECT *  FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null and acceptTimestamp is not null)"

and I'm getting this error message:

warning: there was one deprecation warning (since 2.0.0); for details, enable `:setting -deprecation' or `:replay -deprecation'
org.postgresql.util.PSQLException: ERROR: column "accepttimestamp" does not exist
  Hint: Perhaps you meant to reference the column "mf_joined.acceptTimestamp".
  Position: 103
  at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2497)
  at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2233)
  at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:310)
  at org.postgresql.jdbc.PgStatement.executeInternal(PgStatement.java:446)
  at org.postgresql.jdbc.PgStatement.execute(PgStatement.java:370)
  at org.postgresql.jdbc.PgPreparedStatement.executeWithFlags(PgPreparedStatement.java:149)
  at org.postgresql.jdbc.PgPreparedStatement.executeQuery(PgPreparedStatement.java:108)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRDD$.resolveTable(JDBCRDD.scala:61)
  at org.apache.spark.sql.execution.datasources.jdbc.JDBCRelation$.getSchema(JDBCRelation.scala:226)
  at org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:35)
  at org.apache.spark.sql.execution.datasources.DataSource.resolveRelation(DataSource.scala:344)
  at org.apache.spark.sql.DataFrameReader.loadV1Source(DataFrameReader.scala:297)
  at org.apache.spark.sql.DataFrameReader.$anonfun$load$2(DataFrameReader.scala:286)
  at scala.Option.getOrElse(Option.scala:189)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:286)
  at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:221)
  at $$$e76229fa87b6865de321c5274e52c2f9$$$$w$getDFFromJdbcSource(<console>:1133)
  ... 326 elided

If I omit acceptTimestamp like this:

s"SELECT *  FROM my_table_joined WHERE (timestamp > '2022-01-23' and writetime is not null)"

I get the data as below:

+-------------------+----------+----+------------------+-----------------+---+-----+------+----------+---------------+-------+-----------------------+----------+---------+-------------+------------+---------------+---------+-----+-------------------+-----------------------+---------------+--------------+-------------+-------------------+-------------------+---+---+------------------+-----+----+----+------------------+---+
|timestamp          |flags     |type|lon               |lat              |alt|speed|course|satellites|digital_twin_id|unit_id|unit_ts                |name      |unit_type|measure_units|access_level|uid            |placement|stale|start              |writetime              |acceptTimestamp|delayWindowEnd|DiffInSeconds|time               |hour               |max|min|mean              |count|max2|min2|mean2             |rnb|
+-------------------+----------+----+------------------+-----------------+---+-----+------+----------+---------------+-------+-----------------------+----------+---------+-------------+------------+---------------+---------+-----+-------------------+-----------------------+---------------+--------------+-------------+-------------------+-------------------+---+---+------------------+-----+----+----+------------------+---+

Please note that acceptTimestamp is there!

So how should I handle this column in my query so that it is taken into account?

From the exception, it seems this is related to Postgres, not Spark. If you look at the error message you got, the column name is folded to lowercase accepttimestamp, whereas in your query the T is uppercase: acceptTimestamp.

To make the column name case-sensitive for Postgres, you need to wrap it in double quotes. Try this:

val query = s"""SELECT * FROM my_table_joined 
    WHERE   timestamp > '2022-01-23' 
    and     writetime is not null 
    and     "acceptTimestamp" is not null"""  
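A minimal sketch of building the case-sensitive query in plain Scala. The table and column names come from the question; how the statement ultimately reaches Postgres depends on your `getDFFromJdbcSource` helper, which is not shown, so the Spark wiring mentioned below is an assumption.

```scala
// Build the query with a double-quoted identifier so Postgres does not
// fold "acceptTimestamp" to lowercase. Triple-quoted strings let us keep
// the embedded double quotes without escaping.
val query =
  """SELECT * FROM my_table_joined
    |WHERE timestamp > '2022-01-23'
    |  and writetime is not null
    |  and "acceptTimestamp" is not null""".stripMargin

// Sanity check: the quoted, mixed-case identifier survives as written.
assert(query.contains("\"acceptTimestamp\""))
```

If you read through Spark's JDBC source, this string would typically be passed either via the `query` option (Spark 2.4+) or wrapped as a parenthesized subquery with an alias in the `dbtable` option; either way Postgres receives the double quotes intact.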

