
AWS Athena insert into with named columns not working in pyspark

I've created a little test table using pyspark:

query="""
CREATE EXTERNAL TABLE IF NOT EXISTS test1
(
c1 INT,
c2 INT,
c3 INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3://mybucket/myfolder/'
"""
spark.sql(query)

And this works fine, producing the following output:

spark.sql("select * from test1").show()

+---+---+---+
| c1| c2| c3|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
+---+---+---+

My problem is trying to do an insert now. According to my reading of the Athena documentation, I should be able to do the following, but I'm getting an error message:

query="""
insert into test1(c1,c2,c3) select c1,c2,c3 from test1
"""
spark.sql(query)


Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
    raise ParseException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.ParseException: "\nmismatched input 'c1' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 2, pos 21)\n\n== SQL ==\n\ninsert into test1(c1,c2,c3) select c1,c2,c3 from test1\n---------------------^^^\n"

However, the following INSERT works as expected:

query="""
insert into test1 select c1,c2,c3 from test1
"""

spark.sql(query)

If anyone can see what I'm doing wrong, it would be appreciated.

As per the AWS documentation, you don't need to pass the column names along with the destination table. The correct query would be:

insert into test1 select c1,c2,c3 from test1
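Note that the ParseException above is raised by Spark's own SQL parser (`pyspark.sql.utils.ParseException` in the traceback), before the query ever reaches Athena, so the column list is rejected at parse time. Since no column list is accepted, the SELECT must produce columns in the destination table's schema order. As a sketch, a small hypothetical helper (not part of any Spark or Athena API) can build the positional INSERT from an explicit column-to-expression mapping, making the required ordering visible:

```python
def build_positional_insert(table, schema_cols, select_exprs):
    """Build an INSERT without a column list: order the SELECT
    expressions to match the destination table's schema order.
    (Hypothetical helper for illustration, not a Spark/Athena API.)"""
    ordered = [select_exprs[col] for col in schema_cols]
    return f"insert into {table} select {', '.join(ordered)} from {table}"

# test1's schema order is c1, c2, c3, so the SELECT must follow it:
query = build_positional_insert(
    "test1",
    ["c1", "c2", "c3"],
    {"c1": "c1", "c2": "c2", "c3": "c3"},
)
# query == "insert into test1 select c1, c2, c3 from test1"
# spark.sql(query)  # run against the SparkSession as in the question
```

This simply mirrors the working query from the question; the point is that with a positional INSERT, reordering the SELECT list is the only way to control which expression lands in which column.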

Reference: Athena INSERT INTO documentation
