简体   繁体   中英

AWS Athena insert into with named columns not working in pyspark

I've created a little test table using pyspark

query="""
CREATE EXTERNAL TABLE IF NOT EXISTS test1
(
c1 INT,
c2 INT,
c3 INT
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
LOCATION 's3://mybucket/myfolder/'
"""
spark.sql(query)

And this works fine, producing the following output

spark.sql("select * from test1").show()

+---+---+---+
| c1| c2| c3|
+---+---+---+
|  1|  2|  3|
|  4|  5|  6|
+---+---+---+

My problem is trying to do an insert now. According to my reading of the Athena documentation I should be able to do the following but I'm getting an error message

query="""
insert into test1(c1,c2,c3) select c1,c2,c3 from test1
"""
spark.sql(query)


"\nmismatched input 'c1' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 2, pos 21)\n\n== SQL ==\n\ninsert into test1(c1,c2,c3) select c1,c2,c3 from test1\n---------------------^^^\n"
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 73, in deco
    raise ParseException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.ParseException: "\nmismatched input 'c1' expecting {'(', 'SELECT', 'FROM', 'VALUES', 'TABLE', 'INSERT', 'MAP', 'REDUCE'}(line 2, pos 21)\n\n== SQL ==\n\ninsert into test1(c1,c2,c3) select c1,c2,c3 from test1\n---------------------^^^\n"

However the following INSERT works as expected

query="""
insert into test1 select c1,c2,c3 from test1
"""

spark.sql(query)

If anyone can see what I'm doing wrong it would be appreciated

As per AWS documentation, you don't need to pass the column names alongwith the destination table. The correct query would be:

insert into test1 select c1,c2,c3 from test1

Reference: Athena insert into documentation

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM