
Error creating a table with a column name containing a dot (.) in Amazon Athena, even after escaping the dot with backticks (`)

As per https://docs.aws.amazon.com/athena/latest/ug/tables-databases-columns-names.html:

Special characters

Special characters other than underscore (_) are not supported. For more information, see the Apache Hive LanguageManual DDL documentation.

Important

Although you may succeed in creating table, view, database, or column names that contain special characters other than underscore by enclosing them in backtick (`) characters, subsequent DDL or DML queries that reference them can fail.

So I tried to create a table from a JSON file stored in an S3 bucket. One of the keys in the JSON contains multiple dots (.), which, according to the information at that link, should be fine if I use backticks (`) to escape it.

CREATE EXTERNAL TABLE json_table (
id string,
version string,
com`.`org`.`dto`.`Customer string )
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://narendra-damodardas-modi-test-data/';

But it gives the following error:

line 1:8: no viable alternative at input 'create external' (service: amazonathena; status code: 400; error code: invalidrequestexception; request id: ef586f31-2515-4faa-a9fe-3a0e418235d2)

Now, you may say that, as per the link provided, it is obvious that this is not going to work. But when I do the same thing via a Crawler in AWS Glue, everything works fine and I can see the column with dots in its name.

As per https://docs.aws.amazon.com/athena/latest/ug/understanding-tables-databases-and-the-data-catalog.html:

Regardless of how the tables are created, the tables creation process registers the dataset with Athena. This registration occurs in the AWS Glue Data Catalog and enables Athena to run queries on the data.

So Athena uses AWS Glue behind the scenes. If Glue's crawler is able to add a column whose name comes from a JSON key containing dots (.), why can't an Athena query do the same?

Maybe I am missing something. If anyone has run into this in the past and got around the problem, please enlighten me. And if it is impossible to do what I am trying to do, please say so as well, so that I do not keep wasting my time.

You need to put backticks around the whole column name, not just around the special characters. The following should work:

CREATE EXTERNAL TABLE json_table (
  `id` string,
  `version` string,
  `com.org.dto.Customer` string
)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES (
  'ignore.malformed.json' = 'true'
)
LOCATION 's3://narendra-damodardas-modi-test-data/';

In general, I'd advise surrounding all column names with backticks.
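If the DDL is generated from a list of JSON keys rather than written by hand, the same rule can be applied mechanically. The following is only a minimal sketch: `quote_identifier` and `build_create_table` are hypothetical helper names, not part of any AWS library, and the SerDe settings are copied from the statement above. It assumes that column names never contain a backtick themselves and rejects them if they do.

```python
from typing import Mapping


def quote_identifier(name: str) -> str:
    """Wrap the whole column name in backticks, not just its special characters."""
    if not name or "`" in name:
        # Assumption: names containing a backtick are rejected rather than escaped.
        raise ValueError(f"unsupported column name: {name!r}")
    return f"`{name}`"


def build_create_table(table: str, columns: Mapping[str, str], location: str) -> str:
    """Return a CREATE EXTERNAL TABLE statement with every column name quoted."""
    column_lines = ",\n".join(
        f"  {quote_identifier(name)} {col_type}" for name, col_type in columns.items()
    )
    return (
        f"CREATE EXTERNAL TABLE {table} (\n"
        f"{column_lines}\n"
        ")\n"
        "ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'\n"
        "WITH SERDEPROPERTIES (\n"
        "  'ignore.malformed.json' = 'true'\n"
        ")\n"
        f"LOCATION '{location}';"
    )


if __name__ == "__main__":
    # Prints a statement with the same shape as the hand-written one above.
    print(
        build_create_table(
            "json_table",
            {"id": "string", "version": "string", "com.org.dto.Customer": "string"},
            "s3://narendra-damodardas-modi-test-data/",
        )
    )
```

The sketch only builds the statement text; running it in Athena is left to whatever client you already use.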

Also, if your AWS Glue Crawler runs fine on similar data, you can look up the schema it created with SHOW CREATE TABLE.

