
Amazon Athena partition with colon (:) is not working

When creating a partition in Athena, I tried to use a date in the format yyyy-MM-ddTHH:mm:ssZ as the partition value, and now I am not able to query the data.

Step 1: Create the table

CREATE EXTERNAL TABLE my_info (
    id STRING,
    name STRING
) PARTITIONED BY (
    part STRING
) STORED AS ORC
LOCATION 's3://bucket1/data'
TBLPROPERTIES ("orc.compress"="SNAPPY");

Step 2: Create a folder like the one below and add the files to it.

s3://bucket1/data/part=2019-11-12T14:15:16Z

Step 3: Refresh the partitions: MSCK REPAIR TABLE my_info

Step 4: Query the data: SELECT * FROM my_info

With this setup I am not able to query any data.

If I change the folder name in Step 2 to the format yyyy-MM-ddTHH, without the ':':

s3://bucket1/data/part=2019-11-12T14

Then I am able to get the results.

Any idea why this is not working?

This happens because, in a partitioned table, the partitioning is implemented as part of the S3 path. For example, in s3://bucket1/data/part=2019-11-12T14:15:16Z, the part=2019-11-12T14:15:16Z segment is an S3 path component that Athena interprets as a partition when querying the data.
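To make the path-to-partition mapping concrete, here is a minimal Python sketch of how a Hive-style key=value path segment can be read as a partition column and value. The function name parse_partitions is hypothetical, and this only illustrates the naming convention; it is not how Athena is actually implemented.

```python
def parse_partitions(s3_uri: str) -> dict:
    """Extract Hive-style key=value partition segments from an S3 URI."""
    # Drop the scheme ("s3://"), then the bucket name, keeping the object key.
    without_scheme = s3_uri.split("://", 1)[-1]
    key = without_scheme.split("/", 1)[1] if "/" in without_scheme else ""

    partitions = {}
    for segment in key.split("/"):
        if "=" in segment:
            # Split on the first "=" only, so the value may itself contain "=".
            name, value = segment.split("=", 1)
            partitions[name] = value
    return partitions


print(parse_partitions("s3://bucket1/data/part=2019-11-12T14:15:16Z"))
# {'part': '2019-11-12T14:15:16Z'}
```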

S3 path names have some restrictions on the characters that can be used:

The following characters in a key name might require additional code handling and likely need to be URL encoded or referenced as HEX. Some of these are non-printable characters and your browser might not handle them, which also requires special handling:

Ampersand ("&")  
Dollar ("$")  
ASCII character ranges 00–1F hex (0–31 decimal) and 7F (127 decimal)  
'At' symbol ("@")  
Equals ("=")  
Semicolon (";")  
Colon (":")  
Plus ("+")  
Space – Significant sequences of spaces may be lost in some uses (especially multiple spaces)  
Comma (",")  
Question mark ("?")  
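As a rough illustration, the list above can be turned into a small Python check that reports which of these characters occur in a key. The names SPECIAL_CHARS and find_special_chars are made up for this sketch; the character set is copied from the list and is not an official API.

```python
# Printable characters from the list above; control characters are checked by code point.
SPECIAL_CHARS = set("&$@=;:+ ,?")


def find_special_chars(key: str) -> list:
    """Return the listed special characters found in key, in order of first appearance."""
    found = []
    for ch in key:
        is_control = ord(ch) <= 0x1F or ord(ch) == 0x7F
        if (ch in SPECIAL_CHARS or is_control) and ch not in found:
            found.append(ch)
    return found


print(find_special_chars("data/part=2019-11-12T14:15:16Z"))  # ['=', ':']
print(find_special_chars("data/part=2019-11-12T14"))         # ['=']
```

Note that "=" is flagged too, even though the part=2019-11-12T14 folder in the question works, so the list marks characters that may need care rather than characters that always fail.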

In this case it is probably the colons in the path that Presto/Athena does not interpret correctly. To work around this, you can use an alternative separator character in the timestamp, e.g. part=2019-11-12--14-15-16, or omit the separator altogether.
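A minimal Python sketch of that workaround, assuming the incoming timestamps look like the one in the question (yyyy-MM-ddTHH:mm:ssZ). The function name partition_folder is hypothetical.

```python
from datetime import datetime


def partition_folder(iso_timestamp: str) -> str:
    """Build a colon-free partition folder name from a yyyy-MM-ddTHH:mm:ssZ timestamp."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    # Use "--" between date and time and "-" inside the time instead of "T" and ":".
    return "part=" + parsed.strftime("%Y-%m-%d--%H-%M-%S")


print(partition_folder("2019-11-12T14:15:16Z"))  # part=2019-11-12--14-15-16
```

The resulting values still sort chronologically as strings, which keeps range filters on the partition column usable.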

It seems you can use a URL-encoded colon (%3A).

Further, if you wish to use timestamp as the partition type instead of string, make sure to use a "java.sql.Timestamp compatible format", as documented for the CREATE TABLE statement.

So the final URL would be s3://bucket1/data/part=2019-11-12 14%3A15%3A16/.
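The two steps (reformatting to a java.sql.Timestamp-style value, then URL-encoding the colons) can be sketched in Python as follows. The function name timestamp_partition_folder is hypothetical, and keeping the space unencoded is an assumption made only to match the URL shown above.

```python
from datetime import datetime
from urllib.parse import quote


def timestamp_partition_folder(iso_timestamp: str) -> str:
    """Build a partition folder name with URL-encoded colons."""
    parsed = datetime.strptime(iso_timestamp, "%Y-%m-%dT%H:%M:%SZ")
    # java.sql.Timestamp-style text form: yyyy-MM-dd HH:mm:ss
    value = parsed.strftime("%Y-%m-%d %H:%M:%S")
    # safe=" " leaves the space literal (assumption, to match the URL above);
    # ":" is not in the safe set, so it becomes %3A.
    return "part=" + quote(value, safe=" ")


print("s3://bucket1/data/" + timestamp_partition_folder("2019-11-12T14:15:16Z") + "/")
# s3://bucket1/data/part=2019-11-12 14%3A15%3A16/
```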
