Are TBLPROPERTIES necessary in Athena DDL?

I'm interested in creating an Athena table using DDL. I'm not familiar with the process, so I looked at the DDL of existing tables by selecting "Generate table DDL" in Athena.

I'm curious what the TBLPROPERTIES part is for and whether it's necessary. Is there a list of available TBLPROPERTIES somewhere? I suspect these are "custom" because of this: 'UPDATED_BY_CRAWLER'='aws-glue-crawler'. This table is not updated by a crawler; it was created by DDL.

```sql
CREATE EXTERNAL TABLE `tablename`(
  `nmbr` string,
  `birth_dt` date,
  `amt` decimal(13,2),
  `gender` string)
PARTITIONED BY (
  `year` string,
  `month` string,
  `day` string)
ROW FORMAT SERDE
  'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe'
STORED AS INPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.parquet.MapredParquetOutputFormat'
LOCATION
  's3://bucketname/table'
TBLPROPERTIES (
  'CrawlerSchemaDeserializerVersion'='1.0',
  'CrawlerSchemaSerializerVersion'='1.0',
  'UPDATED_BY_CRAWLER'='aws-glue-crawler',
  'UpdatedByJobRun'='39485760-394h-1928-jg35-192hgft3dd8f',
  'averageRecordSize'='333',
  'classification'='parquet',
  'compressionType'='none',
  'objectCount'='348',
  'recordCount'='795',
  'sizeKey'='1160441',
  'transient_lastDdlTime'='1637268831',
  'typeOfData'='file',
  'useGlueParquetWriter'='true')
```

Thank you!

TBLPROPERTIES is used to specify metadata for the table as key/value pairs. Besides custom properties, Athena recognizes the following predefined properties:

| Predefined property | Description |
|---|---|
| classification | Indicates the data type for AWS Glue. Possible values are csv, parquet, orc, avro, or json. For more information, see the TBLPROPERTIES section of CREATE TABLE. |
| has_encrypted_data | Indicates whether the dataset specified by LOCATION is encrypted. For more information, see the TBLPROPERTIES section of CREATE TABLE and Creating Tables Based on Encrypted Datasets in Amazon S3. |
| orc.compress | Specifies a compression format for data in ORC format. For more information, see ORC SerDe. |
| parquet.compression | Specifies a compression format for data in Parquet format. For more information, see Parquet SerDe. |
| write.compression | Specifies a compression format for data in the textfile or JSON formats. For the Parquet and ORC formats, use the parquet.compression and orc.compress properties respectively. |
| projection.* | Custom properties used in partition projection that allow Athena to know what partition patterns to expect when it runs a query on a table. For more information, see Partition Projection with Amazon Athena. |
| skip.header.line.count | Ignores headers in data when you define a table. For more information, see Ignoring Headers. |
| storage.location.template | Specifies a custom Amazon S3 path template for projected partitions. For more information, see Setting up Partition Projection. |
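To show how the projection.* and storage.location.template properties fit together, here is a sketch that turns on partition projection for the question's year/month/day partition columns with ALTER TABLE SET TBLPROPERTIES. The year range 2020 to 2030 and the folder layout s3://bucketname/table/2021/11/18/ (plain values, no year= prefixes) are assumptions for illustration; treat it as a starting point rather than a tested configuration.

```sql
ALTER TABLE tablename SET TBLPROPERTIES (
  'projection.enabled'='true',
  'projection.year.type'='integer',
  'projection.year.range'='2020,2030',
  'projection.month.type'='integer',
  'projection.month.range'='1,12',
  'projection.month.digits'='2',
  'projection.day.type'='integer',
  'projection.day.range'='1,31',
  'projection.day.digits'='2',
  'storage.location.template'='s3://bucketname/table/${year}/${month}/${day}'
)
```

The digits settings zero-pad the projected month and day values (01, 02, ...) so they match the folder names. Combinations that do not exist, such as day 31 in a 30-day month, project to empty S3 prefixes, and Athena returns no rows for them rather than an error. If the folders use the Hive-style year=/month=/day= layout instead, storage.location.template can be omitted, since that layout is what Athena assumes by default.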

I expect the crawler metadata properties in your DDL (such as CrawlerSchemaDeserializerVersion, CrawlerSchemaSerializerVersion and UPDATED_BY_CRAWLER) are custom properties used by AWS Glue.
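A quick way to see which properties a table actually carries, including any crawler bookkeeping, is SHOW TBLPROPERTIES. The table name below is the placeholder from the question; Athena runs one statement per query, so run the two lines separately.

```sql
SHOW TBLPROPERTIES tablename

SHOW TBLPROPERTIES tablename('classification')
```

The second form returns only the named property.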

Some properties should be set when creating the table because they describe how the underlying data is represented, such as classification and has_encrypted_data.
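For comparison, here is a minimal sketch of the question's table with the crawler bookkeeping removed and only the data-describing properties kept. It assumes the objects under s3://bucketname/table are unencrypted Parquet; for an encrypted dataset, has_encrypted_data would be 'true'. STORED AS PARQUET is used as the short form of the ROW FORMAT SERDE / INPUTFORMAT / OUTPUTFORMAT clauses that "Generate table DDL" spells out.

```sql
CREATE EXTERNAL TABLE `tablename`(
  `nmbr` string,
  `birth_dt` date,
  `amt` decimal(13,2),
  `gender` string)
PARTITIONED BY (
  `year` string,
  `month` string,
  `day` string)
STORED AS PARQUET
LOCATION 's3://bucketname/table'
TBLPROPERTIES (
  'classification'='parquet',
  'has_encrypted_data'='false')
```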

https://docs.aws.amazon.com/athena/latest/ug/alter-table-set-tblproperties.html
