Snowflake - how to read metadata from parquet files in S3

We are using external tables in our Snowflake database, in order to read data from some AWS S3 buckets. The buckets contain various parquet files, spread over multiple partitions.

We are able to read the data from our external table by using Snowflake's stages, storage integrations and file formats.

However, we'd like to read some metadata from the Parquet files as well, such as the precision of numeric data types (e.g., to find out how many decimal places we have to deal with).

To keep it simple, let's say we're reading data from one single parquet file.

Is there any way to retrieve the precision of numeric data types from that Parquet file's metadata, directly from Snowflake?

Or would you rather extract that metadata from, let's say, Glue Catalog or any other external tool?

There's a recent public preview feature, INFER_SCHEMA, that infers the schema of staged files and should do this:

INFER_SCHEMA(
  LOCATION => '{ internalStage | externalStage }'
  , FILE_FORMAT => '<format_name>'
)

https://docs.snowflake.com/en/sql-reference/functions/infer_schema.html
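
As a rough sketch of how this might be called for a Parquet stage: the stage name `my_stage` and the file format name `my_parquet_format` below are placeholders I made up, so substitute your own. INFER_SCHEMA is a table function, so it is wrapped in TABLE(...), and the TYPE column of the result should carry the detected data type, including precision and scale for numeric columns.

```sql
-- Hypothetical named file format; INFER_SCHEMA expects a named format, not an inline one.
CREATE FILE FORMAT IF NOT EXISTS my_parquet_format
  TYPE = PARQUET;

-- my_stage is a placeholder for your existing external stage on S3.
SELECT COLUMN_NAME, TYPE, NULLABLE
FROM TABLE(
  INFER_SCHEMA(
    LOCATION => '@my_stage',
    FILE_FORMAT => 'my_parquet_format'
  )
);
```

If the stage holds many files, it may be worth pointing LOCATION at a narrower path so that only the file you care about is inspected.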
