How to read a Parquet file's metadata from IBM Cloud Object Storage in Python?

Question

How to read a Parquet file's metadata (column names with types) from IBM COS in Python?

The only way I have found:

           import pyarrow.parquet as pq
           import s3fs
           s3 = s3fs.S3FileSystem(anon=False, key='xxx', secret='xxx',
                   client_kwargs={'endpoint_url':
                                      "https://s3-api.us-geo.objectstorage.softlayer.net"})

           schema = pq.ParquetDataset("bucket_name/file", filesystem=s3).read().schema

But it reads the whole file (I think).

Maybe there is another way to get the metadata from a Parquet file located in IBM COS?

If I use

       schema = pq.ParquetDataset("bucket_name/file", filesystem=s3).schema

it returns different data types: BYTE_ARRAY for strings and INT96 for timestamps.

Strange...

Answer 1

Solved:

       schema = pq.ParquetDataset(bucket, filesystem=s3).schema.to_arrow_schema()

solution1
0 2018-10-16 15:58:17