How to read a Parquet file's metadata (column names with types) from IBM COS in Python?
The only way I have found:
import pyarrow.parquet as pq
import s3fs
s3 = s3fs.S3FileSystem(anon=False, key='xxx', secret='xxx',
                       client_kwargs={'endpoint_url':
                       "https://s3-api.us-geo.objectstorage.softlayer.net"})
schema = pq.ParquetDataset("bucket_name/file", filesystem=s3).read().schema
But it reads the whole file (I think).
Is there another way to get the metadata of a Parquet file stored in IBM COS?
If I use
schema = pq.ParquetDataset("bucket_name/file", filesystem=s3).schema
it returns different data types: BYTE_ARRAY for strings and INT96 for timestamps, which looks strange.
Solved:
schema = pq.ParquetDataset(bucket, filesystem=s3).schema.to_arrow_schema()