
How to read json gzipped file from GCS and write to table

I have a JSON file compressed with gzip (.json.gz) stored in a bucket in Google Cloud Storage, which I want to read and copy into a Postgres table. The .json.gz file is just a JSON file with no nested objects in it, like this:

[{
  "date": "2019-03-10T07:00:00.000Z",
  "type": "chair",
  "total": 250.0,
  "payment": "cash"
},{
  "date": "2019-03-10T07:00:00.000Z",
  "type": "shirt",
  "total": 100.0,
  "payment": "credit card"
},{
.
.
}]

Previously I did a similar job with a CSV file, where I used the download_as_string function to store the contents in a variable, used StringIO to convert that variable into a file-like object, and passed it to the copy_expert() function with the query (this link). A minimal sketch of that approach is shown below.
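For context, here's roughly what that CSV workflow looks like, assuming psycopg2 and the google-cloud-storage client; the bucket, object, table name (sales) and column list are placeholders, not details from the question:

import io
import psycopg2
from google.cloud import storage

# Download the CSV object from GCS as a string
client = storage.Client()
blob = client.bucket('my-bucket').blob('data.csv')
csv_data = blob.download_as_string().decode('utf-8')

# Wrap it in a file-like object and bulk-load it with copy_expert
conn = psycopg2.connect('dbname=mydb user=myuser')
with conn, conn.cursor() as cur:
    cur.copy_expert(
        "COPY sales (date, type, total, payment) FROM STDIN WITH CSV",
        io.StringIO(csv_data),
    )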

So, how can I read a json.gz file in GCS and write it to a table with Python?

Thank you

To read the data in, I'd go with gcsfs, a Python interface to GCS:

import gcsfs
import gzip
import json

# gcsfs opens the object in binary mode by default, so the stream can be
# handed straight to gzip for on-the-fly decompression
fs = gcsfs.GCSFileSystem(project='my-project')
with fs.open('bucket/path.json.gz') as f:
    with gzip.GzipFile(fileobj=f) as gz:
        your_json = json.loads(gz.read())
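Alternatively, if you'd rather stay with the google-cloud-storage client you already use for the CSV job, you can decompress in memory instead; a sketch with placeholder bucket and object names:

import gzip
import json
from google.cloud import storage

# Download the compressed bytes and decompress them in memory
client = storage.Client()
blob = client.bucket('my-bucket').blob('path.json.gz')
your_json = json.loads(gzip.decompress(blob.download_as_string()))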

Now that you have your JSON, you can use the same code as you were using with the CSV.
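One way to bridge the two is to serialize the parsed list of dicts back into an in-memory CSV and hand that to copy_expert, just as before. A sketch, assuming psycopg2 and a hypothetical sales table whose columns match the JSON keys:

import csv
import io
import psycopg2

# Turn the list of flat dicts into an in-memory CSV buffer
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=['date', 'type', 'total', 'payment'])
for row in your_json:
    writer.writerow(row)
buf.seek(0)

# Bulk-load the buffer into Postgres (connection and table are placeholders)
conn = psycopg2.connect('dbname=mydb user=myuser')
with conn, conn.cursor() as cur:
    cur.copy_expert(
        "COPY sales (date, type, total, payment) FROM STDIN WITH CSV",
        buf,
    )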
