
From gzip to json to dataframe to csv

I am trying to get some data from an open API:

https://data.brreg.no/enhetsregisteret/api/enheter/lastned 

but I am having difficulty understanding the different types of objects and the order in which the conversions should happen. Is it strings to bytes? Is it BytesIO or StringIO? Is it decode('utf-8') or decode('unicode')?

So far:

import gzip
import io
import json
import urllib.request

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'

with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)

This is where I am stuck. How should I write the next line of code?

json_str = json.loads(decompressed_file.read().decode('utf-8'))

My workaround is to write the decompressed data to a JSON file, read it back in, and then convert it to a DataFrame. That works:

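from pandas import json_normalize  # available at top level in pandas >= 1.0

f_path = 'brreg.json'  # assumed to be the same file that is written below

with open(f_path, 'wb') as f:
    f.write(decompressed_file.read())

with open(f_path, encoding='utf-8') as fin:
    d = json.load(fin)

df = json_normalize(d)

# to_csv can write straight to a path, so no separate open() is needed
df.to_csv('brreg_2.csv', encoding='utf-8')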

I found many SO posts about this, but I am still confused. The first one explains it quite well, but I still need some spoon-feeding.

Python 3, read/write compressed json objects from/to gzip file

TypeError when trying to convert Python 2.7 code to Python 3.4 code

How can I create a GzipFile instance from the “file-like object” that urllib.urlopen() returns?

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

It works for me when I use the gzip.decompress function rather than the GzipFile class to decompress the data, though I am not sure why yet.

import urllib.request
import gzip
import io
import json

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'


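with urllib.request.urlopen(url_get) as response:
    # Charset declared by the server, falling back to UTF-8 (not used below)
    encoding = response.info().get_param('charset', 'utf8')
    # response.read() already returns bytes, so no BytesIO wrapper is needed
    decompressed = gzip.decompress(response.read())
    # json.loads returns parsed Python objects, not a string
    data = json.loads(decompressed.decode('utf-8'))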
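To untangle the types at each step, here is a minimal offline sketch. It builds a small gzip payload in memory instead of calling the URL, so `sample` and `payload` are made-up stand-ins for illustration, not real API data. It suggests that `gzip.decompress` and `gzip.GzipFile` yield the same bytes, and that a `GzipFile` is a stream that is exhausted after one `read()`.

```python
import gzip
import io
import json

# Made-up stand-in for the API response body (not real data).
sample = [{"navn": "Eksempel AS", "adresse": {"kommune": "OSLO"}}]
payload = gzip.compress(json.dumps(sample).encode("utf-8"))  # bytes

# Way 1: decompress the whole bytes object in one call -> bytes
raw_a = gzip.decompress(payload)

# Way 2: wrap the bytes in a file-like object and read the stream -> bytes
gz = gzip.GzipFile(fileobj=io.BytesIO(payload))
raw_b = gz.read()

# A stream is consumed by reading it; a second read() returns empty bytes.
second_read = gz.read()

# bytes -> str (decode) -> Python objects (json.loads)
text = raw_a.decode("utf-8")
data = json.loads(text)

print(type(payload).__name__, type(raw_a).__name__,
      type(text).__name__, type(data).__name__)
print(raw_a == raw_b, second_read)
```

So the order is compressed bytes, then decompressed bytes, then str, then Python objects, and StringIO is not needed anywhere. If an empty second read is passed to `json.loads`, it fails with "JSONDecodeError: Expecting value: line 1 column 1 (char 0)", which may be worth checking whenever `decompressed_file.read()` is called more than once.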

EDIT: in fact, the following also works for me, and it appears to be your exact code. (Further edit: it turns out it is not quite your exact code. Your final line was outside the with block, which meant response was no longer open when it was needed. See the comment thread.)

import urllib.request
import gzip
import io
import json

url_get = 'https://data.brreg.no/enhetsregisteret/api/enheter/lastned'


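with urllib.request.urlopen(url_get) as response:
    encoding = response.info().get_param('charset', 'utf8')
    # Buffer the compressed bytes in memory so GzipFile gets a file-like object
    compressed_file = io.BytesIO(response.read())
    decompressed_file = gzip.GzipFile(fileobj=compressed_file)
    # read() -> bytes, decode() -> str, json.loads() -> Python objects
    json_str = json.loads(decompressed_file.read().decode('utf-8'))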
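To finish the pipeline in the title without the intermediate JSON file, something along these lines may work. This is only a sketch: it assumes pandas 1.0 or later (where `json_normalize` is available as `pandas.json_normalize`) and uses a made-up in-memory payload in place of the download. `records` and its field names are illustrative assumptions, not the real API schema.

```python
import gzip
import io
import json

import pandas as pd

# Made-up records standing in for the decompressed API response.
records = [
    {"navn": "Eksempel AS", "adresse": {"kommune": "OSLO"}},
    {"navn": "Test ANS", "adresse": {"kommune": "BERGEN"}},
]
payload = gzip.compress(json.dumps(records).encode("utf-8"))

# gzip bytes -> JSON bytes -> Python objects; json.load accepts a binary file.
with gzip.GzipFile(fileobj=io.BytesIO(payload)) as gz:
    data = json.load(gz)

# Nested dicts are flattened into dotted column names such as "adresse.kommune".
df = pd.json_normalize(data)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_text = df.to_csv(index=False)
print(csv_text)
```

With the real response, `payload` would be `response.read()`, and `df.to_csv('brreg_2.csv', index=False, encoding='utf-8')` would write the file directly.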
