
How to convert JSON Data to Avro format using Python

I would like to convert the JSON data below to Avro format. I used the following code snippet to write the data as Avro, but it fails with an error. Any help would be greatly appreciated.

from fastavro import writer, reader, schema
from rec_avro import to_rec_avro_destructive, from_rec_avro_destructive, rec_avro_schema

def getweatherdata():
    url = 'https://api.openweathermap.org/data/2.5/onecall?lat=33.441792&lon=-94.037689&exclude=hourly,daily&appid=' + apikey
    response = requests.get(url)
    data = response.text
    return data
 
def turntoavro():
    avro_objects = (to_rec_avro_destructive(rec) for rec in getweatherdata())
    with open('json_in_avro.avro', 'wb') as f_out:
        writer(f_out, schema.parse_schema(rec_avro_schema()), avro_objects)



turntoavro()

    Error details:
    
      File "fastavro/_write.pyx", line 269, in fastavro._write.write_record
    TypeError: Expected dict, got str
    
    During handling of the above exception, another exception occurred:
    
    Traceback (most recent call last):
      File "datalake.py", line 30, in <module>
        turntoavro()
      File "datalake.py", line 26, in turntoavro
        writer(f_out, schema.parse_schema(rec_avro_schema()), avro_objects)
      File "fastavro/_write.pyx", line 652, in fastavro._write.writer
      File "fastavro/_write.pyx", line 605, in fastavro._write.Writer.write
      File "fastavro/_write.pyx", line 341, in fastavro._write.write_data
      File "fastavro/_write.pyx", line 278, in fastavro._write.write_record
    AttributeError: 'str' object has no attribute 'get'

Sample Data:

    {
      "lat": 33.44,
      "lon": -94.04,
      "timezone": "America/Chicago",
      "timezone_offset": -18000
    }

You read the response with response.text, which returns the body as a plain string rather than as parsed JSON. Use response.json() instead, which parses the body into Python objects (here, a dictionary):

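import requests
from fastavro import writer, schema
from rec_avro import to_rec_avro_destructive, rec_avro_schema

# apikey is assumed to be defined elsewhere, as in the question.
def getweatherdata():
    url = 'https://api.openweathermap.org/data/2.5/onecall?lat=33.441792&lon=-94.037689&exclude=hourly,daily&appid=' + apikey
    response = requests.get(url)
    data = response.json()  # parsed dict instead of the raw response string
    return data

def turntoavro():
    # getweatherdata() returns a single dict; wrap it in a list so it is
    # written as one record (iterating over the dict would yield its keys).
    avro_objects = [to_rec_avro_destructive(getweatherdata())]
    with open('json_in_avro.avro', 'wb') as f_out:
        writer(f_out, schema.parse_schema(rec_avro_schema()), avro_objects)

turntoavro()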
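The difference between the raw text and the parsed object can be checked without calling the API. The sketch below uses the sample data from the question as a stand-in for the response body, and uses json.loads as a rough equivalent of what response.json() does; the variable names are made up for illustration.

```python
import json

# Sample payload from the question, as the raw text an HTTP response would carry.
raw_text = (
    '{"lat": 33.44, "lon": -94.04, '
    '"timezone": "America/Chicago", "timezone_offset": -18000}'
)

# Iterating over a string yields single characters, not records.
first_items = [item for item in raw_text][:3]

# Parsing turns the text into a dict with named fields.
parsed = json.loads(raw_text)

print(type(raw_text).__name__, first_items)       # str ['{', '"', 'l']
print(type(parsed).__name__, parsed["timezone"])  # dict America/Chicago
```

This is why the writer in the question ended up being handed str values where it expected a dict.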

As mentioned in another answer, you probably want response.json() rather than response.text, so that you get back a Python dictionary instead of a string.

However, there is a second problem: getweatherdata() returns a single dictionary, so avro_objects = (to_rec_avro_destructive(rec) for rec in getweatherdata()) iterates over the keys of that dictionary rather than over records. Instead, wrap the single result in a list: avro_objects = [to_rec_avro_destructive(getweatherdata())]
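A small illustration of the iteration issue, using the sample data from the question in place of the real API response (the variable names are only for this sketch):

```python
# Stand-in for what getweatherdata() returns once response.json() is used.
data = {
    "lat": 33.44,
    "lon": -94.04,
    "timezone": "America/Chicago",
    "timezone_offset": -18000,
}

# Iterating over a dict walks its keys, so each "record" is just a string.
from_iteration = [rec for rec in data]

# Wrapping the dict in a list yields exactly one record: the dict itself.
from_list = [rec for rec in [data]]

print(from_iteration)  # ['lat', 'lon', 'timezone', 'timezone_offset']
print(len(from_list), type(from_list[0]).__name__)  # 1 dict
```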

I believe this code should work for you:

import requests
from fastavro import writer, reader, schema
from rec_avro import to_rec_avro_destructive, from_rec_avro_destructive, rec_avro_schema

# apikey is assumed to be defined elsewhere, as in the question.
def getweatherdata():
    url = 'https://api.openweathermap.org/data/2.5/onecall?lat=33.441792&lon=-94.037689&exclude=hourly,daily&appid=' + apikey
    response = requests.get(url)
    data = response.json()  # parse the body into a dict
    return data

def turntoavro():
    # Wrap the single dict in a list so it is written as one record.
    avro_objects = [to_rec_avro_destructive(getweatherdata())]
    with open('json_in_avro.avro', 'wb') as f_out:
        writer(f_out, schema.parse_schema(rec_avro_schema()), avro_objects)

turntoavro()

