I am trying to use the Python Avro library ( https://pypi.python.org/pypi/avro ) to read an Avro file generated by Java. Since the schema is already embedded in the Avro file, why do I need to specify a schema file? Is there a way to extract it automatically?
I found another package, fastavro ( https://pypi.python.org/pypi/fastavro ), that can extract the Avro schema. Is having to specify the schema file manually in the Python avro package by design? Thank you very much.
I use Python 3.4 and version 1.7.7 of the Avro package.
To read the schema from the Avro file itself, without a separate schema file, use:
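import avro.datafile
import avro.io
# reader.meta holds the file-header metadata; the schema is stored under the 'avro.schema' key
reader = avro.datafile.DataFileReader(open('file_name.avro', "rb"), avro.io.DatumReader())
schema = reader.meta
print(schema)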
A direct examination of /usr/local/lib/python2.7/site-packages/avro/datafile.py
reveals the answer:
reader = avro.datafile.DataFileReader(input,avro.io.DatumReader())
schema = reader.datum_reader.writers_schema
print schema
Curiously, in Java there is a special method for that: reader.getSchema().
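For context on why no separate schema file is needed: the Avro object container format stores the writer's schema as JSON in the file header, under the avro.schema metadata key. The following is a minimal, standard-library-only sketch that reads that header directly, without the avro or fastavro packages. The function names (read_long, read_bytes, read_header_meta, extract_schema) are made up for illustration and are not part of any library, and the sketch assumes a well-formed file.

```python
import io
import json

MAGIC = b"Obj\x01"  # first four bytes of an Avro object container file


def read_long(stream):
    """Decode one zigzag, variable-length encoded Avro long."""
    shift = 0
    result = 0
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("unexpected end of data while reading a long")
        value = byte[0]
        result |= (value & 0x7F) << shift
        if not value & 0x80:  # high bit clear: last byte of this number
            break
        shift += 7
    return (result >> 1) ^ -(result & 1)  # undo zigzag encoding


def read_bytes(stream):
    """Read a length-prefixed byte string."""
    length = read_long(stream)
    data = stream.read(length)
    if len(data) != length:
        raise EOFError("unexpected end of data while reading bytes")
    return data


def read_header_meta(stream):
    """Return the header metadata map (str keys, bytes values)."""
    if stream.read(4) != MAGIC:
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:  # a zero count ends the map
            break
        if count < 0:  # negative count: a byte size for the block follows
            count = -count
            read_long(stream)  # block size in bytes, not needed here
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta


def extract_schema(stream):
    """Return the embedded writer's schema as a Python object."""
    meta = read_header_meta(stream)
    return json.loads(meta["avro.schema"].decode("utf-8"))
```

Assuming a file opened in binary mode, for example with open('file_name.avro', 'rb') as f, calling extract_schema(f) would return the schema as a dictionary. For real work the library readers shown in the answers are the safer choice, since they also handle the data blocks and codecs.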
In my case, to get the schema as a "consumable" Python dictionary containing useful information such as the schema name, I did the following:
reader: DataFileReader = DataFileReader(open(avro_file, 'rb'), DatumReader())
schema: dict = json.loads(reader.meta.get('avro.schema').decode('utf-8'))
The reader.meta dictionary is pretty useless "as is", since it contains 2 keys, avro.codec and avro.schema, whose values are both bytes objects (so I had to parse the schema in order to access its properties).
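The decoding step can be wrapped in a small helper. This is only a sketch: decode_meta and sample_meta are hypothetical names, and the sample dictionary is a hand-written stand-in for reader.meta, so the code runs without an Avro file. It assumes the values are bytes, as described above, and falls back to the "null" codec when avro.codec is absent, which is the default the Avro specification gives.

```python
import json


def decode_meta(meta):
    """Convert bytes-valued Avro file metadata into plain Python values."""
    schema = json.loads(meta["avro.schema"].decode("utf-8"))
    # The specification treats a missing codec entry as "null" (no compression).
    codec = meta.get("avro.codec", b"null").decode("utf-8")
    return {"schema": schema, "codec": codec}


# Hand-written stand-in for reader.meta (an assumption, not real file output).
sample_meta = {
    "avro.codec": b"deflate",
    "avro.schema": b'{"type": "record", "name": "User", "fields": []}',
}

info = decode_meta(sample_meta)
schema_name = info["schema"]["name"]
```

With a real reader, the same call would be decode_meta(reader.meta).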