
Read AVRO file using Python

I have an AVRO file (created by Java) that seems to be some kind of compressed file for Hadoop/MapReduce. I want to 'unzip' (deserialize) it into a flat file, one record per row.

I learned that there is an AVRO package for Python, and I installed it correctly. I then ran the example to read the AVRO file. However, it failed with the error below, and I am wondering what goes wrong when reading even the simplest example. Can anyone help me interpret the error below?

>>> reader = DataFileReader(open("/tmp/Stock_20130812104524.avro", "r"), DatumReader())
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/.../python2.7/site-packages/avro/datafile.py", line 240, in __init__
    raise DataFileException('Unknown codec: %s.' % self.codec)
avro.datafile.DataFileException: Unknown codec: snappy.

By the way, if I run 'head' on the file, or open the first few lines of the AVRO file in vi, I can see the schema definition together with some strange unreadable characters, probably the compressed content. The start of the raw AVRO file looks like this:

bj^A^D^Tavro.codec^Lsnappy^Vavro.schemaØ${"type":"record","name":"Stoc...

I don't know whether the schema is needed to read the AVRO file, something like below:

schema = avro.schema.parse(open("schema").read())
# include schema to do sth...
reader = DataFileReader(open("Stock_20130812104524.avro", "r"), DatumReader())

Thanks in advance.
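The header pasted in the question can be decoded with the standard library alone, which shows which codec a file needs before you try to open it. Below is a minimal sketch, assuming Python 3 and the Avro object container layout (the magic bytes Obj\x01 followed by a metadata map of string keys and bytes values). The function names are made up for illustration and are not part of the avro package.

```python
import io


def read_long(stream):
    """Decode one Avro long (variable-length, zigzag encoded)."""
    shift = 0
    result = 0
    while True:
        chunk = stream.read(1)
        if not chunk:
            raise EOFError("unexpected end of header")
        value = chunk[0]
        result |= (value & 0x7F) << shift
        if not value & 0x80:
            break
        shift += 7
    # Undo the zigzag encoding.
    return (result >> 1) ^ -(result & 1)


def read_bytes(stream):
    """Read a length-prefixed byte string."""
    return stream.read(read_long(stream))


def read_header_metadata(stream):
    """Return the metadata map from an Avro container file header."""
    if stream.read(4) != b"Obj\x01":
        raise ValueError("not an Avro object container file")
    meta = {}
    while True:
        count = read_long(stream)
        if count == 0:
            break
        if count < 0:
            # A negative count is followed by the block size in bytes.
            count = -count
            read_long(stream)
        for _ in range(count):
            key = read_bytes(stream).decode("utf-8")
            meta[key] = read_bytes(stream)
    return meta
```

Calling read_header_metadata(open(path, "rb")) on the file from the question should return a dict whose "avro.codec" entry is b"snappy" and whose "avro.schema" entry holds the JSON schema. The schema travels inside the file, so it does not have to be supplied separately just to read it.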

Try pip install python-snappy, and make sure you have installed snappy first.

The problem is that snappy will not work unless the Xcode command line tools are installed. You can check by typing gcc at the command prompt to see whether it is installed. If not, type xcode-select --install to install it. Then installing python-snappy should work. Thanks Bin!
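Once python-snappy imports cleanly, reading the file and writing one record per row could look roughly like the sketch below. It assumes Python 3 with the avro and python-snappy packages installed. The helper names and the output path are invented for illustration. The file is opened in binary mode ("rb"), since the container is not text.

```python
import json


def records_to_lines(records):
    """Yield each record (a dict) as one JSON line."""
    for record in records:
        # default=str is a crude fallback for values such as bytes
        # that json cannot serialize directly.
        yield json.dumps(record, sort_keys=True, default=str)


def dump_avro(avro_path, out_path):
    """Write every record of an Avro file to out_path, one per row."""
    # Imported here so records_to_lines works without avro installed.
    from avro.datafile import DataFileReader
    from avro.io import DatumReader

    with open(avro_path, "rb") as src, open(out_path, "w") as dst:
        # No schema argument: the writer schema is read from the header.
        reader = DataFileReader(src, DatumReader())
        try:
            for line in records_to_lines(reader):
                dst.write(line + "\n")
        finally:
            reader.close()
```

For example, dump_avro("/tmp/Stock_20130812104524.avro", "/tmp/stock.jsonl") would produce a flat file with one JSON object per line. The second path is only a placeholder.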

wget http://www.us.apache.org/dist/avro/avro-1.7.5/java/avro-tools-1.7.5.jar

java -jar avro/avro-tools-1.7.5.jar tojson input.avro > input
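If the goal is a delimited flat file rather than JSON, the tojson output can be post-processed in Python. This is a rough sketch that assumes tojson wrote one JSON object per line (its default, non-pretty output) and that the wanted fields are top-level keys. The field names in the usage note are hypothetical, because the schema in the question is truncated.

```python
import csv
import io
import json


def json_lines_to_tsv(src, dst, fields):
    """Copy selected top-level fields from JSON-lines input to TSV output."""
    writer = csv.writer(dst, delimiter="\t", lineterminator="\n")
    writer.writerow(fields)
    for line in src:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # Missing fields become empty cells.
        writer.writerow([record.get(name) for name in fields])
```

Usage might look like json_lines_to_tsv(open("input"), open("input.tsv", "w"), ["symbol", "price"]), where "symbol" and "price" stand in for whatever fields the real schema defines.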

More information can be found here.
