
CSV to AVRO using Python

I have the following CSV:

eu;4523;35353;01/09/1999; 741 ; 386 ; 412 ; 86 ; 1.624 ; 1.038 ; 469 ; 117 ;

and I want to convert it to Avro. I have created the following Avro schema:

{"namespace": "forecast.avro",
 "type": "record",
 "name": "forecast",
 "fields": [
     {"name": "field1", "type": "string"},
     {"name": "field2", "type": "string"},
     {"name": "field3", "type": "string"},
     {"name": "field4", "type": "string"},
     {"name": "field5", "type": "string"},
     {"name": "field6", "type": "string"},
     {"name": "field7", "type": "string"},
     {"name": "field8", "type": "string"},
     {"name": "field9", "type": "string"},
     {"name": "field10", "type": "string"},
     {"name": "field11", "type": "string"},
     {"name": "field12", "type": "null"}
 ]
}

and my code is as follows:

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
import csv
from collections import namedtuple

FORECAST = "forecast.csv"
fields = ("field1", "field2", "field3", "field4", "field5", "field6", "field7", "field8", "field9", "field10", "field11", "field12")
forecastRecord = namedtuple('forecastRecord', fields)

def read_forecast_data(path):
    with open(path, 'rU') as data:
        reader = csv.reader(data, delimiter = ";")
        for row in map(forecastRecord._make, reader):
            yield row

if __name__=="__main__":
    for row in read_forecast_data(FORECAST):
        print (row)

def parse_schema(path="forecast.avsc"):
    with open(path, 'r') as data:
        return avro.schema.parse(data.read())
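
def serialize_records(records, outpath="forecast.avro"):
    schema = parse_schema()
    # Avro container files are binary, so open the output in binary mode
    with open(outpath, 'wb') as out:
        writer = DataFileWriter(out, DatumWriter(), schema)
        for record in records:
            # turn the namedtuple into a dict keyed by field name
            record = dict((f, getattr(record, f)) for f in record._fields)
            writer.append(record)
        writer.close()

if __name__ == "__main__":
    serialize_records(read_forecast_data(FORECAST))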

When I run the code, I get an error saying that the datum is not an example of the current schema. I have checked my schema again and again for inconsistencies, but so far I have not found any. Could someone help me?

When I run your code as written, I get the error TypeError: Expected 12 arguments, got 13 at the line for row in map(forecastRecord._make, reader): because your CSV ends in a ; and therefore has 13 fields.
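
To see where the extra field comes from, the sample row can be passed to csv.reader on its own. This is only a sketch: it uses the row from the question, and the helper strip_trailing_empty is a made-up name for illustration, not something the csv module provides.

```python
import csv
import io

# The sample row from the question, including its trailing ';'.
LINE = "eu;4523;35353;01/09/1999; 741 ; 386 ; 412 ; 86 ; 1.624 ; 1.038 ; 469 ; 117 ;"


def strip_trailing_empty(row):
    """Drop one trailing empty field caused by a line ending in the delimiter.

    Hypothetical helper for illustration; not part of the csv module.
    """
    if row and row[-1].strip() == "":
        return row[:-1]
    return row


row = next(csv.reader(io.StringIO(LINE), delimiter=";"))
cleaned = strip_trailing_empty(row)

print(len(row))      # 13: the trailing ';' adds an empty last field
print(len(cleaned))  # 12: matches the namedtuple definition
```

Cleaning each row this way before calling forecastRecord._make would be one alternative to editing the CSV file by hand.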

Once I remove the trailing ; characters, I can run the example and get the same schema-mismatch error. The reason is that field12 in your schema is declared with type null, but in the data it is a string (with value "117").

If you change the avsc file to {"name": "field12", "type": "string"}, then it works.
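
The mismatch can be illustrated without the Avro library at all. The sketch below is a deliberately simplified stand-in that only understands the "string" and "null" primitive types; the functions mismatched_fields and make_schema are hypothetical names, and this is not how the avro package validates internally.

```python
# Simplified stand-in for Avro's datum check; handles only two primitive types.
PYTHON_TYPES = {"string": str, "null": type(None)}


def mismatched_fields(schema, datum):
    """Return the names of fields whose value does not fit the declared type."""
    return [
        field["name"]
        for field in schema["fields"]
        if not isinstance(datum.get(field["name"]), PYTHON_TYPES[field["type"]])
    ]


def make_schema(last_type):
    """Build the question's 12-field schema with a chosen type for field12."""
    fields = [{"name": "field%d" % i, "type": "string"} for i in range(1, 12)]
    fields.append({"name": "field12", "type": last_type})
    return {"type": "record", "name": "forecast", "fields": fields}


# The question's row with the trailing ';' already removed.
values = "eu;4523;35353;01/09/1999; 741 ; 386 ; 412 ; 86 ; 1.624 ; 1.038 ; 469 ; 117 ".split(";")
datum = {"field%d" % (i + 1): value for i, value in enumerate(values)}

print(mismatched_fields(make_schema("null"), datum))    # ['field12']
print(mismatched_fields(make_schema("string"), datum))  # []
```

With "null" declared, the only acceptable value for field12 would be None, so any string read from the CSV is rejected.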

Another way, using fastavro:

import csv
from collections import namedtuple
from fastavro import parse_schema, writer

schema = {
    "namespace": "test.avro",
    "type": "record",
    "name": "test",
    "fields": [
        {"name": "region", "type": "string"},
        {"name": "anzsic_descriptor", "type": "string"},
        {"name": "gas", "type": "string"},
        {"name": "units", "type": "string"},
        {"name": "magnitude", "type": "string"},
        {"name": "year", "type": "string"},
        {"name": "data_val", "type": "string"},
    ],
}
# field names, in the same order as the schema fields above
fields = ("region", "anzsic_descriptor", "gas", "units", "magnitude", "year", "data_val")
forecastRecord = namedtuple('forecastRecord', fields)
parsed_schema = parse_schema(schema)

lst = []
with open('test.csv', 'r') as data:
    reader = csv.reader(data, delimiter=",")
    for records in map(forecastRecord._make, reader):
        # turn each namedtuple into a dict keyed by field name
        record = dict((f, getattr(records, f)) for f in records._fields)
        lst.append(record)

# Avro container files are binary, hence "wb"
with open("users.avro", "wb") as fp:
    writer(fp, parsed_schema, lst)
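
The namedtuple step is not essential: csv.DictReader can produce the per-row dicts directly when it is given the field names. The following is a sketch under assumptions: the sample row is made up to stand in for test.csv, and rows_to_records is a hypothetical helper name.

```python
import csv
import io

# Field names taken from the schema in the answer above.
FIELDS = ["region", "anzsic_descriptor", "gas", "units", "magnitude", "year", "data_val"]

# Made-up sample row standing in for the contents of test.csv.
SAMPLE = "Region A,Industry X,Gas Y,Unit Z,Scale W,2020,1.5\n"


def rows_to_records(lines, fieldnames, delimiter=","):
    """Turn CSV lines into a list of dicts keyed by the given field names."""
    reader = csv.DictReader(lines, fieldnames=fieldnames, delimiter=delimiter)
    return [dict(row) for row in reader]


records = rows_to_records(io.StringIO(SAMPLE), FIELDS)
print(records[0]["year"])  # 2020
```

The resulting list has the same shape as lst, so it could be handed to fastavro's writer in its place. Every value stays a string, which matches a schema whose fields are all declared as "string".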
