Compare data in a text file vs. an Avro file using Python

I am new to Python, so please bear with me. I am using Python 3.6.4, and I want to compare the data in a text file against the data in my Avro dataset. The text file will be pipe-delimited and will come from a table in a relational database. Please help. Thanks in advance.
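A minimal sketch of loading the pipe-delimited export with Python's standard `csv` module. Every field in a text file arrives as a string, while the Avro reader returns typed values (ints, `None` for null), so the sketch converts each column before any comparison. The column names `id`, `name`, `age`, the converters, and the rule "empty field means NULL" are assumptions about the exported table, and `read_pipe_delimited` is a hypothetical helper.

```python
import csv
import io


def to_int_or_none(value):
    # Assumption: the export writes SQL NULL as an empty field
    return int(value) if value != "" else None


# Hypothetical column layout; adjust to match the real table
FIELDNAMES = ["id", "name", "age"]
CONVERTERS = {"id": int, "name": str, "age": to_int_or_none}


def read_pipe_delimited(f, fieldnames=FIELDNAMES, converters=CONVERTERS):
    """Yield one dict per line of a pipe-delimited file, with typed values."""
    reader = csv.DictReader(f, fieldnames=fieldnames, delimiter="|")
    for row in reader:
        # Convert each string field into the type the Avro side would hold
        yield {name: converters[name](row[name]) for name in fieldnames}


# In-memory stand-in for the exported text file
sample = io.StringIO("1|John|34\n2|Jane|134\n3|Davis|\n")
rows = list(read_pipe_delimited(sample))
print(rows)
```

For a real file, open it with `open("export.txt", newline="")` (the file name is hypothetical). If the export contains literal `"` characters in the data, passing `quoting=csv.QUOTE_NONE` to `DictReader` stops the parser from treating them as quote marks.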

Thanks, Vikas. Here's the code below. In it, I append hardcoded data to the Avro file, so the comparison is easy. My actual Avro file, however, will be the output of one program and the text file the output of another, and I need to compare those two files. Thanks

import avro.schema
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter

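def writeAvro(fileName):
    # testSchema.avsc is not shown; it must allow "age" to be null for the third record
    with open("testSchema.avsc", "r") as f:
        schema = avro.schema.Parse(f.read())

    # DataFileWriter.close() also closes the underlying file object
    writer = DataFileWriter(open(fileName, "wb"), DatumWriter(), schema)
    writer.append({"id": 1, "name": "John", "age": 34})
    writer.append({"id": 2, "name": "Jane", "age": 134})
    writer.append({"id": 3, "name": "Davis"})  # no "age" key: written as null
    writer.close()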

def readAvro(fileName, expected):
    # Index the expected rows by id so each Avro record is looked up directly
    expected_by_id = {row['id']: row for row in expected}

    reader = DataFileReader(open(fileName, "rb"), DatumReader())
    for record in reader:
        row = expected_by_id.get(record.get('id'))
        if row is None:
            print("Not expected:", record)
        elif row == record:
            print("Match:", record)
        else:
            print("Mismatch:", row, "!=", record)
    reader.close()

expected = [{'name': 'John', 'id': 1, 'age': 34},
            {'id': 2, 'name': 'Jane', 'age': 134},
            {'id': 3, 'name': 'Davis', 'age': None}]

writeAvro("test.avro")
readAvro("test.avro", expected)
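Because the real Avro file and text file come from two different programs, row order can't be relied on. One way to compare them, sketched under the assumptions that both sides are already loaded as lists of dicts with the same column names and Python types, and that `id` is a unique key: index each side by the key, then report keys present on only one side and, for shared keys, the fields whose values differ. `diff_records` and the sample rows are hypothetical.

```python
def diff_records(text_rows, avro_rows, key="id"):
    """Compare two iterables of dicts, matching rows on `key`.

    Returns (only_in_text, only_in_avro, mismatches), where mismatches maps
    each shared key to {field: (text_value, avro_value)} for differing fields.
    Assumes `key` is unique on each side; a duplicate silently replaces the earlier row.
    """
    text_by_key = {row[key]: row for row in text_rows}
    avro_by_key = {row[key]: row for row in avro_rows}

    # Set operations on dict key views find rows missing from either side
    only_in_text = sorted(text_by_key.keys() - avro_by_key.keys())
    only_in_avro = sorted(avro_by_key.keys() - text_by_key.keys())

    mismatches = {}
    for k in sorted(text_by_key.keys() & avro_by_key.keys()):
        t, a = text_by_key[k], avro_by_key[k]
        # Check the union of field names so a field missing on one side is reported
        fields = sorted(set(t) | set(a))
        diffs = {f: (t.get(f), a.get(f)) for f in fields if t.get(f) != a.get(f)}
        if diffs:
            mismatches[k] = diffs
    return only_in_text, only_in_avro, mismatches


# Hypothetical sample data standing in for the two programs' output
text_sample = [{"id": 1, "name": "John", "age": 34},
               {"id": 2, "name": "Jane", "age": 134},
               {"id": 4, "name": "Ann", "age": 20}]
avro_sample = [{"id": 1, "name": "John", "age": 34},
               {"id": 2, "name": "Jane", "age": 34},
               {"id": 3, "name": "Davis", "age": None}]

only_t, only_a, mism = diff_records(text_sample, avro_sample)
print("Only in text file:", only_t)
print("Only in Avro file:", only_a)
print("Field differences:", mism)
```

With real files, the Avro side could be `list(reader)` from a `DataFileReader` like the one in `readAvro`. The text side needs its columns converted to the same types first; otherwise `34 != "34"` and every row would be reported as a mismatch.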

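If the table has no unique key, or the same row can legitimately appear more than once, matching rows on a single column can hide differences. A sketch of an alternative, assuming each row is a dict like the records `DataFileReader` yields: turn every row into a tuple and compare the two sides as multisets with `collections.Counter`. The `fields` list is an assumption; it must name every column, in one fixed order used for both sources. `canonical` and `compare_as_multisets` are hypothetical helpers.

```python
from collections import Counter


def canonical(row, fields):
    # Dicts are unhashable, so turn each row into a tuple in a fixed column order
    return tuple(row.get(f) for f in fields)


def compare_as_multisets(text_rows, avro_rows, fields):
    """Order- and key-independent comparison that also counts duplicate rows.

    Returns (extra_in_text, extra_in_avro) as Counters of row tuples.
    """
    text_counts = Counter(canonical(r, fields) for r in text_rows)
    avro_counts = Counter(canonical(r, fields) for r in avro_rows)
    # Counter subtraction keeps only positive counts: rows left over on one side
    return text_counts - avro_counts, avro_counts - text_counts


FIELDS = ["id", "name", "age"]
# Hypothetical sample: the text side contains John twice
text_rows = [{"id": 1, "name": "John", "age": 34},
             {"id": 1, "name": "John", "age": 34},
             {"id": 3, "name": "Davis", "age": None}]
avro_rows = [{"id": 1, "name": "John", "age": 34},
             {"id": 3, "name": "Davis", "age": None}]

extra_t, extra_a = compare_as_multisets(text_rows, avro_rows, FIELDS)
print("Extra rows in text file:", dict(extra_t))
print("Extra rows in Avro file:", dict(extra_a))
```

This reports which whole rows differ but not which field changed, so it complements a key-based diff rather than replacing it.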