
Python exception while parsing json to avro schema: avro.schema.SchemaParseException: No “type” property

I read a record from a file and convert it into a dictionary. I then serialize that dictionary to JSON so that I can convert it to an Avro schema.

Here is my code so far:

import json
from avro import schema, datafile, io

def json_to_avro():
    # Read the header line and the record lines from the input file.
    with open("avro_record.txt", "r") as fo:
        data = fo.readlines()
    final_header = []
    final_rec = []
    for header in data[0:1]:
        # The header line is comma-separated.
        final_header = header.strip("\n").split(",")
    for rec in data[1:]:
        # Record lines are whitespace-separated; only the last one is kept.
        final_rec = rec.strip("\n").split()
    final_dict = dict(zip(final_header, final_rec))
    #print final_dict
    json_dumps = json.dumps(final_dict, ensure_ascii=False)
    #print json_dumps
    # This call raises SchemaParseException.
    SCHEMA = schema.parse(json_dumps)

json_to_avro()

When I print final_dict, the output is:

{'TransportProtocol': 'udp', 'MSISDN': '+62696174735', 'ResponseCode': 'E6%B8%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571&title=Air%20fighter_pakage.apk', 'GGSN IP': '202.89.193.185', 'MSTimeZone': '+0008', 'Numbers of time period': '1', 'Mime Type': 'audio/aac', 'EndTime': '1462251588', 'OutBound': '709', 'Inbound': '35', 'Method': 'GET', 'RAT': 'ph', 'Referer': 'ghijk', 'TAC': '35893783', 'UserAgent': '961', 'MNC': '02', 'OutPayload': '0', 'CI': '34301', 'StartTime': '1462251588', 'DestinationIP': 'ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3', 'URL': 'http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%', 'SGSN IP': '202.89.204.5', 'InPayload': '100', 'Protocol': 'http', 'WebDomain': '3', 'Source IP': 'e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e', 'MCC': '515', 'LAC': '36202', 'FlushFlag': '0', 'APN': '.internet.globe.com.', 'DestinationPort': '80', 'SourcePort': '82', 'LineFormat': 'http7', 'IMSI': '515-02-040687823335'}

When I print json_dumps, the output is:

{"TransportProtocol": "udp", "MSISDN": "+62696174735", "ResponseCode": "E6%B8%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571&title=Air%20fighter_pakage.apk", "GGSN IP": "202.89.193.185", "MSTimeZone": "+0008", "Numbers of time period": "1", "Mime Type": "audio/aac", "EndTime": "1462251588", "OutBound": "709", "Inbound": "35", "Method": "GET", "RAT": "ph", "Referer": "ghijk", "TAC": "35893783", "UserAgent": "961", "MNC": "02", "OutPayload": "0", "CI": "34301", "StartTime": "1462251588", "DestinationIP": "ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3", "URL": "http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%", "SGSN IP": "202.89.204.5", "InPayload": "100", "Protocol": "http", "WebDomain": "3", "Source IP": "e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e", "MCC": "515", "LAC": "36202", "FlushFlag": "0", "APN": ".internet.globe.com.", "DestinationPort": "80", "SourcePort": "82", "LineFormat": "http7", "IMSI": "515-02-040687823335"}

I believe this is valid JSON, which I now want to convert to an Avro schema. However, the line

SCHEMA = schema.parse(json_dumps)

throws an exception:

Traceback (most recent call last):
  File "convertToAvro.py", line 23, in <module>
    json_to_avro()
  File "convertToAvro.py", line 20, in json_to_avro
    SCHEMA = schema.parse(json_dumps)
  File "/usr/lib/python2.7/site-packages/avro/schema.py", line 785, in parse
    return make_avsc_object(json_data, names)
  File "/usr/lib/python2.7/site-packages/avro/schema.py", line 756, in make_avsc_object
    raise SchemaParseException('No "type" property: %s' % json_data)
avro.schema.SchemaParseException: No "type" property: {u'TransportProtocol': u'udp', u'MSISDN': u'+62696174735', u'ResponseCode': u'E6%B8%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571&title=Air%20fighter_pakage.apk', u'GGSN IP': u'202.89.193.185', u'EndTime': u'1462251588', u'Method': u'GET', u'Mime Type': u'audio/aac', u'OutBound': u'709', u'Inbound': u'35', u'Numbers of time period': u'1', u'RAT': u'ph', u'Referer': u'ghijk', u'TAC': u'35893783', u'UserAgent': u'961', u'MNC': u'02', u'OutPayload': u'0', u'CI': u'34301', u'DestinationPort': u'80', u'DestinationIP': u'ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3', u'URL': u'http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%', u'SGSN IP': u'202.89.204.5', u'InPayload': u'100', u'Protocol': u'http', u'WebDomain': u'3', u'Source IP': u'e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e', u'MCC': u'515', u'MSTimeZone': u'+0008', u'FlushFlag': u'0', u'APN': u'.internet.globe.com.', u'StartTime': u'1462251588', u'SourcePort': u'82', u'LineFormat': u'http7', u'LAC': u'36202', u'IMSI': u'515-02-040687823335'}

Just in case, here is my input (a header line followed by one record):

Protocol,LineFormat,StartTime,EndTime,MSTimeZone,IMSI,MSISDN,TAC,MCC,MNC,LAC,CI,SGSNIP,GGSNIP,APN,RAT,WebDomain,SourceIP,DestinationIP,SourcePort,DestinationPort,TransportProtocol,FlushFlag,Numbers of time period,OutBound,Inbound,Method,URL,ResponseCode,UserAgent,MimeType,Referer,OutPayload,InPayload

http    http7   1462251588      1462251588      +0008       515-02-040687823335     +62696174735    35893783        515     02      36202   34301   202.89.204.5    202.89.193.185  .internet.globe.com.        ph  3               e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3 82      80      udp     0       1       709     35      GET     http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%        E6%B8%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571&title=Air%20fighter_pakage.apk   961             audio/aac       ghijk   0       100

This happens because the argument to schema.parse() has to be an Avro schema (a JSON document that describes the record's type and fields), not a record itself, as in the getting-started guide ( https://avro.apache.org/docs/1.8.0/gettingstartedpython.html ):

schema = avro.schema.parse(open("user.avsc", "rb").read())
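For comparison, a schema file such as user.avsc is itself a JSON document with a top-level "type" property. The sketch below embeds a schema modeled on the User example in that guide (the exact fields are an assumption here) and contrasts it with a data record. `looks_like_schema` is a hypothetical helper, not part of the avro package; it only mimics the check behind the error message.

```python
import json

# A schema describes the shape of records: it carries "type", "name" and "fields".
# Modeled on the User example in the Avro getting-started guide (assumed fields).
USER_SCHEMA_JSON = """
{
  "namespace": "example.avro",
  "type": "record",
  "name": "User",
  "fields": [
    {"name": "name", "type": "string"},
    {"name": "favorite_number", "type": ["int", "null"]},
    {"name": "favorite_color", "type": ["string", "null"]}
  ]
}
"""

# A record is just data; nothing in it declares a type.
RECORD_JSON = '{"Protocol": "http", "LineFormat": "http7", "MNC": "02"}'


def looks_like_schema(json_text):
    """Hypothetical helper: True if the JSON object has a top-level "type" key."""
    parsed = json.loads(json_text)
    return isinstance(parsed, dict) and "type" in parsed


print(looks_like_schema(USER_SCHEMA_JSON))  # True: a schema document
print(looks_like_schema(RECORD_JSON))       # False: a data record
```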

Because you pass a JSON record instead, the parser finds no "type" property and raises the exception.
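One way forward is to describe the record with a schema first and only then serialize the data against it. The following is a minimal sketch, assuming every column is stored as a string and that a record name such as `HttpLog` is acceptable. `sanitize_name` and `build_record_schema` are hypothetical helpers, not part of the avro package. Avro names may contain only letters, digits and underscores and must not start with a digit, so headers such as `Numbers of time period` are rewritten.

```python
import json
import re


def sanitize_name(raw):
    """Turn a header such as 'Numbers of time period' into a legal Avro name."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw.strip())
    # Avro names must start with a letter or an underscore.
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def build_record_schema(header_fields, record_name="HttpLog"):
    """Build an Avro record schema (as a dict) with one string field per header."""
    return {
        "type": "record",
        "name": record_name,  # assumed name; pick whatever fits the data
        "fields": [
            {"name": sanitize_name(field), "type": "string"}
            for field in header_fields
        ],
    }


header = "Protocol,LineFormat,StartTime,Numbers of time period".split(",")
schema_json = json.dumps(build_record_schema(header), indent=2)
print(schema_json)

# schema_json is the kind of string that schema.parse() expects, e.g.
#   SCHEMA = schema.parse(schema_json)
# The record dictionary must then use the sanitized names as its keys.
```

With a parsed schema in hand, the guide linked above writes records through `DataFileWriter` and `DatumWriter`; the dictionary built from the file would be passed to the writer's `append` method, keyed by the sanitized field names.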

