
Python exception while parsing JSON to Avro schema: avro.schema.SchemaParseException: No "type" property

I read a record from a file and convert it to a dictionary. I then convert that dictionary to JSON so that I can try to turn it into an Avro schema.

Here is my code so far:

import json
from avro import schema, datafile, io

def json_to_avro():
    fo = open("avro_record.txt", "r")
    data = fo.readlines()
    final_header = []
    final_rec = []
    for header in data[0:1]:
        header = header.strip("\n")
        header = header.split(",")
        final_header = header
    for rec in data[1:]:
        rec = rec.strip("\n")
        rec = rec.split(" ")
        rec = ' '.join(rec).split()
        final_rec = rec
    final_dict = dict(zip(final_header, final_rec))
    #print final_dict
    json_dumps = json.dumps(final_dict, ensure_ascii=False)
    #print json_dumps
    SCHEMA = schema.parse(json_dumps)

json_to_avro()

When I print final_dict, the output is:

{'TransportProtocol': 'udp', 'MSISDN': '+62696174735', 'ResponseCode': 'E6%B8    %B8%E%A%8%E%8%93&pfid=139ver=10.1.2.571title=Air%20fighter_pakage.apk', 'GGSN IP': '202.89.193.185', 'MSTimeZone': '+0008', 'Numbers of time period': '1', 'Mime Type': 'audio/aac', 'EndTime': '1462251588', 'OutBound': '709', 'Inbound': '35','Method': 'GET', 'RAT': 'ph', 'Referer': 'ghijk', 'TAC': '35893783', 'UserAgent': '961', 'MNC': '02', 'OutPayload': '0', 'CI': '34301', 'StartTime': '1462251588', 'DestinationIP':'ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3', 'URL': 'http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%', 'SGSN IP': '202.89.204.5', 'InPayload': '100', 'Protocol': 'http', 'WebDomain': '3', 'Source IP': 'e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e', 'MCC': '515', 'LAC': '36202', 'FlushFlag': '0', 'APN': '.internet.globe.com.', 'DestinationPort': '80', 'SourcePort': '82', 'LineFormat': 'http7', 'IMSI': '515-02-040687823335'}

When I print json_dumps, the output is:

{"TransportProtocol": "udp", "MSISDN": "+62696174735", "ResponseCode":"E6%B%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571title=Air%20fighter_pakage.apk", "GGSN IP": "202.89.193.185", "MSTimeZone": "+0008", "Numbers of time period": "1", "Mime Type": "audio/aac", "EndTime": "1462251588", "OutBound": "709", "Inbound": "35", "Method": "GET", "RAT": "ph", "Referer": "ghijk", "TAC": "35893783", "UserAgent": "961", "MNC": "02", "OutPayload": "0", "CI": "34301", "StartTime": "1462251588", "DestinationIP": "ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3", "URL": "http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%", "SGSN IP": "202.89.204.5", "InPayload": "100", "Protocol": "http", "WebDomain": "3", "Source IP": "e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e", "MCC": "515", "LAC": "36202", "FlushFlag": "0", "APN": ".internet.globe.com.", "DestinationPort": "80", "SourcePort": "82", "LineFormat": "http7", "IMSI": "515-02-040687823335"}

I assume this is valid JSON, and I now want to convert it to an Avro schema.

SCHEMA = schema.parse(json_dumps)

raises an exception:

Traceback (most recent call last):
  File "convertToAvro.py", line 23, in <module>
    json_to_avro()
  File "convertToAvro.py", line 20, in json_to_avro
    SCHEMA = schema.parse(json_dumps)
  File "/usr/lib/python2.7/site-packages/avro/schema.py", line 785, in parse
    return make_avsc_object(json_data, names)
  File "/usr/lib/python2.7/site-packages/avro/schema.py", line 756, in make_avsc_object
    raise SchemaParseException('No "type" property: %s' % json_data)
avro.schema.SchemaParseException: No "type" property: {u'TransportProtocol':u'udp', u'MSISDN': u'+62696174735', u'ResponseCode': u'E6%B8%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571&title=Air%20fighter_pakage.apk', u'GGSN IP': u'202.89.193.185', u'EndTime': u'1462251588', u'Method': u'GET',u'Mime Type': u'audio/aac', u'OutBound': u'709', u'Inbound': u'35',u'Numbers of time period': u'1', u'RAT': u'import jsonph', u'Referer':u'ghijk', u'TAC': u'35893783', u'UserAgent': u'961', u'MNC':u'02',u'OutPayload': u'0', u'CI': u'34301', u'DestinationPort': u'80',u'DestinationIP': u'ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3', u'URL':u'http:///group1/M00//B/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%', u'SGSN IP': u'202.89.204.5', u'InPayload': u'100', u'Protocol': u'http', u'WebDomain': u'3', u'Source IP': u'e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e', u'MCC': u'515', u'MSTimeZone': u'+0008', u'FlushFlag': u'0', u'APN': u'.internet.globe.com.', u'StartTime': u'1462251588', u'SourcePort': u'82', u'LineFormat': u'http7', u'LAC': u'36202', u'IMSI': u'515-02-040687823335'}

In case it helps, here is my input record:

Protocol,LineFormat,StartTime,EndTime,MSTimeZone,IMSI,MSISDN,TAC,MCC,MNC,LAC,CI,SGSNIP,GGSNIP,APN,RAT,WebDomain,SourceIP,DestinationIP,SourcePort,DestinationPort,TransportProtocol,FlushFlag,Numbers of time period,OutBound,Inbound,Method,URL,ResponseCode,UserAgent,MimeType,Referer,OutPayload,InPayload

http    http7   1462251588      1462251588      +0008       515-02-040687823335     +62696174735    35893783        515     02      36202   34301   202.89.204.5    202.89.193.185  .internet.globe.com.        ph  3               e5df:602a:5a83:eaf1:8049:23c4:0fb7:f78e ef50:5fcd:498e:c265:a37b:10ec:7984:c6a3 82      80      udp     0       1       709     35      GET     http:///group1/M00/6F/B2/poYBAFYtlqiALni4AG51LNrVFEQ342.apk?pn=com.airfly.fightergame.en1949&caller=9game&m=kxV5msjNq6PPBXxz_cPqzg&t=1451175690&sid=1df9ab75-48c6-41a6-9b86-b0d98976378b&gid=628195&fz=7238956&pid=2&site=%E4%B9%9D%        E6%B8%B8%E5%AE%89%E5%8D%93&pfid=139&ver=10.1.2.571&title=Air%20fighter_pakage.apk   961             audio/aac       ghijk   0       100

This happens because the argument to schema.parse() must be an Avro schema, not the record itself, as in the Avro getting-started guide ( https://avro.apache.org/docs/1.8.0/gettingstartedpython.html ):

schema = avro.schema.parse(open("user.avsc", "rb").read())

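When you pass a JSON record instead, parsing fails: the record has no top-level "type" property, which an Avro schema object must declare.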
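
As a rough illustration of the difference between a schema and a record, the sketch below builds schema JSON from the column headers and keeps the record as a separate dictionary. The helper names (to_avro_name, build_record_schema, build_record), the record name "HttpRecord", and the choice to declare every column as a string are assumptions made for this example, not something taken from the question. Header names are sanitized because Avro names may only contain letters, digits, and underscores, so a header such as "Numbers of time period" would otherwise be rejected.

```python
import json
import re


def to_avro_name(header):
    # Avro names must match [A-Za-z_][A-Za-z0-9_]*, so replace anything else.
    name = re.sub(r"[^A-Za-z0-9_]", "_", header.strip())
    if not name or name[0].isdigit():
        name = "_" + name
    return name


def build_record_schema(headers, record_name="HttpRecord"):
    # Hypothetical helper: every column is declared as an Avro string field.
    schema_dict = {
        "type": "record",
        "name": record_name,
        "fields": [{"name": to_avro_name(h), "type": "string"} for h in headers],
    }
    return json.dumps(schema_dict)


def build_record(headers, values):
    # The record uses the same sanitized names as the schema fields.
    return dict(zip((to_avro_name(h) for h in headers), values))


# A shortened header line and data line in the same layout as the input file.
headers = "Protocol,LineFormat,Numbers of time period".split(",")
values = "http    http7   1".split()

schema_json = build_record_schema(headers)
record = build_record(headers, values)

print(schema_json)
print(record)
```

Unlike json_dumps in the question, schema_json has a top-level "type" property, so it is the kind of string that schema.parse() expects. The record dictionary is what would then be written against that schema, for example by appending it through datafile.DataFileWriter with an io.DatumWriter.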

