python - asn1 parsed text to json

With the text given in this link, I need to extract data as follows:

  1. Each record starts with a timestamp of the form YYYY Mmm dd hh:mm:ss.ms, for example 2019 Aug 31 09:17:36.550
  2. Each record has a header that starts at the timestamp line from item 1 and ends with a blank line
  3. The record data is contained in the lines below Interpreted PDU:
  4. The records of interest are the ones whose header's first line contains 0xB821 NR5G RRC OTA Packet -- RRC_RECONFIG

Is it possible to extract the selected record headers and the text described in item 3 as an array of nested JSON in the format below? The example is snipped for brevity; I really need the entire text data as JSON.

data = [{"time": "2019 Aug 31  09:17:36.550", "PDU Number": "RRC_RECONFIG Message", "Physical Cell ID": 0, "rrc-TransactionIdentifier": 1, "criticalExtensions rrcReconfiguration": {"secondaryCellGroup": {"cellGroupId": 1, "rlc-BearerToAddModList": [{"logicalChannelIdentity": 1, "servedRadioBearer drb-Identity": 2, "rlc-Config am": {"ul-AM-RLC": {"sn-FieldLength": "size18", "t-PollRetransmit": "ms40", "pollPDU": "p32", "pollByte": "kB25", "maxRetxThreshold": "t32"}, "dl-AM-RLC": {"sn-FieldLength": "size18", "t-Reassembly": "ms40", "t-StatusProhibit": "ms20"}}}]}}  }, next records data here]

Note that the input text is the parsed output of the ASN.1 data specifications in 3GPP 38.331, section 6.3.2. I'm not sure whether normal Python text parsing is the right way to handle this, or whether one should use something like the asn1tools library. If so, an example usage on this data would be helpful.

Unfortunately, it is unlikely that somebody will come up with a straight answer to your question (which is very similar to How to extract data from asn1 data file and load it into a dataframe?).

The text behind your link is obviously a log file where ASN.1 value notation was used to make the messages human readable. Trying to decode these messages from their textual form is therefore unusual, and you will probably not find tooling for that.

In theory, the generic method would be this one:

  1. Gather the ASN.1 DEFINITIONS (schema) that were used to create the ASN.1 messages
  2. Compile these DEFINITIONS with an ASN.1 tool (aka compiler) to generate an object model in your favorite language (Python). The tool would provide the specific code to encode and decode; here you would use the ASN.1 value decoders.
  3. Add your custom code (either to the object model or plugged into the ASN.1 compiler) to encode your JSON objects

As you see, it is a very long shot (I can expand if this explanation is too short or unclear).

Unless your task is repetitive and/or the number of messages is large, try the methods you already know (manual search, regex) to search the log file.
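
As a rough sketch of the regex route, the snippet below splits a log into records at each timestamp line and keeps only those whose first header line mentions 0xB821 and RRC_RECONFIG. The function names and the SAMPLE text are made up for illustration; the sample only imitates the layout described in the question, so the real log may need a different pattern.

```python
import re

# Timestamp that opens each record, e.g. "2019 Aug 31  09:17:36.550".
RECORD_START = re.compile(
    r"^\d{4} [A-Z][a-z]{2} +\d{1,2} +\d{2}:\d{2}:\d{2}\.\d{3}", re.MULTILINE
)

def split_records(text):
    """Return one string per record, each starting at its timestamp line."""
    starts = [m.start() for m in RECORD_START.finditer(text)]
    ends = starts[1:] + [len(text)]
    return [text[s:e] for s, e in zip(starts, ends)]

def select_records(text, marker="0xB821", pdu="RRC_RECONFIG"):
    """Keep records whose first header line mentions both marker and pdu."""
    selected = []
    for record in split_records(text):
        first_line = record.split("\n", 1)[0]
        if marker in first_line and pdu in first_line:
            # Everything before "Interpreted PDU:" is header, the rest is data.
            header, _, body = record.partition("Interpreted PDU:")
            selected.append({"header": header.strip(), "pdu_text": body.strip()})
    return selected

# Invented sample that only mimics the layout described in the question.
SAMPLE = """2019 Aug 31  09:17:36.550  [32]  0xB821  NR5G RRC OTA Packet  --  RRC_RECONFIG
Physical Cell ID = 0

Interpreted PDU:

value DL-DCCH-Message ::=
{
  message c1 : rrcReconfiguration : { rrc-TransactionIdentifier 1 }
}

2019 Aug 31  09:17:36.600  [10]  0x0000  Some Other Packet
Physical Cell ID = 0
"""

for rec in select_records(SAMPLE):
    print(rec["header"].splitlines()[0])
```

This only isolates the text of each record; turning the Interpreted PDU body into nested JSON is a separate step.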

If you want to see what it takes to create ASN.1 tools, you can find a few (not that many, as ASN.1 is not particularly young or popular). Check out https://github.com/etingof/pyasn1 (Python).

I created my own for fun in Java, and I am adding ASN.1 value decoders to it to illustrate my answer: https://github.com/yafred/asn1-tool (branch text-asn-value-support).
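
To give an idea of what decoding value notation involves, here is a minimal Python sketch of a schema-less reader. It is not the decoder from the repository above, and it does not use the ASN.1 DEFINITIONS at all: it guesses the structure from braces, commas and colons alone. The function names and the sample text are hypothetical, with the sample modeled on the JSON the question asks for. A field holding a CHOICE is stored under the key "field alternative", as in that JSON.

```python
import json
import re

# Tokens: quoted strings, the four punctuation marks, or bare words.
TOKEN = re.compile(r'"[^"]*"|[{},:]|[^\s{},:]+')

def scalar(word):
    """Convert a bare token to int, bool or None, else keep it as a string."""
    if re.fullmatch(r"-?\d+", word):
        return int(word)
    return {"TRUE": True, "FALSE": False, "NULL": None}.get(word, word.strip('"'))

def parse_value(tokens, pos):
    """Parse one value starting at tokens[pos]; return (value, next_pos)."""
    if tokens[pos] == "{":
        return parse_braces(tokens, pos + 1)
    word = tokens[pos]
    if pos + 1 < len(tokens) and tokens[pos + 1] == ":":  # CHOICE: alt : value
        inner, pos = parse_value(tokens, pos + 2)
        return {word: inner}, pos
    return scalar(word), pos + 1

def parse_braces(tokens, pos):
    """Parse what follows '{' as a dict (named fields) or a list (elements)."""
    if tokens[pos] == "}":
        return {}, pos + 1  # empty braces are ambiguous; assume an empty dict
    # Heuristic: "name value" pairs mean a SEQUENCE, anything else a list.
    named = tokens[pos] != "{" and tokens[pos + 1] not in {",", "}", ":"}
    result = {} if named else []
    while True:
        if named:
            name, pos = tokens[pos], pos + 1
            if tokens[pos + 1] == ":":  # field holds a CHOICE: merge into key
                name, pos = f"{name} {tokens[pos]}", pos + 2
            value, pos = parse_value(tokens, pos)
            result[name] = value
        else:
            value, pos = parse_value(tokens, pos)
            result.append(value)
        if tokens[pos] == ",":
            pos += 1
        elif tokens[pos] == "}":
            return result, pos + 1
        else:
            raise ValueError(f"unexpected token {tokens[pos]!r}")

def asn1_text_to_obj(text):
    """Drop an optional 'value Type ::=' prefix and parse the remainder."""
    tokens = TOKEN.findall(text.split("::=", 1)[-1])
    value, _ = parse_value(tokens, 0)
    return value

# Hypothetical input, modeled on the JSON shown in the question.
SAMPLE = """
value DL-DCCH-Message ::=
{
  message c1 : rrcReconfiguration :
    {
      rrc-TransactionIdentifier 1,
      criticalExtensions rrcReconfiguration :
        {
          secondaryCellGroup
          {
            cellGroupId 1,
            rlc-BearerToAddModList
            {
              {
                logicalChannelIdentity 1,
                servedRadioBearer drb-Identity : 2,
                rlc-Config am :
                  {
                    ul-AM-RLC { sn-FieldLength size18, pollPDU p32 },
                    dl-AM-RLC { sn-FieldLength size18, t-Reassembly ms40 }
                  }
              }
            }
          }
        }
    }
}
"""

print(json.dumps(asn1_text_to_obj(SAMPLE), indent=2))
```

Without the schema, the reader cannot tell an empty SEQUENCE from an empty SEQUENCE OF, and it treats bit strings, hex strings and enumerated names as plain strings, which is part of why the schema-based route is the generic one.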

Given that you have a textual representation of the input data, you might take a look at the parse library. It lets you find a pattern in a string and assign the matched contents to variables.

Here is an example for extracting the time, PDU Number and Physical Cell ID data fields:

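import parse

# Read the whole log file into one string.
with open('w9s2MJK4.txt', 'r') as f:
    text = f.read()  # renamed from `input` to avoid shadowing the built-in

data = []
# The leading '\n' ties each match to the start of a line, so a record on the
# very first line of the file would be missed. The anonymous {} fields skip
# whatever text lies between the named fields.
pattern = parse.compile('\n{year:d} {month:w} {day:d}  {hour:d}:{min:d}:{sec:d}.{ms:d}{}Physical Cell ID = {pcid:d}{}PDU Number = {pdu:w} {pdutype:w}')

for s in pattern.findall(text):
    record = {}
    # Rebuild the timestamp string from its parsed integer parts.
    record['time'] = '{} {} {} {:02d}:{:02d}:{:02d}.{:03d}'.format(s.named['year'], s.named['month'], s.named['day'], s.named['hour'], s.named['min'], s.named['sec'], s.named['ms'])
    record['PDU Number'] = '{} {}'.format(s.named['pdu'], s.named['pdutype'])
    record['Physical Cell ID'] = s.named['pcid']
    data.append(record)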

Since you have quite a complicated structure and a large number of data fields, this might become a bit cumbersome, but personally I would prefer this approach over regular expressions. Maybe there is also a smarter method to parse the date (which unfortunately does not seem to be in one of the standard formats supported by the library).
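
On the date question, one option outside the parse library is the standard library's datetime.strptime, which can read this layout once the timestamp has been isolated as a string. The sketch below assumes the timestamp looks like the example in the question; parse_log_time is a hypothetical helper name.

```python
from datetime import datetime

def parse_log_time(stamp):
    """Parse a header timestamp such as '2019 Aug 31  09:17:36.550'."""
    # Collapse the double space between the date and the time.
    normalized = " ".join(stamp.split())
    # %b is the abbreviated month name (locale-dependent);
    # %f accepts one to six fractional-second digits.
    return datetime.strptime(normalized, "%Y %b %d %H:%M:%S.%f")

ts = parse_log_time("2019 Aug 31  09:17:36.550")
print(ts.isoformat(timespec="milliseconds"))
```

The resulting datetime object can be sorted and compared directly, and formatted back into whatever string the JSON output should carry.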
