python - asn1 parsed text to json
With the text given in this link, I need to extract data as follows:
1. YYYY Mmm dd hh:mm:ss.ms, for example 2019 Aug 31 09:17:36.550
2. The lines below Interpreted PDU:
3. 0xB821 NR5G RRC OTA Packet -- RRC_RECONFIG
Is it possible to extract the selected record headers and the text below #3 above as an array of nested JSON in the format below? The sample is snipped for brevity; I really need the entire text data as JSON.
data = [{"time": "2019 Aug 31 09:17:36.550", "PDU Number": "RRC_RECONFIG Message", "Physical Cell ID": 0, "rrc-TransactionIdentifier": 1, "criticalExtensions rrcReconfiguration": {"secondaryCellGroup": {"cellGroupId": 1, "rlc-BearerToAddModList": [{"logicalChannelIdentity": 1, "servedRadioBearer drb-Identity": 2, "rlc-Config am": {"ul-AM-RLC": {"sn-FieldLength": "size18", "t-PollRetransmit": "ms40", "pollPDU": "p32", "pollByte": "kB25", "maxRetxThreshold": "t32"}, "dl-AM-RLC": {"sn-FieldLength": "size18", "t-Reassembly": "ms40", "t-StatusProhibit": "ms20"}}}]}} }, next records data here]
Note that the input text is the parsed output of the ASN.1 data specifications in 3GPP 38.331, section 6.3.2. I'm not sure whether ordinary Python text parsing is the right way to handle this, or whether one should use something like the asn1tools library. If so, an example usage on this data would be helpful.
Unfortunately, it is unlikely that somebody will come up with a straight answer to your question (which is very similar to How to extract data from asn1 data file and load it into a dataframe?).
The text behind your link is obviously a log file in which ASN.1 value notation was used to make the messages human-readable. Trying to decode these messages from their textual form is therefore unusual, and you will probably not find tooling for it.
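For the brace-delimited subset of value notation that appears in such logs, a small hand-written parser may be enough. The following is only a minimal sketch, not a general ASN.1 value decoder: the names TOKEN, to_scalar and parse_block are hypothetical, the sample text is made up to resemble the field names in the question rather than copied from the linked log, and it assumes that values contain no spaces, that "name alternative : value" can be flattened into a key "name alternative", and that a block of unnamed members is a list.

```python
import json
import re

# Tokens are the four punctuation marks or any run of other non-space text.
# Assumption: no quoted strings containing spaces appear in the log.
TOKEN = re.compile(r"[{},:]|[^\s{},:]+")


def to_scalar(word):
    """Convert integer-looking words to int; leave everything else as text."""
    return int(word) if re.fullmatch(r"-?\d+", word) else word


def parse_block(tokens, pos):
    """Parse a '{ ... }' block that starts at tokens[pos].

    Returns (value, next_pos). Named members produce a dict; a block that
    only holds unnamed members produces a list.
    """
    assert tokens[pos] == "{"
    pos += 1
    fields, items = {}, []
    while tokens[pos] != "}":
        words = []
        # Collect the words of one member; CHOICE colons are simply skipped.
        while tokens[pos] not in ("{", "}", ","):
            if tokens[pos] != ":":
                words.append(tokens[pos])
            pos += 1
        if tokens[pos] == "{":
            value, pos = parse_block(tokens, pos)
        else:
            value = to_scalar(words.pop())  # last word is the value
        if words:
            fields[" ".join(words)] = value
        else:
            items.append(value)
        if tokens[pos] == ",":
            pos += 1
    return (fields if fields else items), pos + 1


# Made-up sample in the style of the question's expected output.
SAMPLE = """
{
  rrc-TransactionIdentifier 1,
  criticalExtensions rrcReconfiguration :
    {
      secondaryCellGroup
        {
          cellGroupId 1,
          rlc-BearerToAddModList
          {
            {
              logicalChannelIdentity 1,
              servedRadioBearer drb-Identity : 2
            }
          }
        }
    }
}
"""

record, _ = parse_block(TOKEN.findall(SAMPLE), 0)
print(json.dumps(record, indent=2))
```

Skipping the colons keeps the sketch short, at the cost of losing the distinction between a SEQUENCE field and a CHOICE alternative; whether that matters depends on what the JSON is used for.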
In theory, the generic method would be this one:
As you can see, it is a very long shot (I can expand if this explanation is too short or unclear).
Unless your task is repetitive and/or the number of messages is large, try the methods you already know (manual search, regex) to search the log file.
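As an illustration of the regex route, here is a minimal sketch that finds record headers. The header layout (timestamp, optional extra columns, packet code, packet name, then the PDU name after "--") is an assumption based on the fragments quoted in the question; the sample lines, including the bracketed column, are made up, and HEADER is a hypothetical name.

```python
import re

# Assumed header layout: timestamp first, then (possibly after other columns)
# the hex packet code, the packet name, and the PDU name after "--".
HEADER = re.compile(
    r"^(?P<time>\d{4} [A-Z][a-z]{2} +\d{1,2} +\d{2}:\d{2}:\d{2}\.\d{3})"
    r".*?(?P<code>0x[0-9A-Fa-f]{4}) +(?P<packet>.+?) +-- +(?P<pdu>\S+)",
    re.MULTILINE,
)

# Made-up sample lines; the real log may be laid out differently.
LOG = """\
2019 Aug 31 09:17:36.550 [A2] 0xB821 NR5G RRC OTA Packet -- RRC_RECONFIG
Physical Cell ID = 0
2019 Aug 31 09:17:36.590 0xB821 NR5G RRC OTA Packet -- RRC_RECONFIG_COMPLETE
Physical Cell ID = 0
"""

headers = [m.groupdict() for m in HEADER.finditer(LOG)]
for h in headers:
    print(h["time"], h["code"], h["pdu"])
```

The start offsets of the matches (m.start()) could also be used to cut the log into one chunk per record before any further processing.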
If you want to see what it takes to create ASN.1 tools, you can find a few (not that many, as ASN.1 is not particularly young or popular). Check out https://github.com/etingof/pyasn1 (Python).
I created my own in Java for fun, and I am adding ASN.1 value decoders to illustrate my answer: https://github.com/yafred/asn1-tool (branch text-asn-value-support).
Given that you have a textual representation of the input data, you might take a look at the parse library. It lets you find a pattern in a string and assign the matched contents to variables.
Here is an example that extracts the time, PDU Number and Physical Cell ID data fields:
import parse

# Read the whole log file into one string
with open('w9s2MJK4.txt', 'r') as f:
    text = f.read()

data = []
# Anonymous {} fields skip whatever lies between the named fields
pattern = parse.compile('\n{year:d} {month:w} {day:d} {hour:d}:{min:d}:{sec:d}.{ms:d}{}Physical Cell ID = {pcid:d}{}PDU Number = {pdu:w} {pdutype:w}')
for s in pattern.findall(text):
    record = {}
    # Rebuild the timestamp string, restoring the zero padding lost by :d
    record['time'] = '{} {} {} {:02d}:{:02d}:{:02d}.{:03d}'.format(s.named['year'], s.named['month'], s.named['day'], s.named['hour'], s.named['min'], s.named['sec'], s.named['ms'])
    record['PDU Number'] = '{} {}'.format(s.named['pdu'], s.named['pdutype'])
    record['Physical Cell ID'] = s.named['pcid']
    data.append(record)
Since you have quite a complicated structure and a large number of data fields, this might become a bit cumbersome, but personally I would prefer this approach over regular expressions. There may also be a smarter way to parse the date (which unfortunately does not seem to be in one of the standard formats supported by the library).
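One possible way to handle the date, sketched here with the standard library rather than parse, is datetime.strptime. The helper name parse_log_time is hypothetical, and %b relies on the process locale using English month abbreviations.

```python
from datetime import datetime


def parse_log_time(stamp):
    """Parse a 'YYYY Mmm dd hh:mm:ss.ms' header into a datetime.

    %f accepts one to six fractional digits, so '550' is read as 550 ms.
    """
    return datetime.strptime(stamp, "%Y %b %d %H:%M:%S.%f")


when = parse_log_time("2019 Aug 31 09:17:36.550")
print(when.isoformat(timespec="milliseconds"))  # 2019-08-31T09:17:36.550
```

With this, the whole timestamp can be captured as a single text field and converted afterwards, instead of being split into seven numeric fields and reassembled.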