
Process/extraction in Python

I have a text file containing data like the following:

ACK
DATA1   < >
ACK
DATA1   < >
NAK
ACK
DATA1   < >
DATA0   < 20 >
ACK
DATA1   < 01 01 01 00 >
ACK
ACK
DATA1   < >
DATA1   < 20 >
ACK
DATA1   < >
ACK
ACK
ACK
ACK
ACK
ACK
ACK

ACK
DATA0   < 00 00 00 00 ff ff ff ff 00 00 00 01 ff ff ff fe 00 00 00 02 ff ff ff fd 00 00 00 03 ff ff ff fc
      00 00 00 08 ff ff ff f7 00 00 00 09 ff ff ff f6 00 00 00 0a ff ff ff f5 00 00 00 0b ff ff ff f4
      00 00 00 10 ff ff ff ef 00 00 00 11 ff ff ff ee 00 00 00 12 ff ff ff ed 00 00 00 13 ff ff ff ec
      00 00 00 18 ff ff ff e7 00 00 00 19 ff ff ff e6 00 00 00 1a ff ff ff e5 00 00 00 1b ff ff ff e4
      00 00 00 20 ff ff ff df 00 00 00 21 ff ff ff de 00 00 00 22 ff ff ff dd 00 00 00 23 ff ff ff dc
      00 00 00 28 ff ff ff d7 00 00 00 29 ff ff ff d6 00 00 00 2a ff ff ff d5 00 00 00 2b ff ff ff d4
      00 00 00 30 ff ff ff cf 00 00 00 31 ff ff ff ce 00 00 00 32 ff ff ff cd 00 00 00 33 ff ff ff cc
      00 00 00 38 ff ff ff c7 00 00 00 39 ff ff ff c6 00 00 00 3a ff ff ff c5 00 00 00 3b ff ff ff c4
      00 00 00 40 ff ff ff bf 00 00 00 41 ff ff ff be 00 00 00 42 ff ff ff bd 00 00 00 43 ff ff ff bc
      00 00 00 48 ff ff ff b7 00 00 00 49 ff ff ff b6 00 00 00 4a ff ff ff b5 00 00 00 4b ff ff ff b4
      00 00 00 50 ff ff ff af 00 00 00 51 ff ff ff ae 00 00 00 52 ff ff ff ad 00 00 00 53 ff ff ff ac
      00 00 00 58 ff ff ff a7 00 00 00 59 ff ff ff a6 00 00 00 5a ff ff ff a5 00 00 00 5b ff ff ff a4
      00 00 00 60 ff ff ff 9f 00 00 00 61 ff ff ff 9e 00 00 00 62 ff ff ff 9d 00 00 00 63 ff ff ff 9c
      00 00 00 68 ff ff ff 97 00 00 00 69 ff ff ff 96 00 00 00 6a ff ff ff 95 00 00 00 6b ff ff ff 94
      00 00 00 70 ff ff ff 8f 00 00 00 71 ff ff ff 8e 00 00 00 72 ff ff ff 8d 00 00 00 73 ff ff ff 8c
      00 00 00 78 ff ff ff 87 00 00 00 79 ff ff ff 86 00 00 00 7a ff ff ff 85 00 00 00 7b ff ff ff 84 >
DATA1   < 01 01 01 01 fe fe fe fe 00 00 01 00 ff ff fe ff 00 00 02 00 ff ff fd ff 00 00 03 00 ff ff fc ff
      00 00 08 00 ff ff f7 ff 00 00 09 00 ff ff f6 ff 00 00 0a 00 ff ff f5 ff 00 00 0b 00 ff ff f4 ff
      00 00 10 00 ff ff ef ff 00 00 11 00 ff ff ee ff 00 00 12 00 ff ff ed ff 00 00 13 00 ff ff ec ff
      00 00 18 00 ff ff e7 ff 00 00 19 00 ff ff e6 ff 00 00 1a 00 ff ff e5 ff 00 00 1b 00 ff ff e4 ff
      00 00 20 00 ff ff df ff 00 00 21 00 ff ff de ff 00 00 22 00 ff ff dd ff 00 00 23 00 ff ff dc ff
      00 00 28 00 ff ff d7 ff 00 00 29 00 ff ff d6 ff 00 00 2a 00 ff ff d5 ff 00 00 2b 00 ff ff d4 ff
      00 00 30 00 ff ff cf ff 00 00 31 00 ff ff ce ff 00 00 32 00 ff ff cd ff 00 00 33 00 ff ff cc ff
      00 00 38 00 ff ff c7 ff 00 00 39 00 ff ff c6 ff 00 00 3a 00 ff ff c5 ff 00 00 3b 00 ff ff c4 ff
      00 00 40 00 ff ff bf ff 00 00 41 00 ff ff be ff 00 00 42 00 ff ff bd ff 00 00 43 00 ff ff bc ff
      00 00 48 00 ff ff b7 ff 00 00 49 00 ff ff b6 ff 00 00 4a 00 ff ff b5 ff 00 00 4b 00 ff ff b4 ff
      00 00 50 00 ff ff af ff 00 00 51 00 ff ff ae ff 00 00 52 00 ff ff ad ff 00 00 53 00 ff ff ac ff
      00 00 58 00 ff ff a7 ff 00 00 59 00 ff ff a6 ff 00 00 5a 00 ff ff a5 ff 00 00 5b 00 ff ff a4 ff
      00 00 60 00 ff ff 9f ff 00 00 61 00 ff ff 9e ff 00 00 62 00 ff ff 9d ff 00 00 63 00 ff ff 9c ff
      00 00 68 00 ff ff 97 ff 00 00 69 00 ff ff 96 ff 00 00 6a 00 ff ff 95 ff 00 00 6b 00 ff ff 94 ff
      00 00 70 00 ff ff 8f ff 00 00 71 00 ff ff 8e ff 00 00 72 00 ff ff 8d ff 00 00 73 00 ff ff 8c ff
      00 00 78 00 ff ff 87 ff 00 00 79 00 ff ff 86 ff 00 00 7a 00 ff ff 85 ff 00 00 7b 00 ff ff 84 ff >

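This data is part of a USB traffic log and will be used as a gold standard for comparing the data that a C program generates at runtime. Unfortunately, the gold standard changes, and I want the flexibility to generate the new structures from the traffic log.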

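In other words, I want to use Python to generate the structs that will be used in the C program. I need to convert this data into a structure containing the token, converted to its equivalent hex value (ACK = 0xD2, DATA1 = 0x4B, etc.), and the data (< 01 01 01 >).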
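For reference, ACK = 0xD2 and DATA1 = 0x4B are the full 8-bit USB packet-ID (PID) bytes: the 4-bit PID sits in the low nibble and its one's complement in the high nibble. A minimal sketch of the token-to-hex table, assuming only the four tokens that appear in the log (ACK, NAK, DATA0, DATA1) need mapping; the names PID_NIBBLES, pid_byte and PID_BYTES are illustrative only:

```python
# 4-bit USB packet identifiers for the tokens that appear in the log
PID_NIBBLES = {
    'ACK': 0b0010,
    'NAK': 0b1010,
    'DATA0': 0b0011,
    'DATA1': 0b1011,
}


def pid_byte(nibble):
    """Return the 8-bit PID byte: the complement of the PID in the
    high nibble, the PID itself in the low nibble."""
    return ((~nibble & 0xF) << 4) | nibble


# token -> byte value as it appears on the wire
PID_BYTES = {token: pid_byte(n) for token, n in PID_NIBBLES.items()}

for token, value in PID_BYTES.items():
    print(f'{token:6} = 0x{value:02X}')
```

Deriving the byte from the nibble (rather than hard-coding 0xD2, 0x4B, ...) makes it easy to add further tokens such as STALL if they show up in a new log.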

The part I'm struggling with most is that the data spans multiple lines, for example:

DATA0 < 00 00 00 00...ff ff ff fc 
        00 00 00 00...ff ff ff f4
        ....
        00 00 00 00...ff ff ff 84 > 

I haven't found a way to join these lines and put them on a single line, like this:

DATA0 < 00 00 00 00...ff ff ff 84 >

Once the data is on a single line, I know I can use the split() method to extract the parts I'm interested in.
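Once a record is on one line, split() together with str.partition() is indeed enough to separate the token from its payload. A small sketch, assuming each joined record looks like the ones in the log (token, then an optional < ... > group of hex bytes); the function name parse_record is hypothetical:

```python
def parse_record(line):
    """Split a joined record such as 'DATA1   < 01 01 01 00 >' into
    (token, list of byte values). Lines without '<' (ACK, NAK, ...)
    get an empty payload."""
    token, _, rest = line.partition('<')
    # drop the closing '>' and parse each remaining word as a hex byte
    payload = [int(word, 16) for word in rest.replace('>', ' ').split()]
    return token.strip(), payload


print(parse_record('DATA1   < 01 01 01 00 >'))  # ('DATA1', [1, 1, 1, 0])
print(parse_record('DATA1   < >'))               # ('DATA1', [])
print(parse_record('ACK'))                       # ('ACK', [])
```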

There may be a slicker way, but if your data is in "data.txt", you can use the following:

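with open('data.txt', 'rt') as fobj:
    lines = []
    in_data_line = False
    for line in fobj:
        # strip the newline and any trailing spaces, so that a closing
        # '>' followed by whitespace (e.g. '... 84 > ') is still detected
        line = line.rstrip()
        lines.append(line)
        # a DATA record without its closing '>' continues on the next line
        if not in_data_line and line.startswith('DATA') and not line.endswith('>'):
            in_data_line = True
        # the closing '>' ends the multi-line record
        if in_data_line and line.endswith('>'):
            in_data_line = False
        # emit a newline only outside a multi-line DATA record
        if not in_data_line:
            lines.append('\n')
# lines now has DATA lines joined
print(''.join(lines))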

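If I were you, this is what I'd do: after locating these multi-line data records, replace the double tabs at the start of each continuation line with a space, then concatenate (join) them all.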
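That idea fits in a single regular-expression substitution over the whole text. A minimal sketch, assuming (as in the sample log) that only the continuation lines of a multi-line DATA record are indented; since the sample is indented with spaces rather than tabs, the pattern accepts any run of spaces or tabs. The function name join_data_lines is hypothetical:

```python
import re


def join_data_lines(text):
    """Merge indented continuation lines into the line before them.

    Assumption: only continuation lines of a multi-line DATA record
    start with spaces or tabs.
    """
    # a line break followed by indentation becomes a single space
    return re.sub(r'[ \t]*\n[ \t]+', ' ', text)


sample = (
    "ACK\n"
    "DATA0   < 00 00 00 00 ff ff ff ff\n"
    "\t\t00 00 00 01 ff ff ff fe >\n"
    "DATA1   < 20 >\n"
)
for record in join_data_lines(sample).splitlines():
    print(record)
```

For the real file, the input would be `open('data.txt').read()`; each line of the result can then be handled with split() as the question intends.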

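This is just a skeleton. It doesn't join lines; instead it splits the whole text into words and then rebuilds the data lists between the angle brackets. I hope the resulting data will be easy to process.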

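def lex(file):
    """Yield tokens (ACK, NAK, DATA0, ...) and, for each < ... > group,
    a list of the enclosed bytes as ints."""
    in_data = False
    with open(file) as infile:
        for line in infile:
            for word in line.split():
                if not in_data:
                    if word == '<':
                        # start of a payload; it may span several lines
                        data_list = []
                        in_data = True
                    else:
                        # process ACK, NAK, DATA, ....
                        yield word
                else:
                    if word == '>':
                        # end of payload: emit the collected bytes
                        in_data = False
                        yield data_list
                    else:
                        # each word inside < ... > is one hex byte
                        data_list.append(int(word, 16))

print(list(lex('data.txt')))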

Output (shortened):

['ACK', 'DATA1', [], 'ACK', 'DATA1', [], 'NAK', 'ACK', 'DATA1', [], 'DATA0', [32], 'ACK', 'DATA1', [1, 1, 1, 0], 'ACK', 'ACK', 'DATA1', [], 'DATA1', [32], 'ACK', 'DATA1', [], 'ACK', 'ACK', 'ACK', 'ACK', 'ACK', 'ACK', 'ACK', 'ACK', 'DATA0', [0, 0, 0, 0, 255, 255, 255, 255, 0, 0, 0, 1, 255, 255, 255, 254, 0, 0, 0, 2, 255, 255, 255, 253, 0, 0, 0, 3, 255, 255, 255, 252, 0, 0, 0, 8, 255, 255, 255, 247, 0, 0, 0, 9, 255, 255, 255, 246, 0, 0, 0, 10, 255, 255, 255, 245, 0, 0, 0, 11, 255, 255, 255, 244, 0, 0, 0, 16, 255, ... 255]]
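To get from this flat list to the C structs the question asks for, each list can be attached to the token just before it and the pairs printed as C source. A sketch under stated assumptions: the C type `struct usb_packet { uint8_t pid; size_t len; const uint8_t *data; }` and the array name `golden` are hypothetical, and the token-to-byte table covers only the four USB PIDs seen in the log:

```python
# Token -> USB PID byte (ACK = 0xD2, DATA1 = 0x4B as in the question)
PID_BYTES = {'ACK': 0xD2, 'NAK': 0x5A, 'DATA0': 0xC3, 'DATA1': 0x4B}


def pair_tokens(items):
    """Turn lex() output into (token, payload) pairs; a list always
    belongs to the token just before it."""
    packets = []
    for item in items:
        if isinstance(item, list):
            token, _ = packets[-1]
            packets[-1] = (token, item)
        else:
            packets.append((item, []))
    return packets


def to_c_source(packets, name='golden'):
    """Emit C source for a hypothetical
    struct usb_packet { uint8_t pid; size_t len; const uint8_t *data; };"""
    out = []
    rows = []
    for i, (token, payload) in enumerate(packets):
        data = 'NULL'
        if payload:
            # each non-empty payload gets its own byte array
            data = f'{name}_data{i}'
            body = ', '.join(f'0x{b:02x}' for b in payload)
            out.append(f'static const uint8_t {data}[] = {{ {body} }};')
        rows.append(f'    {{ 0x{PID_BYTES[token]:02X}, {len(payload)}, {data} }},')
    out.append(f'static const struct usb_packet {name}[] = {{')
    out.extend(rows)
    out.append('};')
    return '\n'.join(out)


items = ['ACK', 'DATA1', [], 'NAK', 'DATA1', [1, 1, 1, 0]]
print(to_c_source(pair_tokens(items)))
```

With the real log this would be `to_c_source(pair_tokens(lex('data.txt')))`. Emitting a separate array per payload keeps the table plain C99; the generated file needs `<stdint.h>` and `<stddef.h>` plus the struct definition.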


Notice: technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact yoyou2525@163.com.

© 2020-2024 STACKOOM.COM