Is there an easy way to substitute XML-like tags with specific hex values in Python 3?

I have a set of data that is structured like XML, but the data is not ASCII; it is hex.

For instance, the data could be:

EX. A
<body>
    <entry1> 0x12 </entry1>
    <entry2> 0x01 </entry2>
</body>

and that could translate to:

EX. B
<0x01>
    <0x02> 0x12 <0xff>
    <0x03> 0x01 <0xff>
<0xff>

In the example above (EX. B), <0x02> 0x12 <0xff> indicates that entry1 has a value of 0x12.

I am not a native Python programmer, so I may be going about this the long way (I would love it if it were easier), but what I am trying to do is go from the human-readable structure (EX. A) to the hex version (EX. B).

My idea is to write the XML-like structure to a file using lxml, populating the necessary relevant areas, then read the file back with Python and substitute the tags using string manipulation, based on a code book/dictionary.

In the end, I am looking for a byte array that would look like:
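The code-book idea can also be sketched without the intermediate file or string substitution. This is only an illustration under some assumptions: it uses the standard library's xml.etree.ElementTree rather than lxml, the CODE_BOOK dictionary and the encode helper are hypothetical names, the codes are taken from EX. B, and every value is assumed to fit in one byte.

```python
import xml.etree.ElementTree as ET

# Hypothetical code book: tag name -> opening code (values taken from EX. B).
CODE_BOOK = {"body": 0x01, "entry1": 0x02, "entry2": 0x03}
END_TAG = 0xFF  # in EX. B every closing tag maps to the same code


def encode(element):
    """Recursively flatten an element into a list of integer codes."""
    out = [CODE_BOOK[element.tag]]
    # Element text may hold one or more whitespace-separated hex values.
    for token in (element.text or "").split():
        out.append(int(token, 16))  # "0x12" -> 18
    for child in element:
        out.extend(encode(child))
    out.append(END_TAG)
    return out


source = """<body>
    <entry1> 0x12 </entry1>
    <entry2> 0x01 </entry2>
</body>"""

# bytes() raises ValueError if any code or value is outside 0..255.
payload = bytes(encode(ET.fromstring(source)))
print(" ".join(f"0x{b:02x}" for b in payload))
```

A tag missing from the code book raises KeyError here, which may be preferable to silently inventing a code. Text that follows a child element (the element's tail) is ignored in this sketch.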

0x01 0x02 0x12 0xff 0x03 0x01 0xff 0xff

My question is: is there an easier way?

A custom html.parser.HTMLParser subclass might suit your needs:

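from html.parser import HTMLParser

class MyHTMLParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.__tags = {}     # tag name -> hex code, assigned on first appearance
        self.__counter = 1   # next code to hand out

        self.__result = []   # output tokens, in document order

    def handle_starttag(self, tag, attrs):
        # Give each new tag name the next code, then emit the tag's code.
        if tag not in self.__tags:
            self.__tags[tag] = '0x{:02x}'.format(self.__counter)
            self.__counter += 1
        self.__result.append(self.__tags[tag])

    def handle_endtag(self, tag):
        # Every closing tag is emitted as 0xff.
        self.__result.append('0xff')

    def handle_data(self, data):
        # Text between tags; whitespace-only runs become empty strings.
        self.__result.append(data.strip())

    @property
    def result(self):
        # Drop the empty strings produced by whitespace between tags.
        return [v for v in self.__result if v]

parser = MyHTMLParser()
parser.feed('''<body>
    <entry1> 0x12 </entry1>
    <entry2> 0x01 </entry2>
</body>''')

print(' '.join(parser.result))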

Prints:

0x01 0x02 0x12 0xff 0x03 0x01 0xff 0xff
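The output above is a list of strings, while the question asks for a byte array. A minimal sketch of that last step, assuming every token is a hex literal that fits in one byte (the tokens list below simply repeats the printed output so the snippet runs on its own):

```python
# Tokens as printed above; in practice this would be parser.result.
tokens = "0x01 0x02 0x12 0xff 0x03 0x01 0xff 0xff".split()

# int(tok, 16) accepts the "0x" prefix; bytes() requires values in 0..255.
payload = bytes(int(tok, 16) for tok in tokens)

print(list(payload))
```

Use bytearray(payload) instead if the result needs to be mutable.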
