简体   繁体   中英

Python: hexadecimal regular expression question

I want to parse the output of a serial monitoring program called Docklight (I highly recommend it) It outputs 'hexadecimal' strings: or a sequence of (two capital hex digits followed by a space). the corresponding regular expression is: ([0-9A-F]{2} )+ for example: '05 03 DA 4B 3F '

When program detects particular sequences of characters it places comments in the 'hexadecimal ' string. for example:

'05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'

comments are strings of the following format ' .+ ' (a sequence of characters preceded by a space and followed by a space)

I want to get rid of the comments. for example, the 'hexadecimal' string above filtered would be:

'05 03 04 01 0A 03 08 0B BD AF 0D 0A '

how do i go about doing this with A regular expression?

You could try re.findall() :

>>> a='05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> re.findall(r"\b[0-9A-F]{2}\b", a)
['05', '03', '04', '01', '0A', '03', '08', '0B', 'BD', 'AF', '0D', '0A']

The \\b in the regular expression matches a "word boundary".

Of course, your input is ambiguous if the serial monitor inserts something like THIS BE THE HEADER .

It might be easier to find all the hexadecimal numbers, assuming the inserted strings won't contain a match:

>>> data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> import re
>>> pattern = re.compile("[0-9A-F]{2} ")
>>> "".join(pattern.findall(data))
'05 03 04 01 0A 03 08 0B BD AF AD 0D 0A '

Otherwise you could use the fact that the inserted strings are preceed by two spaces:

>>> data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> re.sub("(  .*?)(?=( [0-9A-F]{2} |$))","",data)
'05 03 04 01 0A 03 08 0B BD AF 0D 0A'

This uses a look ahead to work out when the inserted string ends. It looks for either a hexadecimal string surround by spaces or the end of the source string.

Using your regex

hexa = '([0-9A-F]{2} )+'
" ".join(re.findall(hexa, line))

While you already received two answers that find you all hexadecimal numbers, here's the same with a direct regular expression that finds you all text that does not look like a hexadecimal number (assuming that's two letter/digits in uppercase / lowercase 0-9 and AF range, followed by a space).

Something like this (sorry, I'm not a pythoneer, but you get the idea):

newstring = re.sub(r"[^ ]+(?<![0-9A-Fa-f ]{2}|^.)", "", yourstring)

It works by "looking back". It finds every consecutive non-space substring, then negatively looks back with (?<!....) . It says: "if the previous two characters were not a hex number, then succeed". The little ^. at the end prevents to incorrectly match the first character of the string.

Edit

As suggested by Alan Moore, here's the same idea with a positive lookahead expression:

newstring = re.sub(r"(?>\b[0-9A-Fa-f ]{2}\b)", "", yourstring)

Why regexp? More pythonic for me is (fixed for hexdigit not regular digit):

command='05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
print ' '.join(com for com in command.split()
               if len(com)==2 and all(c.upper() in '0123456789ABCDEF' for c in com))

How about a solution that actually uses regex negation? ;)

result = re.sub(r"[ ]+(?:(?!\b[0-9A-F]{2}\b).)+", "", subject)

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM