Python: hexadecimal regular expression question

Question

I want to parse the output of a serial monitoring program called Docklight (I highly recommend it) It outputs 'hexadecimal' strings: or a sequence of (two capital hex digits followed by a space). the corresponding regular expression is: ([0-9A-F]{2} )+ for example: '05 03 DA 4B 3F '

When program detects particular sequences of characters it places comments in the 'hexadecimal ' string. for example:

'05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'

comments are strings of the following format ' .+ ' (a sequence of characters preceded by a space and followed by a space)

I want to get rid of the comments. for example, the 'hexadecimal' string above filtered would be:

'05 03 04 01 0A 03 08 0B BD AF 0D 0A '

how do i go about doing this with A regular expression?

Answer 1

You could try re.findall() :

>>> a='05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> re.findall(r"\b[0-9A-F]{2}\b", a)
['05', '03', '04', '01', '0A', '03', '08', '0B', 'BD', 'AF', '0D', '0A']

The \\b in the regular expression matches a "word boundary".

Of course, your input is ambiguous if the serial monitor inserts something like THIS BE THE HEADER .

Answer 2

It might be easier to find all the hexadecimal numbers, assuming the inserted strings won't contain a match:

>>> data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> import re
>>> pattern = re.compile("[0-9A-F]{2} ")
>>> "".join(pattern.findall(data))
'05 03 04 01 0A 03 08 0B BD AF AD 0D 0A '

Otherwise you could use the fact that the inserted strings are preceed by two spaces:

>>> data = '05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
>>> re.sub("(  .*?)(?=( [0-9A-F]{2} |$))","",data)
'05 03 04 01 0A 03 08 0B BD AF 0D 0A'

This uses a look ahead to work out when the inserted string ends. It looks for either a hexadecimal string surround by spaces or the end of the source string.

Answer 3

Using your regex

hexa = '([0-9A-F]{2} )+'
" ".join(re.findall(hexa, line))

Answer 4

While you already received two answers that find you all hexadecimal numbers, here's the same with a direct regular expression that finds you all text that does not look like a hexadecimal number (assuming that's two letter/digits in uppercase / lowercase 0-9 and AF range, followed by a space).

Something like this (sorry, I'm not a pythoneer, but you get the idea):

newstring = re.sub(r"[^ ]+(?<![0-9A-Fa-f ]{2}|^.)", "", yourstring)

It works by "looking back". It finds every consecutive non-space substring, then negatively looks back with (?<!....) . It says: "if the previous two characters were not a hex number, then succeed". The little ^. at the end prevents to incorrectly match the first character of the string.

Edit

As suggested by Alan Moore, here's the same idea with a positive lookahead expression:

newstring = re.sub(r"(?>\b[0-9A-Fa-f ]{2}\b)", "", yourstring)

Answer 5

Why regexp? More pythonic for me is (fixed for hexdigit not regular digit):

command='05 03 04 01 0A  The Header 03 08 0B BD AF  The PAYLOAD 0D 0A  The Footer'
print ' '.join(com for com in command.split()
               if len(com)==2 and all(c.upper() in '0123456789ABCDEF' for c in com))

Answer 6

How about a solution that actually uses regex negation? ;)

result = re.sub(r"[ ]+(?:(?!\b[0-9A-F]{2}\b).)+", "", subject)

Python: hexadecimal regular expression question

Question

6 answers

solution1
4 2010-08-03 10:36:17

solution2
0 2010-08-03 10:37:55

solution3
0 2010-08-03 10:46:25

solution4
0 2010-08-03 10:47:35

Edit

solution5
0 2010-08-03 10:53:38

solution6
0 2010-08-03 10:53:58

Python: hexadecimal regular expression question

Question

6 answers

solution1 4 2010-08-03 10:36:17

solution2 0 2010-08-03 10:37:55

solution3 0 2010-08-03 10:46:25

solution4 0 2010-08-03 10:47:35

Edit

solution5 0 2010-08-03 10:53:38

solution6 0 2010-08-03 10:53:58

solution1
4 2010-08-03 10:36:17

solution2
0 2010-08-03 10:37:55

solution3
0 2010-08-03 10:46:25

solution4
0 2010-08-03 10:47:35

solution5
0 2010-08-03 10:53:38

solution6
0 2010-08-03 10:53:58