Python Regex match a mac address from the end?

I have the following regex to extract a MAC address:

re.sub( r'(\S{2,2})(?!$)\s*', r'\1:', '0x0000000000aa bb ccdd ee ff' )

However, this gave me 0x:00:00:00:00:00:aa:bb:cc:dd:ee:ff .

How do I modify this regex to stop after matching the first 6 pairs starting from the end, so that I get aa:bb:cc:dd:ee:ff ?

Note: the string has whitespace in between which is to be ignored. Only the last 12 characters are needed.

Edit1: re.findall( r'(\S{2})\s*(\S{2})\s*(\S{2})\s*(\S{2})\s*(\S{2})\s*(\S{2})\s*$', a ) finds the last 6 pairs in the string a. I still don't know how to shorten this regex, and it still depends on the characters coming in pairs.

Ideally the regex should take the last 12 non-whitespace (\S) characters, counting from the end, and join them in pairs with :

Edit2: Inspired by @Mariano's answer, which works great but depends on the last 12 characters starting with a pair, I came up with the following solution. It is kludgy but still seems to work for all inputs.

import re
string = '0x0000000000a abb ccddeeff'
':'.join( ''.join( i ) for i in re.findall( r'(\S)\s*(\S)(?!(?:\s*\S\s*){11})', string ) )
'aa:bb:cc:dd:ee:ff'

Edit3: @Mariano has updated his answer which now works for all inputs

This will work for the last 12 characters, ignoring whitespace.

Code:

import re

text = "0x0000000000aa bb ccdd ee ff"

result = re.sub( r'.*?(?!(?:\s*\S){13})(\S)\s*(\S)', r':\1\2', text)[1:]

print(result)

Output:

aa:bb:cc:dd:ee:ff

Regex breakdown:

The expression used in this code uses re.sub() to replace the following in the subject text:

.*?                 # consume as little of the subject text as possible
(?!(?:\s*\S){13})   # CONDITION: can't be followed by 13 non-whitespace chars,
                    #  so it can only start matching when at most 12 remain before $
(\S)\s*(\S)         # Capture a char in group 1, next char in group 2
                    #
  # The match is replaced with :\1\2
  # For this example, re.sub() returns ":aa:bb:cc:dd:ee:ff"
  # We then apply [1:] to the returned value to discard the leading ":"
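
To see the breakdown above at work on more than one input, here is a minimal sketch that wraps the same pattern in a function. The names PAIR_FROM_END and last_six_pairs are made up for illustration, and the rstrip() call is an added assumption so that trailing whitespace does not survive after the last pair. It also assumes single-line input, since . does not match newlines by default.

```python
import re

# Same pattern as in the answer: lazily skip text until fewer than 13
# non-whitespace characters remain, then capture two characters at a time.
PAIR_FROM_END = re.compile(r'.*?(?!(?:\s*\S){13})(\S)\s*(\S)')

def last_six_pairs(text):
    """Hypothetical helper: colon-join the last 12 non-whitespace characters."""
    # Each match (skipped prefix plus one pair) becomes ":xy";
    # [1:] then drops the leading ":" of the first pair.
    return PAIR_FROM_END.sub(r':\1\2', text.rstrip())[1:]

for sample in ('0x0000000000aa bb ccdd ee ff',   # input from the question
               '0x0000000000a abb ccddeeff',     # pairs split by whitespace
               '???767aa bb ccdd eeff    '):     # trailing whitespace
    print(last_six_pairs(sample))
```

All three inputs should print aa:bb:cc:dd:ee:ff.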

You can use re.finditer to find all the pairs and then join the results:

>>> import re
>>> my_string='0x0000000000aa bb ccdd ee ff'
>>> ':'.join([i.group() for i in re.finditer( r'([a-z])\1+',my_string )])
'aa:bb:cc:dd:ee:ff'
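
Note that ([a-z])\1+ matches a lowercase letter repeated one or more times, so it relies on every byte in the example being a doubled letter. The following is only a sketch of a variant that still uses re.finditer but looks for pairs of hex digits in the last 12 non-whitespace characters. The helper name mac_from_tail and the sample string are invented for illustration.

```python
import re

def mac_from_tail(text):
    """Hypothetical helper: hex pairs from the last 12 non-whitespace chars."""
    compact = re.sub(r'\s+', '', text)   # drop all whitespace
    tail = compact[-12:]                 # keep only the last 12 characters
    # Each match is one two-digit hex byte; join them with ":"
    return ':'.join(m.group() for m in re.finditer(r'[0-9A-Fa-f]{2}', tail))

sample = '0x00000000001a 2b 3c4d 5e 6f'  # made-up input without doubled letters

# The repeated-letter pattern finds nothing here, so the join is empty
doubled_only = ':'.join(m.group() for m in re.finditer(r'([a-z])\1+', sample))
print(repr(doubled_only))

print(mac_from_tail(sample))
```

For this sample the first print should show an empty string and the second 1a:2b:3c:4d:5e:6f.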

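You could do it like this. The inner re.sub keeps only the last 12 letters (with any whitespace between them), and the outer one turns the gaps between pairs into colons. On Python 3.7 and later, re.sub() also replaces empty matches adjacent to a previous match, so the separator pattern carries a (?<!\s) guard to avoid doubled colons:

>>> import re
>>> s = '0x0000000000aa bb ccdd ee ff'
>>> re.sub(r'(?!^)(?<!\s)\s*(?=(?:\s*[a-z]{2})+$)', ':', re.sub(r'.*?((?:\s*[a-z]){12})\s*$', r'\1', s ))
'aa:bb:cc:dd:ee:ff'
>>> s = '???767aa bb ccdd ee ff'
>>> re.sub(r'(?!^)(?<!\s)\s*(?=(?:\s*[a-z]{2})+$)', ':', re.sub(r'.*?((?:\s*[a-z]){12})\s*$', r'\1', s ))
'aa:bb:cc:dd:ee:ff'
>>> s = '???767aa bb ccdd eeff    '
>>> re.sub(r'(?!^)(?<!\s)\s*(?=(?:\s*[a-z]{2})+$)', ':', re.sub(r'.*?((?:\s*[a-z]){12})\s*$', r'\1', s ))
'aa:bb:cc:dd:ee:ff'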

I know this is not a direct answer to your question, but do you really need a regular expression? If your format is fixed, this should also work:

>>> s = '0x0000000000aa bb ccdd ee ff'
>>> ':'.join([s[-16:-8].replace(' ', ':'), s[-8:].replace(' ', ':')])
'aa:bb:cc:dd:ee:ff'
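
If the whitespace is not at fixed positions, the same no-regex idea can be loosened a little. This is just a sketch, assuming that only the last 12 non-whitespace characters matter and that failing loudly on short input is acceptable. The function name last_mac and its pairs parameter are hypothetical.

```python
def last_mac(text, pairs=6):
    """Hypothetical helper: colon-join the last 2*pairs non-whitespace chars."""
    # pairs is assumed to be a positive integer
    compact = ''.join(text.split())      # remove every run of whitespace
    needed = 2 * pairs
    if len(compact) < needed:
        raise ValueError('need at least %d non-whitespace characters' % needed)
    tail = compact[-needed:]
    # Walk the tail two characters at a time and join the pairs with ":"
    return ':'.join(tail[i:i + 2] for i in range(0, needed, 2))

print(last_mac('0x0000000000aa bb ccdd ee ff'))
print(last_mac('0x0000000000a abb ccddeeff'))
```

Both calls should print aa:bb:cc:dd:ee:ff, since the position of the whitespace no longer matters once it has been removed.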
