Match everything except a pattern and replace matched with string

I want to use Python to manipulate a string.
Basically, I want to prepend "\x" to every hex byte except the bytes that already have "\x" prepended to them.

My original string looks like this:

mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"

And I want to create the following string from it:

mystr = r"\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00"

I thought of using regular expressions to match everything except /\\x../g and replace every match with "\x". Sadly, I struggled with it a lot without any success. Moreover, I'm not sure that regex is the best approach for such a case.

Regex : (?:\\x)?([0-9A-Z]{2}) Substitution : \\x$1

Details :

  • (?:) Non-capturing group.
  • ? Matches zero or one time, so the literal string \x is consumed if it is present.
  • () Capturing group.
  • [] Matches a single character from the listed set, here 0-9 and A-Z.
  • {n} Matches exactly n times.
  • \\x Matches the literal string \x.
  • $1 The contents of group 1 (written \1 in Python's re.sub).

Python code :

import re

text = R'30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00'
text = re.sub(R'(?:\\x)?([0-9A-Z]{2})', R'\\x\1', text)
print(text)

Output :

\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00
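One caveat with the pattern above: the class [0-9A-Z] also accepts G through Z and rejects lowercase hex digits. Neither matters for this particular input, but a stricter class is safer in general. A minimal sketch, assuming only hex digits should be paired (the helper name escape_hex_pairs is hypothetical, not from the answer):

```python
import re

# Optional literal "\x", then exactly two hex digits (either case).
HEX_BYTE = re.compile(r'(?:\\x)?([0-9A-Fa-f]{2})')

def escape_hex_pairs(text):
    # Hypothetical helper: rewrite every pair as "\x" + pair,
    # whether or not the pair already carried the prefix.
    return HEX_BYTE.sub(r'\\x\1', text)

print(escape_hex_pairs(r'3033\x90\x0A6f'))  # \x30\x33\x90\x0A\x6f
```

Because the optional prefix is consumed and then written back, already-escaped bytes come out unchanged.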


You don't need regex for this. You can use simple string manipulation. First remove every "\x" from your string, then add it back before every 2 characters.

replaced = mystr.replace(r"\x", "")
# Step through the string two characters at a time (works on Python 2 and 3)
newstr = "".join(r"\x" + replaced[i:i + 2] for i in range(0, len(replaced), 2))

Output:

>>> print(newstr)
\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00
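The strip-and-regroup approach assumes that what remains after removing "\x" is an even-length run of hex digits. A sketch that checks this assumption first might look like the following (the function name prefix_hex_bytes and the error messages are hypothetical):

```python
import string

def prefix_hex_bytes(s):
    # Drop the existing "\x" markers so only the hex digits remain.
    digits = s.replace(r"\x", "")
    # Guard against input that cannot be split into whole bytes.
    if len(digits) % 2:
        raise ValueError("odd number of hex digits")
    if not all(c in string.hexdigits for c in digits):
        raise ValueError("non-hex character in input")
    # Re-insert "\x" before every pair of digits.
    return "".join(r"\x" + digits[i:i + 2] for i in range(0, len(digits), 2))

print(prefix_hex_bytes(r"3033\x90\x0A"))  # \x30\x33\x90\x0A
```

Without the checks, a stray character would silently shift every following pair by one position.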

You can get a list of your values to manipulate as you wish, using an even simpler re pattern:

mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"

import re

pat = r'([a-fA-F0-9]{2})'
match = re.findall(pat, mystr)

if match:
    print('\n\nNew string:')
    print('\\x' + '\\x'.join(match))
    #for elem in match: # match gives you a list of strings with the hex values
    #    print('\\x{}'.format(elem), end='')

print('\n\nOriginal string:')
print(mystr)
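As an illustration of "manipulate as you wish", the list returned by re.findall can be converted to integers or joined back into the prefixed string. This is only a sketch on a shortened sample input, and the variable names are made up for the example:

```python
import re

sample = r"3033\x90\x0A"
pairs = re.findall(r'[a-fA-F0-9]{2}', sample)

# Each element is a two-character hex string.
print(pairs)                        # ['30', '33', '90', '0A']

# Convert the pairs to integer byte values.
values = [int(p, 16) for p in pairs]
print(values)                       # [48, 51, 144, 10]

# Rebuild the string with a "\x" prefix on every pair.
rebuilt = '\\x' + '\\x'.join(pairs)
print(rebuilt)                      # \x30\x33\x90\x0A
```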

This can be done without replacing the existing \x by using a combination of positive lookbehinds and a negative lookahead:

(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})

Usage


import re

regex = r"(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})"
test_str = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
subst = r"\\x\1"  # Python's re.sub uses \1, not $1, for group 1

result = re.sub(regex, subst, test_str, flags=re.IGNORECASE)

if result:
    print(result)

Explanation

  • (?!(?<=\\\\x)|(?<=\\\\x[af\\d])) Negative lookahead ensuring either of the following doesn't match.
    • (?<=\\\\x) Positive lookbehind ensuring what precedes is \\x .
    • (?<=\\\\x[af\\d]) Positive lookbehind ensuring what precedes is \\x followed by a hexidecimal digit.
  • ([af\\d]{2}) Capture any two hexidecimal digits into capture group 1.
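The explanation above can be checked on a short input. The sketch below also tries an equivalent form written with two negative lookbehinds, which is an assumption of this example rather than part of the answer; the names add_missing_prefixes and ALTERNATIVE are hypothetical.

```python
import re

# Pattern from the answer: skip positions that sit inside an existing \xHH.
PATTERN = re.compile(r"(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})", re.IGNORECASE)

# Assumed equivalent: "not (A or B)" rewritten as "not A and not B".
ALTERNATIVE = re.compile(r"(?<!\\x)(?<!\\x[a-f\d])([a-f\d]{2})", re.IGNORECASE)

def add_missing_prefixes(text, pattern=PATTERN):
    # Only unprefixed pairs are matched, so existing \xHH stays untouched.
    return pattern.sub(r"\\x\1", text)

sample = r"3033\x90\x0A6F"
print(add_missing_prefixes(sample))               # \x30\x33\x90\x0A\x6F
print(add_missing_prefixes(sample, ALTERNATIVE))  # \x30\x33\x90\x0A\x6F
```

The second lookbehind is what stops a pair such as A6 from being matched across the boundary between \x0A and the following 6F.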
