
Match everything except a pattern and replace matched with string

I want to use Python to manipulate a string.
Basically, I want to prepend "\x" to every hex byte except the bytes that already have "\x" prepended to them.

My original string looks like this:

mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"

And I want to create the following string from it:

mystr = r"\x30\x33\x62\x37\x61\x31\x31\x90\x01\x0A\x90\x02\x14\x6F\x6D\x6D\x61\x6E\x64\x90\x01\x06\x90\x02\x0F\x52\x65\x6C\x61\x74\x90\x01\x02\x90\x02\x50\x65\x6D\x31\x90\x00"

I thought of using regular expressions to match everything except /\\x../g and prefix every match with "\x". Sadly, I struggled with it for a long time without success. Moreover, I'm not sure that regex is the best approach for this kind of problem.

Regex : (?:\\x)?([0-9A-Z]{2}) Substitution : \\x$1 (written \\x\1 in Python)

Details :

  • (?:) Non-capturing group
  • ? Matches between zero and one time, so the string \x is matched if it exists.
  • () Capturing group
  • [] Matches a single character present in the list: 0-9 and A-Z
  • {n} Matches exactly n times
  • \\x The literal string \x
  • $1 Group 1 (\1 in Python's re.sub).

Python code :

import re

text = R'30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00'
text = re.sub(R'(?:\\x)?([0-9A-Z]{2})', R'\\x\1', text)
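As a small sketch of the same idea, the substitution can be wrapped in a function and checked on a short input. The names `HEX_BYTE` and `prefix_hex_bytes` are hypothetical, and the character class is narrowed to `[0-9A-Fa-f]` on the assumption that only hex digits should match. Note that this only works because every run of unprefixed digits has an even length.

```python
import re

# An optional "\x" prefix followed by exactly two hex digits (either case).
HEX_BYTE = re.compile(r'(?:\\x)?([0-9A-Fa-f]{2})')

def prefix_hex_bytes(s):
    # Rewrite every byte as "\x" + its two digits, whether or not
    # the prefix was already present in the input.
    return HEX_BYTE.sub(r'\\x\1', s)

sample = r"3033\x90\x0A6f"
print(prefix_hex_bytes(sample))  # \x30\x33\x90\x0A\x6f
```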


You don't need regex for this. You can use simple string manipulation. First remove all of the "\x" from your string. Then add it back before every 2 characters.

replaced = mystr.replace(r"\x", "")
newstr = "".join([r"\x" + replaced[i*2:(i+1)*2] for i in range(len(replaced)//2)])  # // keeps the count an int in Python 3


>>> print(newstr)
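A slightly more defensive sketch of the strip-and-regroup approach, assuming Python 3. The function name `add_hex_prefixes` and the odd-length check are additions for illustration, not part of the answer above.

```python
def add_hex_prefixes(s):
    # Drop every existing "\x" marker, leaving only the hex digits.
    digits = s.replace("\\x", "")
    if len(digits) % 2:
        # An odd digit count cannot be split into whole bytes.
        raise ValueError("odd number of hex digits")
    # Re-emit the digits two at a time, each pair prefixed with "\x".
    return "".join("\\x" + digits[i:i + 2] for i in range(0, len(digits), 2))

print(add_hex_prefixes(r"3033\x90\x0A"))  # \x30\x33\x90\x0A
```

Stepping the range by 2 avoids the division entirely, so the same code behaves identically regardless of how `/` is defined.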

You can also get a list of the hex values, to manipulate as you wish, with an even simpler re pattern:

mystr = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"

import re

pat = r'([a-fA-F0-9]{2})'
match = re.findall(pat, mystr)

if match:
    print('\n\nNew string:')
    print('\\x' + '\\x'.join(match))
    #for elem in match: # match gives you a list of strings with the hex values
    #    print('\\x{}'.format(elem), end='')

print('\n\nOriginal string:')
print(mystr)
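One thing to keep in mind with `re.findall` is that it silently skips any characters that do not match, so malformed input would go unnoticed. A minimal sketch of a check that could be run first, assuming a well-formed input consists only of two-digit hex bytes, each optionally prefixed with `\x` (the names `WELL_FORMED` and `is_well_formed` are hypothetical):

```python
import re

# Zero or more hex bytes, each optionally prefixed with "\x", and nothing else.
WELL_FORMED = re.compile(r'(?:(?:\\x)?[0-9A-Fa-f]{2})*')

def is_well_formed(s):
    # fullmatch succeeds only if the pattern consumes the entire string.
    return WELL_FORMED.fullmatch(s) is not None

print(is_well_formed(r"3033\x90"))  # True
print(is_well_formed(r"303\x90"))   # False: the "3" before "\x" has no partner
```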

This can be done without replacing the existing \x by using a combination of positive lookbehinds and a negative lookahead.


import re

regex = r"(?!(?<=\\x)|(?<=\\x[a-f\d]))([a-f\d]{2})"
test_str = r"30336237613131\x90\x01\x0A\x90\x02\x146F6D6D616E64\x90\x01\x06\x90\x02\x0F52656C6174\x90\x01\x02\x90\x02\x50656D31\x90\x00"
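subst = r"\\x\1"  # Python's re module uses \1 (not $1) for group 1

# Pass flags by keyword; the fourth positional argument of re.sub is count
result = re.sub(regex, subst, test_str, flags=re.IGNORECASE)

if result:
    print(result)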


  • (?!(?<=\\x)|(?<=\\x[a-f\d])) Negative lookahead ensuring that neither of the following matches.
    • (?<=\\x) Positive lookbehind ensuring what precedes is \x .
    • (?<=\\x[a-f\d]) Positive lookbehind ensuring what precedes is \x followed by a hexadecimal digit.
  • ([a-f\d]{2}) Capture any two hexadecimal digits into capture group 1.
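For comparison, the same "leave prefixed bytes alone" behaviour can be sketched without lookarounds by matching both kinds of byte with an alternation and deciding in a replacement function. This is a different technique from the one described above, and the names `TOKEN`, `_fix`, and `prefix_bare_bytes` are hypothetical.

```python
import re

# Group 1: a byte that already carries "\x"; group 2: a bare hex byte.
# The prefixed alternative is listed first so it wins where both could apply.
TOKEN = re.compile(r'(\\x[0-9a-f]{2})|([0-9a-f]{2})', re.IGNORECASE)

def _fix(m):
    # Keep prefixed bytes unchanged; add the prefix to bare ones.
    return m.group(1) or "\\x" + m.group(2)

def prefix_bare_bytes(s):
    # A callable replacement's return value is inserted literally,
    # so no backslash escaping of the template is needed.
    return TOKEN.sub(_fix, s)

print(prefix_bare_bytes(r"30\x90ab"))  # \x30\x90\xab
```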
