How to compare strings with placeholders in Python

Say you have a string such as this:

"ERROR: Error Number %d, Error Location 0x%x, found exception"

Suppose you write a program that searches a text file for this exact string. You don't care about the values of the placeholders %d and %x, but you do want to be sure the rest of the string matches exactly. How would you go about doing it?

One might ask why not just compare the prefix "ERROR: Error Number". Suppose, however, that the text file contains other strings that start with the same substring "ERROR: Error Number", and you don't want to capture those.
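To make the pitfall concrete, here is a small sketch using two made-up log lines (the second one is a hypothetical message that merely shares the prefix). A plain startswith check accepts both, while a pattern describing the whole message, with %d and %x replaced by digit patterns, accepts only the intended one.

```python
import re

# Hypothetical log lines: only the first has the exact shape we want.
lines = [
    "ERROR: Error Number 7, Error Location 0x1f, found exception",
    "ERROR: Error Number 7, disk full",
]

# Prefix comparison: both lines pass, which is the unwanted false positive.
prefix_hits = [line for line in lines if line.startswith("ERROR: Error Number")]

# Whole-message pattern: %d becomes \d+, %x becomes a run of hex digits.
pattern = re.compile(r"ERROR: Error Number \d+, Error Location 0x[0-9a-fA-F]+, found exception")
regex_hits = [line for line in lines if pattern.fullmatch(line)]

print(len(prefix_hits), len(regex_hits))
```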

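Use a regular expression. In your case, something like the pattern in this pythex demo would work (note that the location is hexadecimal, so its \d+ should really be [0-9a-fA-F]+ to match values such as 0xaf):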

http://pythex.org/?regex=ERROR%3A%5CsError%5CsNumber%5Cs(%5Cd%2B)%2C%5CsError%5CsLocation%5Cs0x(%5Cd%2B)%2C%5Csfound%5Csexception&test_string=ERROR%3A%20Error%20Number%2032%2C%20Error%20Location%200x420%2C%20found%20exception&ignorecase=0&multiline=0&dotall=0&verbose=0

Example:

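import re

arr = ['testing', 'ERROR: Error Number 32, Error Location 0x420, found exception', 'test']
# Raw string, so "\s" and "\d" reach the regex engine unchanged.
# The location is hexadecimal, so its group accepts a-f as well as 0-9.
regex = re.compile(r'ERROR:\sError\sNumber\s(\d+),\sError\sLocation\s0x([0-9a-fA-F]+),\sfound\sexception')
for string in arr:
    # match() anchors at the start of the string
    if regex.match(string):
        print(string)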
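Since the question is about searching a text file, here is a minimal sketch of applying the same pattern line by line. It assumes the file can be read as an iterable of lines; io.StringIO stands in for a real file, and find_errors is a hypothetical helper name. It uses search() rather than match() so the message is still found when, say, a timestamp precedes it, and it converts the captured location from hex with int(..., 16).

```python
import io
import re

# Same message shape as above; the location group accepts hexadecimal digits.
ERROR_RE = re.compile(
    r"ERROR:\sError\sNumber\s(\d+),\sError\sLocation\s0x([0-9a-fA-F]+),\sfound\sexception"
)

def find_errors(lines):
    """Yield (line_no, error_number, location) for every line containing the message."""
    for line_no, line in enumerate(lines, start=1):
        m = ERROR_RE.search(line)  # search(): the message may appear mid-line
        if m:
            yield line_no, int(m.group(1)), int(m.group(2), 16)

# io.StringIO stands in for open("some.log"); any iterable of lines works.
fake_file = io.StringIO(
    "boot ok\n"
    "12:00 ERROR: Error Number 32, Error Location 0x420, found exception\n"
    "ERROR: Error Number 5, something else\n"
)
results = list(find_errors(fake_file))
print(results)  # [(2, 32, 1056)]
```

If the message must start the line, swap search() for match(); if it must be the whole line, use fullmatch() on the line with its newline stripped.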

Again, use a regular expression.

You have to convert your format into a RegEx:

  • first, escape the characters that have a special meaning in a regex (re.escape does this);
  • replace each placeholder (e.g. %d) with a corresponding regex (e.g. \d+);
  • add the ^ and $ anchors to ensure you have an exact match;
  • compile the regex for efficiency;
  • use match, findall, etc.

Here is an example:

import re

my_format = "ERROR: Error Number %d, Error Location 0x%x, found exception"

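# Escape the regex metacharacters in the format string
my_regex = re.escape(my_format)

# Mapping: placeholder => regex. The placeholders go through re.escape too,
# because Python 3.7+ leaves "%" unescaped while older versions turn it
# into "\%"; this way the lookup matches on both.
mapping = [(re.escape("%d"), r"\d+"),
           (re.escape("%x"), r"[0-9a-f]+")]

# Substitute each placeholder with its regex
for pattern, regex in mapping:
    my_regex = my_regex.replace(pattern, regex)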

# Add begin/end operator for exact match
my_regex = "^" + my_regex + "$"
print(my_regex)

# Compile the RegEx, extract the 'match' function
match_my_regex = re.compile(my_regex, re.DOTALL).match

samples = ["789",
           "ERROR: Error Number 123, Error Location 0xaf, found exception",
           "ERROR: Error Number 456, Error Location 0xa0, found exception",
           "Got ERROR: Error Number 123, Error Location 0xaf, found exception"]

for sample in samples:
    print("{0}: match => {1}".format(sample, match_my_regex(sample) is not None))

You will obtain:

789: match => False
ERROR: Error Number 123, Error Location 0xaf, found exception: match => True
ERROR: Error Number 456, Error Location 0xa0, found exception: match => True
Got ERROR: Error Number 123, Error Location 0xaf, found exception: match => False
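The escape-then-replace steps can also be done the other way round: split the format on its placeholders first, escape only the literal pieces, and join everything back. This is a sketch, not a complete printf parser: format_to_regex and the PLACEHOLDERS table are hypothetical names, only %d, %x, %X, %s and %% are handled, and flags or widths such as %08x are not. Because the placeholders never pass through re.escape, it does not matter how a given Python version escapes "%".

```python
import re

# Hypothetical table: printf-style conversion -> regex fragment.
# Only a few conversions are covered; flags and widths are not.
PLACEHOLDERS = {
    "%d": r"-?\d+",
    "%x": r"[0-9a-f]+",
    "%X": r"[0-9A-F]+",
    "%s": r".*?",
    "%%": "%",  # a literal percent sign
}

def format_to_regex(fmt):
    """Compile a printf-style format string into a regex matching its output."""
    # The capturing group makes re.split keep the placeholders:
    # even indices are literal text, odd indices are placeholders.
    parts = re.split(r"(%[dxXs%])", fmt)
    pieces = []
    for i, part in enumerate(parts):
        if i % 2:
            pieces.append(PLACEHOLDERS[part])
        else:
            pieces.append(re.escape(part))  # escape only the literal text
    return re.compile("".join(pieces))

matcher = format_to_regex("ERROR: Error Number %d, Error Location 0x%x, found exception")
samples = [
    "789",
    "ERROR: Error Number 123, Error Location 0xaf, found exception",
    "Got ERROR: Error Number 123, Error Location 0xaf, found exception",
]
for sample in samples:
    # fullmatch() requires the whole string to match, like ^...$
    print("{0}: match => {1}".format(sample, matcher.fullmatch(sample) is not None))
```

Here fullmatch() replaces the ^ and $ anchors. Unlike $, it does not tolerate a trailing newline, so strip lines read from a file first (for example with line.rstrip("\n")).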


 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM