I want to compare two strings so that the comparison ignores differences in special characters. That is,
Hai, this is a test
should match
Hai ! this is a test
or
Hai this is a test
Is there any way to do this without modifying the original strings?
This removes punctuation and whitespace before doing the comparison:
In [32]: import string
In [33]: def compare(s1, s2):
    ...:     remove = string.punctuation + string.whitespace
    ...:     return s1.translate(None, remove) == s2.translate(None, remove)
In [34]: compare('Hai, this is a test', 'Hai ! this is a test')
Out[34]: True
>>> def cmp(a, b):
...     return [c for c in a if c.isalpha()] == [c for c in b if c.isalpha()]
...
>>> cmp('Hai, this is a test', 'Hai ! this is a test')
True
>>> cmp('Hai, this is a test', 'Hai this is a test')
True
>>> cmp('Hai, this is a test', 'other string')
False
This creates two temporary lists, but doesn't modify the original strings in any way.
To compare an arbitrary number of strings for alphabetic equivalence:
def samealphabetic(*args):
    # Reduce each string to its letters; the strings match if at most one distinct result remains
    return len(set(''.join(c for c in arg if c.isalpha()) for arg in args)) <= 1

print(samealphabetic('Hai, this is a test',
                     'Hai ! this is a test',
                     'Hai this is a test'))
This prints True. You may want to change <= depending on what should be returned for no arguments (with <= 1, calling it with no arguments returns True).
Generally, you'd strip the characters you wish to ignore and then compare the results:
import re
def equal(a, b):
    # Remove every character that is neither whitespace nor a word character
    regex = re.compile(r'[^\s\w]')
    return regex.sub('', a) == regex.sub('', b)
>>> equal('Hai, this is a test', 'Hai this is a test')
True
>>> equal('Hai, this is a test', 'Hai this@#)($! i@#($()@#s a test!!!')
True
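Note that this pattern keeps whitespace, so 'Hai ! this is a test' (which becomes 'Hai  this is a test', with two spaces) would not compare equal to 'Hai, this is a test'. A minimal sketch that also drops whitespace, assuming that is acceptable for your use case; the name equal_ignoring_whitespace is made up for illustration:

```python
import re

# Matches runs of characters that are neither letters nor digits
# (\W covers punctuation and whitespace; "_" is added because \w includes it).
_IGNORED = re.compile(r'[\W_]+')


def equal_ignoring_whitespace(a, b):
    # Hypothetical helper: compare only the letters and digits of both strings
    return _IGNORED.sub('', a) == _IGNORED.sub('', b)


print(equal_ignoring_whitespace('Hai, this is a test', 'Hai ! this is a test'))
```

Because all whitespace is dropped, 'a test' and 'atest' are treated as equal too, which may or may not be what you want.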
You could first remove the special characters from both strings and then compare the results.
In your example, the special characters are ',', '!' and the space.
So for your strings:
a = 'Hai, this is a test'
b = 'Hai ! this is a test'
tempa = a.translate(None, ',! ')
tempb = b.translate(None, ',! ')
you can then simply compare tempa and tempb.
Use the Levenshtein metric to measure distance between two strings. Rank your string comparisons by score. Pick the top n matches.
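As a rough sketch of that idea, the distance can be computed with the classic dynamic-programming recurrence. The function names levenshtein and closest below are illustrative, not taken from any library, and a dedicated package would likely be faster for long strings:

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    # previous[j] is the distance between the prefix of a handled so far
    # and the first j characters of b.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def closest(target, candidates, n=1):
    """Return the n candidates with the smallest distance to target."""
    return sorted(candidates, key=lambda c: levenshtein(target, c))[:n]


candidates = ['other string', 'Hai ! this is a test', 'Hai this is a test']
print(closest('Hai, this is a test', candidates, n=2))
```

Unlike the other answers, this gives a fuzzy ranking rather than a yes/no match, so you would still need to pick a distance threshold.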
Since you mention that you don't want to modify the original strings, you can also do the comparison in place, walking both strings with two indices and without building any modified copies.
>>> import string
>>> first = "Hai, this is a test"
>>> second = "Hai ! this is a test"
>>> third = "Hai this is a test"
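>>> def my_match(left, right):
...     i, j = 0, 0
...     ignored = set(string.punctuation + string.whitespace)
...     while i < len(left) and j < len(right):
...         if left[i] in ignored:
...             i += 1
...         elif right[j] in ignored:
...             j += 1
...         elif left[i] != right[j]:
...             return False
...         else:
...             i += 1
...             j += 1
...     # Skip ignored characters left over at the end of either string
...     while i < len(left) and left[i] in ignored:
...         i += 1
...     while j < len(right) and right[j] in ignored:
...         j += 1
...     return i == len(left) and j == len(right)
...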
>>> my_match(first, second)
True
>>> my_match(first, third)
True
>>> my_match("test", "testing")
False
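The same idea can be sketched more compactly with generators, assuming Python 3 (for itertools.zip_longest). The name lazy_match is hypothetical; no filtered copies of the strings are built, and punctuation at the end of either string is ignored as well:

```python
import string
from itertools import zip_longest

IGNORED = set(string.punctuation + string.whitespace)


def lazy_match(left, right):
    # Lazily yield only the characters that take part in the comparison
    kept_left = (c for c in left if c not in IGNORED)
    kept_right = (c for c in right if c not in IGNORED)
    # zip_longest pads the shorter stream with None, so strings with a
    # different number of kept characters compare as unequal.
    return all(a == b for a, b in zip_longest(kept_left, kept_right))


print(lazy_match("Hai, this is a test", "Hai ! this is a test"))
```

all() stops at the first mismatching pair, so the comparison still exits early like the explicit loop does.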
The solution given by root is compatible with Python 2.7 but not with Python 3, where str.translate accepts only a single translation table.
Here is a quick recipe for Python 3:
import string
def compare(s1, s2):
    remove = string.punctuation + string.whitespace
    mapping = {ord(c): None for c in remove}
    print(f'Mapping: \n{mapping}')
    return s1.translate(mapping) == s2.translate(mapping)
check = compare('Hai, this is a test', 'Hai ! this is a test')
print(check)
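An alternative sketch builds the same table with str.maketrans, whose third argument lists the characters to be mapped to None (that is, deleted). The name compare_maketrans is made up here to avoid clashing with compare above:

```python
import string

# Build the deletion table once instead of on every call
_TABLE = str.maketrans('', '', string.punctuation + string.whitespace)


def compare_maketrans(s1, s2):
    # Hypothetical helper: strip punctuation and whitespace, then compare
    return s1.translate(_TABLE) == s2.translate(_TABLE)


print(compare_maketrans('Hai, this is a test', 'Hai ! this is a test'))
```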