
Python : How to compare strings and ignore white space and special characters

I want to compare two strings so that the comparison ignores differences in whitespace and special characters. That is,

Hai, this is a test

Should match with

"Hai ! this is a test" or "Hai this is a test"

Is there any way to do this without modifying the original strings?

This removes punctuation and whitespace before doing the comparison (Python 2 only; in Python 3, str.translate takes a single mapping argument):

In [32]: import string

In [33]: def compare(s1, s2):
    ...:     remove = string.punctuation + string.whitespace
    ...:     return s1.translate(None, remove) == s2.translate(None, remove)

In [34]: compare('Hai, this is a test', 'Hai ! this is a test')
Out[34]: True

Alternatively, compare only the alphabetic characters of each string:

>>> def cmp(a, b):
...     return [c for c in a if c.isalpha()] == [c for c in b if c.isalpha()]
... 
>>> cmp('Hai, this is a test', 'Hai ! this is a test')
True
>>> cmp('Hai, this is a test', 'Hai this is a test')
True
>>> cmp('Hai, this is a test', 'other string')
False

This creates two temporary lists, but doesn't modify the original strings in any way.
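
If the temporary lists are a concern, the same idea can be sketched lazily with generators. This is only a Python 3 sketch: the name same_letters is made up for illustration, and itertools.zip_longest is used so that a length mismatch between the two filtered streams counts as a difference.

```python
from itertools import zip_longest

def same_letters(a, b):
    # Lazily yield only the alphabetic characters of each string.
    letters_a = (c for c in a if c.isalpha())
    letters_b = (c for c in b if c.isalpha())
    # zip_longest pads the shorter stream with None, so leftover
    # letters on one side are compared against None and fail.
    return all(x == y for x, y in zip_longest(letters_a, letters_b))

print(same_letters('Hai, this is a test', 'Hai ! this is a test'))  # True
print(same_letters('Hai, this is a test', 'other string'))          # False
```

Like the list version, this ignores digits as well, because str.isalpha() is False for them.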

To compare an arbitrary number of strings for alphabetic equivalence,

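def samealphabetic(*args):
    # Reduce each string to its alphabetic characters, then count the distinct results.
    # ''.join() is needed because filter() returns an iterator, not a string, in Python 3.
    return len(set(''.join(filter(lambda c: c.isalpha(), arg)) for arg in args)) <= 1

print(samealphabetic('Hai, this is a test',
                     'Hai ! this is a test',
                     'Hai this is a test'))

This prints True. With <= 1 the function also returns True when called with no arguments; change it to == 1 if you want False in that case.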

Generally, you'd remove the characters you wish to ignore, and then compare the results:

import re
def equal(a, b):
    # Remove every character that is neither whitespace nor a word character
    regex = re.compile(r'[^\s\w]')
    return regex.sub('', a) == regex.sub('', b)

>>> equal('Hai, this is a test', 'Hai this is a test')
True
>>> equal('Hai, this is a test', 'Hai this@#)($! i@#($()@#s a test!!!')
True
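
Note that the pattern above keeps whitespace, so 'Hai, this is a test' and 'Hai ! this is a test' would still differ by one space. A possible variant that also drops whitespace is sketched below; the name equal_ignoring_space is hypothetical, and treating the underscore as ignorable is an assumption made to mirror string.punctuation.

```python
import re

# \W matches every non-word character (punctuation and whitespace alike).
# The underscore counts as a word character, so it is listed explicitly.
IGNORED = re.compile(r'[\W_]+')

def equal_ignoring_space(a, b):
    # Strip the ignored characters from both strings, then compare.
    return IGNORED.sub('', a) == IGNORED.sub('', b)

print(equal_ignoring_space('Hai, this is a test', 'Hai ! this is a test'))  # True
print(equal_ignoring_space('Hai, this is a test', 'Hai this is a text'))    # False
```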

You can first remove the special characters from both strings and then compare them.

In your example, the special characters are ',', '!' and the space.

So for your strings:

a='Hai, this is a test'
b='Hai ! this is a test'
# Python 2: translate(None, chars) deletes every character listed in chars
tempa=a.translate(None,',! ')
tempb=b.translate(None,',! ')

Then you can just compare tempa and tempb.
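
On Python 3, str.translate takes a single table argument, so the snippet above needs a small change. A minimal sketch, assuming the same three characters are to be ignored, builds the table with str.maketrans:

```python
a = 'Hai, this is a test'
b = 'Hai ! this is a test'

# The third argument of str.maketrans lists the characters to delete.
table = str.maketrans('', '', ',! ')
tempa = a.translate(table)
tempb = b.translate(table)

print(tempa)           # Haithisisatest
print(tempa == tempb)  # True
```

The original strings a and b are left untouched, since translate returns new strings.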

Use the Levenshtein metric to measure the distance between two strings. Rank the candidate strings by that score and pick the top n matches.
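
As a rough illustration of that idea, here is a pure-Python sketch of the edit distance with a small ranking helper. The function names levenshtein and top_matches and the default n=2 are assumptions made for this example; a dedicated library would normally be faster.

```python
def levenshtein(a, b):
    # Classic dynamic-programming edit distance, keeping one row at a time.
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]

def top_matches(query, candidates, n=2):
    # Rank candidates by ascending distance to the query and keep the best n.
    return sorted(candidates, key=lambda c: levenshtein(query, c))[:n]

candidates = ['Hai ! this is a test', 'Hai this is a test', 'other string']
print(top_matches('Hai, this is a test', candidates))
```

Unlike the other answers, this gives a fuzzy score rather than a yes/no result, so a threshold or a top-n cut-off has to be chosen by the caller.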

Since you mention that you don't want to modify the original strings, you can also walk both strings with two indices in a single pass, without building any temporary copies.

>>> import string
>>> first = "Hai, this is a test"
>>> second = "Hai ! this is a test"
>>> third = "Hai this is a test"
>>> def my_match(left, right):
    i, j = 0, 0
    ignored = set(string.punctuation + string.whitespace)
    while i < len(left) or j < len(right):
        if i < len(left) and left[i] in ignored:
            i += 1
        elif j < len(right) and right[j] in ignored:
            j += 1
        elif i == len(left) or j == len(right):
            # One string still has a significant character left over
            return False
        elif left[i] != right[j]:
            return False
        else:
            i += 1
            j += 1
    return True

>>> my_match(first, second)
True
>>> my_match(first, third)
True
>>> my_match("test", "testing")
False

Solution for Python 3.*

The solution given by root is compatible with Python 2.7 but not with Python 3.*

Here is a quick recipe for it.

  1. Use the same solution, ported to Python 3.* The translate method now takes only one parameter: a mapping table whose keys are the ordinals (integers) of the characters to be removed, each mapped to None.

    import string

    def compare(s1, s2):
        remove = string.punctuation + string.whitespace
        # Map each unwanted character's ordinal to None so translate() deletes it
        mapping = {ord(c): None for c in remove}
        print(f'Mapping: \n{mapping}')
        return s1.translate(mapping) == s2.translate(mapping)

    check = compare('Hai, this is a test', 'Hai ! this is a test')
    print(check)

Documentation
