
Find the difference between two strings of uneven length in Python

a = 'abcdfjghij'
b = 'abcdfjghi'

Expected output: j

def diff(a, b):
    string=''
    for val in a:
        if val not in b:
            string=val
    return string

a = 'abcdfjghij'
b = 'abcdfjghi'
print(diff(a,b))

This code returns an empty string. Any solution for this?

collections.Counter from the standard library can be used to model multisets, so it keeps track of repeated elements. It is a subclass of dict that adds counting functionality. To find the differences between two strings, you can mimic a symmetric difference between sets.

from collections import Counter

a = 'abcdfjghij'
b = 'abcdfjghi'

ca = Counter(a)
cb = Counter(b)

diff = (cb-ca)+(ca-cb) # symmetric difference

print(diff)
#Counter({'j': 1})
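
If you want the result as a plain string rather than a Counter, a small follow-up sketch: Counter.elements() yields each key repeated as many times as its count, so the difference can be joined back into text. The helper name counter_diff is made up for this illustration, and sorting the output is just a choice to make the result predictable.

```python
from collections import Counter

def counter_diff(a, b):
    """Return the characters whose counts differ between a and b, as a sorted string."""
    ca, cb = Counter(a), Counter(b)
    # Subtraction keeps only positive counts, so adding both directions
    # gives a multiset symmetric difference.
    diff = (ca - cb) + (cb - ca)
    # elements() repeats each character as many times as its count.
    return ''.join(sorted(diff.elements()))

print(counter_diff('abcdfjghij', 'abcdfjghi'))  # j
print(counter_diff('abcaa', 'aaa'))             # bc
```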

If I understand correctly, your question is:

"given 2 strings of different length, how can I find the characters that are different between them?"

Judging by your example, you want either the characters that are present in only one of the strings, or the characters that are repeated a different number of times in the two strings.

Here's a simple solution (maybe not the most efficient one), but one that's short and does not require any extra packages:

UPDATED:

a = 'abcdfjghij'
b = 'abcdfjghi' 

dict_a = dict( (char, a.count(char)) for char in a)
dict_b = dict( (char, b.count(char)) for char in b)

idx_longest = [dict_a, dict_b].index(max([dict_a, dict_b], key = len))

results = [ k for (k,v) in [dict_a, dict_b][idx_longest].items() if k not in [dict_a, dict_b][1-idx_longest].keys() or v!=[dict_a, dict_b][1-idx_longest][k] ]

print(results)
 > ['j']

Or you can try it with another pair of strings, such as:

a = 'abcaa'
b = 'aaa'

print(results)
 > ['b', 'c']

since 'a' appears in both strings an equal number of times.
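
The results list is computed once, so trying a second pair means re-running the comprehension. One way to make that repeatable is to wrap the counting in a function. The sketch below is a variation rather than the exact code above (the name count_diff is hypothetical), and it checks the characters of both strings instead of only those of the string with more distinct characters.

```python
def count_diff(a, b):
    """List the characters whose number of occurrences differs between a and b."""
    counts_a = {char: a.count(char) for char in a}
    counts_b = {char: b.count(char) for char in b}
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    all_chars = dict.fromkeys(a + b)
    # dict.get(char, 0) treats a missing character as a count of zero.
    return [c for c in all_chars if counts_a.get(c, 0) != counts_b.get(c, 0)]

print(count_diff('abcdfjghij', 'abcdfjghi'))  # ['j']
print(count_diff('abcaa', 'aaa'))             # ['b', 'c']
```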

updated

But you have j twice in a. The first time the loop sees j, it looks at b and finds a j, so all is good. For the second j it looks again and still finds a j, so again all is good. If you want to check whether each letter matches the letter at the same position in the other string, try this:

a = 'abcdfjghij'
b = 'abcdfjghi'

def diff(a, b):
  # Compare the strings position by position over their common length.
  smallest_len = min(len(a), len(b))
  for index in range(smallest_len):
    if a[index] != b[index]:
      print(f'a value {a[index]} at index {index} does not match b value {b[index]}')
  # Report whatever is left over in the longer string, if any.
  if len(a) > len(b):
    print(f'Extra Values in A Are {a[smallest_len:]}')
  elif len(b) > len(a):
    print(f'Extra Values in B Are {b[smallest_len:]}')
  

diff(a, b)
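
If you would rather collect the mismatches than print them, the same position-by-position idea can be sketched with itertools.zip_longest, which pads the shorter string. This is an alternative take, not the code above; the function name positional_diff and the use of None as the padding value are choices made for this example.

```python
from itertools import zip_longest

def positional_diff(a, b):
    """Return (index, char_from_a, char_from_b) for every position that differs."""
    mismatches = []
    # zip_longest pads the shorter string with None once it runs out.
    for index, (ch_a, ch_b) in enumerate(zip_longest(a, b)):
        if ch_a != ch_b:
            mismatches.append((index, ch_a, ch_b))
    return mismatches

print(positional_diff('abcdfjghij', 'abcdfjghi'))  # [(9, 'j', None)]
```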


In your example, there are 2 differences between the 2 strings: the letters g and j. I tested your code and it returns g, because all the other letters from a are in b:

a = 'abcdfjghij'
b = 'abcdfjhi'

def diff(a, b):
    string=''
    for val in a:
        if val not in b:
            string=val
    return string

print(diff(a,b))
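
To see why the second j goes unnoticed, it can help to look at what the membership test actually reports. The snippet below is only an illustration (the helper name missing_chars is hypothetical); it runs the same `not in` check against both versions of b that appear on this page.

```python
def missing_chars(a, b):
    """Return the characters of a that do not occur anywhere in b."""
    # `val not in b` only asks whether val occurs at least once in b;
    # it does not compare how many times val occurs in each string.
    return [val for val in a if val not in b]

a = 'abcdfjghij'
print(missing_chars(a, 'abcdfjhi'))   # ['g']
print(missing_chars(a, 'abcdfjghi'))  # []
```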

It's hard to know exactly what you want based on your question. For example, should

'abc'
'efg'

return 'abc' or 'efg', or is there always going to be just one character added?

Here is a solution that accounts for multiple characters being different but still might not give your exact output.

def diff(a, b):
    string = ''
    
    if(len(a) >= len(b)):
        longString = a
        shortString = b
    else:
        longString = b
        shortString = a
    for i in range(len(longString)):
        if(i >= len(shortString) or longString[i] != shortString[i]):
            string += longString[i]
    return string

a = 'abcdfjghij'
b = 'abcdfjghi'
print(diff(a,b))

If one string just has one character added, and it could be anywhere in the string, you could change

string += longString[i]

to

string = longString[i]
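
Note that with this change the loop keeps overwriting the result and so returns the character at the last position that differs. That matches the added character when it sits at the end of the longer string, as in the example. If the extra character may be in the middle, one option is to stop at the first mismatch instead. A minimal sketch, assuming the longer string is exactly the shorter one with a single character inserted (the name added_char is hypothetical):

```python
def added_char(a, b):
    """Return the single extra character, assuming one string is the other plus one character."""
    long_string, short_string = (a, b) if len(a) >= len(b) else (b, a)
    for i, char in enumerate(long_string):
        # Stop at the first position where the strings disagree, or where
        # the shorter string has run out.
        if i >= len(short_string) or char != short_string[i]:
            return char
    return ''  # the strings are identical

print(added_char('abcdfjghij', 'abcdfjghi'))  # j
print(added_char('abXcd', 'abcd'))            # X
```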

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint them, please cite the site URL or the original address. For any questions, please contact: yoyou2525@163.com.

 
© 2020-2024 STACKOOM.COM