简体   繁体   中英

Efficient way to Finding string in python

You are given two strings, AA and BB. Find if there is a substring that appears in both AA and BB.

All the strings contain only lowercase Latin letters.

Above, you see a question from hackerranck . I wrote the following program to solve it:

T = int(raw_input())

for t in xrange(T):
    s1 = raw_input()
    s2 = raw_input()
    length1 = len(s1)
    length2 = len(s2)
    checked = list()
    if length1<length2:
        for letter in s1:
            if len(checked)== 26:
                break
            if letter in checked:
                next
            checked.append(letter)
            if letter in s2:
                print "YES"
                break
        else:
            print "NO"
    else:
        for letter in s2:
            if letter in checked:
                next
            if len(checked)==26:
                break
            checked.append(letter)
            if letter in s1:
                print "YES"
                break
        else:
            print "NO"

It worked correct before adding if len(checked)==26: break . I added this line to make it more efficient by checking each alphabetic letter only once and get rid of time-out error in the submitting process, but after adding this line, the answer of my program is wrong for some test cases. why?

Your error is here:

if letter in checked:
    next

next is a function in Python. Using if letter in checked: next is a no-op, you could just as well have used pass , as that will just reference the function object and not call it. It certainly won't continue to the next loop iteration.

So no matter what the outcome of letter in checked is, you continue to add letter to checked . Since checked is a list, not a set, you'll be adding duplicates to the list and you end up with more than 26 entries quite easily.

Use:

if letter in checked:
    continue

and consider using a set for checked to make in membership testing an O(1) operation rather than O(N).

Speaking of sets, this is basically a set intersection problem ; is there any single letter that appears in both s1 and s2 . You are correctly testing if the sets are disjoint; so use the Python built-in set type . This would in the worst case do an O(N * M) loop, but the looping is in C code:

print('NO' if set(s1).isdisjoint(s2) else 'YES')

By using set.isdisjoint() no new set is created, only a boolean is returned. set(s1) loops over all of s1 to produce the set, set.isdisjoint() will exit early as soon as it finds a match, each match test is O(1) against the set.

You could see if swapping s1 and s2 based on length improves your timings for the test still:

if len(s1) > len(s2):
    s1, s2 = s2, s1
print('NO' if set(s1).isdisjoint(s2) else 'YES')

You can use sets and see if the intersection is greater than zero:

"yes" if set(s1).intersection(s2) else "no"

To avoid iterating over the entire list:

"yes" if any(c in s1 for c in s2) else "no"

Depending on the size of s1 and s2 you could possibly benefit from shrinking them to sets:

"yes" if any(c in set(s1) for c in set(s2)) else "no"

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM