

Python check for Anagram in O(n) solution

I'm trying to check if 2 strings are anagrams. This solution is simple, but not efficient (O(n log n)). I know I could use collections and Counter, then compare the occurrences of each character, but I'm trying to avoid any modules for an interview. What would be the fastest way to solve this problem? (Perhaps checking the occurrences of each character?)

def check(word1,word2):

    return sorted(word1)==sorted(word2)

Your code doesn't even return a correct value. This one-liner is O(n log n):

return sorted(word1) == sorted(word2)

For an O(n) solution, you can count all characters:

from collections import Counter
# ...
def check(a, b):
    return Counter(a) == Counter(b)

Without collections, it is much longer:

def check(a, b):
    chars = dict.fromkeys(a + b, 0)
    for c in a:
        chars[c] += 1
    for c in b:
        chars[c] -= 1
    return not any(chars.values())
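One possible refinement, sketched here on the assumption that an early exit is worth one extra branch: two strings of different lengths can never be anagrams, so the lengths can be compared first, and a single dict can then be counted up with a and consumed with b. The name check_with_length is hypothetical and not part of the answer above.

```python
def check_with_length(a, b):
    # Strings of different lengths can never be anagrams, so bail out early.
    if len(a) != len(b):
        return False
    counts = {}
    # Count every character of a.
    for c in a:
        counts[c] = counts.get(c, 0) + 1
    # Consume those counts with b; a missing or exhausted character means
    # b has a character a does not have (or has it more often).
    for c in b:
        if counts.get(c, 0) == 0:
            return False
        counts[c] -= 1
    # Equal lengths and no over-consumption imply every count is back to zero.
    return True
```

Because the lengths match, b can only finish without returning False if it uses up exactly the characters of a, so no final scan over the dict is needed.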

This code does the following:

  • chars = dict.fromkeys(a + b, 0): creates a dict whose keys are all the characters occurring in either word, each set to 0.
  • for c in a: chars[c] += 1: iterates over a and counts the occurrences of each character in it. chars now contains the count of each character of a (and zeroes for characters that appear in b but not in a).
  • for c in b: chars[c] -= 1: much the same as before, but this subtracts the character counts of b from chars.
  • return not any(chars.values()): chars['h'] == 0 if and only if a and b contain the same number of 'h'. This line checks whether chars has only zeroes as values, meaning that every character has the same count in both inputs. (any returns True if there is any truthy value in the sequence; 0 is falsy, and every other integer, including negative ones, is truthy.)
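To make these four steps concrete, here is a small trace on the inputs "aab" and "abc". It uses the same logic as check, with a snapshot of the dict added purely for illustration; trace_check is a made-up name.

```python
def trace_check(a, b):
    # Same logic as check(a, b), but also returns intermediate states.
    chars = dict.fromkeys(a + b, 0)
    for c in a:
        chars[c] += 1
    after_a = dict(chars)  # snapshot after counting a
    for c in b:
        chars[c] -= 1
    return after_a, chars, not any(chars.values())

after_a, final, result = trace_check("aab", "abc")
print(after_a)  # {'a': 2, 'b': 1, 'c': 0}
print(final)    # {'a': 1, 'b': 0, 'c': -1}
print(result)   # False
```

The leftover 1 for 'a' and -1 for 'c' are both truthy, so any() finds a non-zero count and the function reports that the words are not anagrams.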

Both strings get iterated over once. Assuming O(1) access time for dictionaries, the whole algorithm runs in O(n) time (where n is the total length of the inputs). Space complexity is O(n) too (all characters can be distinct). Don't forget that when they ask you about complexity: "complexity" doesn't necessarily mean time complexity.
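The O(n) space bound holds for arbitrary characters. If the alphabet is fixed in advance, for example only the lowercase letters a to z (an assumption, not something the question states), the count table has a constant size, so the extra space is O(1) with respect to the input length. A minimal sketch under that assumption; check_lowercase and _index are hypothetical names:

```python
def _index(c):
    # Map 'a'..'z' to 0..25; reject anything outside the assumed alphabet.
    i = ord(c) - ord('a')
    if not 0 <= i < 26:
        raise ValueError(f"unsupported character: {c!r}")
    return i

def check_lowercase(a, b):
    # Extra space is a fixed 26-slot list, independent of len(a) + len(b).
    counts = [0] * 26
    for c in a:
        counts[_index(c)] += 1
    for c in b:
        counts[_index(c)] -= 1
    # Anagrams leave every slot at zero.
    return not any(counts)
```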

Here's a nice option from http://interactivepython.org/runestone/static/pythonds/AlgorithmAnalysis/AnAnagramDetectionExample.html:

def anagramSolution(s1,s2):

    TABLE_SIZE = 128
    c1 = [0]*TABLE_SIZE
    c2 = [0]*TABLE_SIZE

    for ch in s1:
        pos = ord(ch)
        c1[pos] = c1[pos] + 1

    for ch in s2:
        pos = ord(ch)
        c2[pos] = c2[pos] + 1

    j = 0
    stillOK = True
    while j<TABLE_SIZE and stillOK:
        if c1[j]==c2[j]:
            j = j + 1
        else:
            stillOK = False

    return stillOK

This runs in O(n). Essentially, you loop over both strings, counting the occurrences of each character. In the end, you simply iterate over the two tables, making sure the counts are equal.
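The final while loop with the stillOK flag does the same job as comparing the two lists directly. A more compact rewrite of the same table-based idea, still limited to ASCII (anagram_solution is a renamed variant, not the code from the linked page):

```python
def anagram_solution(s1, s2):
    TABLE_SIZE = 128  # covers ASCII code points 0..127 only
    c1 = [0] * TABLE_SIZE
    c2 = [0] * TABLE_SIZE
    for ch in s1:
        c1[ord(ch)] += 1
    for ch in s2:
        c2[ord(ch)] += 1
    # List equality compares element by element and stops at the first
    # mismatch, which is what the manual while loop does.
    return c1 == c2
```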

As noted in the comments, this has a harder time scaling to Unicode: the fixed 128-entry table only covers ASCII, and any character with ord(ch) >= 128 raises an IndexError. If you expect Unicode input, you would likely want to use a dictionary instead.
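One way to see the scaling problem is to size the table to the largest code point actually present. This sketch (anagram_any_codepoint is a hypothetical name) accepts any str, but a single emoji such as U+1F600 already forces a table of 128,513 slots, and the worst case is 1,114,112, whereas a dictionary only stores the characters that actually occur.

```python
def anagram_any_codepoint(s1, s2):
    # Size the table to the largest code point present; default=0 covers
    # the case of two empty strings.
    size = max(map(ord, s1 + s2), default=0) + 1
    counts = [0] * size
    for ch in s1:
        counts[ord(ch)] += 1
    for ch in s2:
        counts[ord(ch)] -= 1
    # Anagrams leave every slot at zero.
    return not any(counts)
```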

I'd write it like this, without imports:

def count_occurences(mystring):
    occs = {}
    for char in mystring:
        if char in occs:
            occs[char] += 1
        else:
            occs[char] = 1
    return occs

def is_anagram(str1, str2):
    return count_occurences(str1) == count_occurences(str2)
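The if/else branch in the counting loop can also be folded into dict.get, which returns a default of 0 for characters not seen yet. A variant of the same two functions, still without imports:

```python
def count_occurrences(mystring):
    occs = {}
    for char in mystring:
        # dict.get returns 0 for unseen characters, replacing the if/else.
        occs[char] = occs.get(char, 0) + 1
    return occs

def is_anagram(str1, str2):
    # Two strings are anagrams exactly when their character counts match.
    return count_occurrences(str1) == count_occurrences(str2)
```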

Or, if you can use imports, just not Counter, use a defaultdict:

from collections import defaultdict

def count_occurences(mystring):
    occs = defaultdict(int)
    for char in mystring:
        occs[char] += 1

    return occs

def is_anagram(str1, str2):
    return count_occurences(str1) == count_occurences(str2)
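One subtlety with the defaultdict version: it compares equal to a plain dict with the same items, but merely reading a missing key inserts that key with value 0, which then changes the result of ==. A small demonstration reusing count_occurences from above:

```python
from collections import defaultdict

def count_occurences(mystring):
    occs = defaultdict(int)
    for char in mystring:
        occs[char] += 1
    return occs

a = count_occurences("ab")
b = count_occurences("ba")
print(a == b)                 # True: defaultdict compares like a plain dict
print(a == {"a": 1, "b": 1})  # True
a["z"]                        # reading a missing key inserts 'z' with value 0 ...
print(a == b)                 # False: ... so the key sets no longer match
```

So only compare the counts before doing any lookups on them, or look keys up with .get(), which does not insert anything.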

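Statement: The technical posts on this site are licensed under CC BY-SA 4.0. If you republish, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.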

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM