How can I check if a string has the same characters? Python
I need to be able to tell whether a string of arbitrary length greater than 1 (lowercase only) has the same set of characters as a base or template string.
For example, given the string "aabc", "azbc" and "aaabc" would be false while "acba" would be true.
Is there a fast way to do this in Python without generating all the permutations of the first string and then comparing each one to the test string?
Sort the two strings and then compare them:
sorted(str1) == sorted(str2)
If the strings might not be the same length, you might want to check that first to save time:
len(str1) == len(str2) and sorted(str1) == sorted(str2)
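As a quick check against the examples from the question, here is a minimal sketch. The function name `same_chars_sorted` is an illustrative assumption, not part of the original answer:

```python
def same_chars_sorted(str1, str2):
    """Return True if both strings hold the same characters with the same counts."""
    # Comparing lengths first skips the sort when the answer is obviously False.
    return len(str1) == len(str2) and sorted(str1) == sorted(str2)


template = "aabc"
print(same_chars_sorted(template, "azbc"))   # False: "z" is not in the template
print(same_chars_sorted(template, "aaabc"))  # False: lengths differ
print(same_chars_sorted(template, "acba"))   # True: same letters, reordered
```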
This is the O(n) solution:
from collections import Counter
Counter(str1) == Counter(str2)
But the O(n * log n) solution using sorted is likely faster for sensible values of n.
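To make the Counter comparison concrete: a Counter maps each character to the number of times it occurs, and two Counters compare equal only when every count matches. A small sketch using the question's examples (the variable name `template` is illustrative):

```python
from collections import Counter

template = Counter("aabc")
print(template["a"], template["b"], template["c"])  # 2 1 1

print(Counter("acba") == template)   # True: same counts, different order
print(Counter("azbc") == template)   # False: "z" replaces one "a"
print(Counter("aaabc") == template)  # False: one "a" too many
```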
Here's a variation on @Joowani's solution that uses only one dictionary and runs even faster (at least on my machine):
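import collections

def cmp4(str1, str2):
    if len(str1) != len(str2):
        return False
    d = collections.defaultdict(int)
    for c in str1:
        d[c] += 1  # count each character of str1
    for c in str2:
        d[c] -= 1  # cancel out each character of str2
    # Every count ends at zero only if both strings hold the same characters.
    # values() works on both Python 2 and Python 3.
    return all(v == 0 for v in d.values())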
Here is another O(n) solution, longer but slightly faster than the others:
def cmp(str1, str2):
    if len(str1) != len(str2):
        return False
    d, d2 = {}, {}
    for char in str1:
        if char not in d:
            d[char] = 1
        else:
            d[char] += 1
    for char in str2:
        if char not in d:
            return False
        if char not in d2:
            d2[char] = 1
        else:
            d2[char] += 1
    return d == d2
It basically does the same thing as gnibber's solution (though for some reason Counter() from the collections library seems quite slow). Here are some timeit results:
import timeit

setup = '''
import collections
from collections import Counter
s1 = "abcdefghijklmnopqrstuvwxyz" * 10000
s2 = s1[::-1]

def cmp1(str1, str2):
    if len(str1) != len(str2):
        return False
    d, d2 = {}, {}
    for char in str1:
        if char not in d:
            d[char] = 1
        else:
            d[char] += 1
    for char in str2:
        if char not in d:
            return False
        if char not in d2:
            d2[char] = 1
        else:
            d2[char] += 1
    return d == d2

def cmp2(str1, str2):
    return len(str1) == len(str2) and sorted(str1) == sorted(str2)

def cmp3(str1, str2):
    return Counter(str1) == Counter(str2)

def cmp4(str1, str2):
    if len(str1) != len(str2):
        return False
    d = collections.defaultdict(int)
    for c in str1:
        d[c] += 1
    for c in str2:
        d[c] -= 1
    return all(v == 0 for v in d.values())
'''
timeit.timeit("cmp1(s1, s2)", setup=setup, number = 100)
8.027034027221656
timeit.timeit("cmp2(s1, s2)", setup=setup, number = 100)
8.175071701324946
timeit.timeit("cmp3(s1, s2)", setup=setup, number = 100)
14.243422195893174
timeit.timeit("cmp4(s1, s2)", setup=setup, number = 100)
5.0937542822775015
Also, David's solution comes out on top when the strings are small and actually have the same characters.
EDIT: updated the test results.
Here's a different way, using something we tend to overlook: sets.
if set(str1) == set(str2):
    print("Yes")
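One caveat with a set-based test: sets discard duplicates, so it checks only which characters occur, not how often. A short sketch using the question's examples shows where it agrees and disagrees with a count-aware check (the variable name `template` is illustrative):

```python
template = "aabc"

# Both of these are classified as the question expects.
print(set("acba") == set(template))  # True
print(set("azbc") == set(template))  # False

# "aaabc" has an extra "a", yet its set of characters is identical.
print(set("aaabc") == set(template))        # True
print(sorted("aaabc") == sorted(template))  # False
```

So for the question as asked, where "aaabc" must be rejected, a set comparison alone is not enough.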
If you have a very long string, the following solution will be helpful, with O(n) time complexity. You can also use a hash map/dictionary instead of the arrays/lists.
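def same_chars(s1, s2):
    if len(s1) != len(s2):
        return False
    # One counter slot per lowercase letter, "a" through "z".
    ds1 = [0] * 26
    ds2 = [0] * 26
    for i in range(len(s1)):
        ds1[ord(s1[i]) - ord("a")] += 1
        ds2[ord(s2[i]) - ord("a")] += 1
    return ds1 == ds2

s1 = "sjkhdfkaljdhfaldflflad"
s2 = "lsdhfuisfslffsdjdkllja"
print(same_chars(s1, s2))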
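A minimal sketch of the hash map/dictionary variant mentioned above, which also lifts the restriction to the 26 lowercase letters. The function name `same_chars_dict` is hypothetical and not from the original answer:

```python
def same_chars_dict(s1, s2):
    """Return True if s1 and s2 hold the same characters with the same counts."""
    if len(s1) != len(s2):
        return False
    counts = {}
    # Count every character of the first string.
    for ch in s1:
        counts[ch] = counts.get(ch, 0) + 1
    # Use up those counts with the second string; bail out on any shortfall.
    for ch in s2:
        remaining = counts.get(ch, 0)
        if remaining == 0:
            return False
        counts[ch] = remaining - 1
    # Lengths are equal and nothing ran short, so every count is now zero.
    return True


print(same_chars_dict("aabc", "acba"))   # True
print(same_chars_dict("aabc", "azbc"))   # False
print(same_chars_dict("aabc", "aaabc"))  # False
```

Returning early on the first shortfall means a mismatch can be reported before the whole second string is scanned.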