Imagine we have the following list of strings:
Input: ["eat", "tea", "tan", "ate", "nat", "bat"]
The program should group the anagrams together and return all groups as a list of lists, as follows:
Output:
[
["ate","eat","tea"],
["nat","tan"],
["bat"]
]
My current solution finds the first set of anagrams but fails to detect the other two; instead, it duplicates the first group in the list:
class Solution(object):
    def groupAnagrams(self, strs):
        allResults=[]
        results=[]
        temp=''
        for s in strs:
            temp=s[1:]+s[:1]
            for i in range(0,len(strs)):
                if temp==strs[i]:
                    results.append(strs[i])
            allResults.append(results)
        return allResults
The output is:
[["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"]]
How can I fix this issue?
EDIT: I have fixed the duplication by appending results to allResults outside of the second loop:
class Solution(object):
    def groupAnagrams(self, strs):
        allResults=[]
        results=[]
        temp=''
        for s in strs:
            temp=s[1:]+s[:1]
            for i in range(0,len(strs)):
                if temp==strs[i]:
                    results.append(strs[i])
        allResults.append(results)
        print(results)
        return allResults
Yet, it does not detect the other two sets of anagrams.
You can do it using defaultdict from Python's built-in collections module together with sorted:
In [1]: l = ["eat", "tea", "tan", "ate", "nat", "bat"]
In [2]: from collections import defaultdict
In [3]: d = defaultdict(list)
In [4]: for x in l:
   ...:     d[str(sorted(x))].append(x)
In [5]: d.values()
Out[5]: dict_values([['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']])
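Wrapped in the Solution class from the question, the same idea might look like the sketch below. It assumes a tuple(sorted(s)) key rather than the str(sorted(x)) key from the session above; both work, and the tuple is only a stylistic choice made for this example.

```python
from collections import defaultdict


class Solution(object):
    def groupAnagrams(self, strs):
        # Map each sorted-letter key to the words that share it.
        groups = defaultdict(list)
        for s in strs:
            # Assumption: a tuple key is used here; str(sorted(s)) works too.
            groups[tuple(sorted(s))].append(s)
        # Dicts keep insertion order (Python 3.7+), so the groups come out
        # in order of first occurrence.
        return list(groups.values())


print(Solution().groupAnagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```

Each word is sorted once, so the whole list is processed in a single pass instead of comparing every word with every other word.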
To fix your solution, you need to add a variable that tracks which words have already been added. For example (while walking through strs I use enumerate, which gives a small performance gain because each search only scans the rest of the list):
class Solution(object):
    def groupAnagrams(self, strs):
        allResults = []
        added = set()  # indices of words already placed in a group
        for i, s in enumerate(strs):
            if i in added:
                continue
            unique_s = "".join(sorted(s))  # sorted letters identify the group
            results = []
            for j, x in enumerate(strs[i:], start=i):
                if unique_s == "".join(sorted(x)):
                    results.append(x)
                    added.add(j)
            allResults.append(results)
        return allResults
Use itertools.groupby
>>> lst = ["eat", "tea", "tan", "ate", "nat", "bat"]
>>>
>>> from itertools import groupby
>>> f = lambda w: sorted(w)
>>> [list(v) for k,v in groupby(sorted(lst, key=f), f)]
[['bat'], ['eat', 'tea', 'ate'], ['tan', 'nat']]
The way you implemented your function, you only look at rotations of the strings (that is, you shift a letter from the beginning to the end, e.g. ate -> tea -> eat). What your algorithm cannot detect are permutations where only two letters are swapped (nat -> tan). In mathematical terms, you only consider the even permutations of the three-letter strings and not the odd ones.
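A small sketch makes the difference visible. The helper name rotations is hypothetical and only introduced for this illustration; it lists every string reachable by moving leading letters to the end.

```python
def rotations(s):
    # All strings obtained by moving leading characters to the end.
    return [s[i:] + s[:i] for i in range(len(s))]


print(rotations("ate"))           # ['ate', 'tea', 'eat']
print(rotations("tan"))           # ['tan', 'ant', 'nta']
print("nat" in rotations("tan"))  # False
```

The group ate/tea/eat happens to consist of rotations of one another, which is why it is found, while nat is never produced from tan.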
A modification of your code could for example be:
def get_list_of_permutations(input_string):
    list_out = []
    if len(input_string) > 1:
        first_char = input_string[0]
        remaining_string = input_string[1:]
        remaining_string_permutations = get_list_of_permutations(remaining_string)
        for i in range(len(remaining_string)+1):
            for permutation in remaining_string_permutations:
                list_out.append(permutation[0:i]+first_char+permutation[i:])
    else:
        return [input_string]
    return list_out

def groupAnagrams(strs):
    allResults=[]
    for s in strs:
        results = []
        list_of_permutations = get_list_of_permutations(s)
        for i in range(0,len(strs)):
            if strs[i] in list_of_permutations:
                results.append(strs[i])
        if results not in allResults:
            allResults.append(results)
    return allResults
The output is
Out[218]: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
Edit: modified the code to work with all lengths of strings.
The second line, s_words, takes all the letters of each word in words, sorts them, and rebuilds a string from the sorted letters. It creates a list of these sorted-letter strings, in the same order as the original sequence of words. This list is used to compare possible anagrams (the letters of anagrams produce the same string when sorted).
The third line, indices, holds True or False values that indicate whether the corresponding word has already been extracted, which avoids duplicates.
The code that follows is a double loop that, for each s_word, determines which other s_word is identical and uses its index to retrieve the corresponding word from the original list of words; it also updates the truth values in indices.
words = ["eat", "tea", "tan", "ate", "nat", "bat"]
s_words = [''.join(sorted(list(word))) for word in words]
indices = [False for _ in range(len(words))]
anagrams = []
for idx, s_word in enumerate(s_words):
    if indices[idx]:
        continue
    ana = [words[idx]]
    for jdx, word in enumerate(words):
        if idx != jdx and not indices[jdx] and s_word == s_words[jdx]:
            ana.append(words[jdx])
            indices[jdx] = True
    anagrams.append(ana)

print(anagrams)
[['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
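The property this answer relies on, that anagrams produce the same string once their letters are sorted, can be checked directly. The function name signature is hypothetical and only used for this sketch; it does the same thing as one element of s_words.

```python
def signature(word):
    # Sorted letters joined back into a string, as in s_words.
    return ''.join(sorted(word))


print(signature("eat"), signature("tea"), signature("ate"))  # aet aet aet
print(signature("tan") == signature("nat"))  # True
print(signature("bat") == signature("tan"))  # False
```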
https://docs.python.org/3/library/itertools.html#itertools.permutations
from itertools import permutations

word_list = ["eat", "tea", "tan", "ate", "nat", "bat"]
anagram_group_list = []
for word in word_list:
    if word is None:
        pass
    else:
        anagram_group_list.append([])
        for anagram in permutations(word):
            anagram = ''.join(anagram)
            try:
                idx = word_list.index(anagram)
                word_list[idx] = None
                anagram_group_list[-1].append(anagram)
            except ValueError:
                pass # this anagram is not present in word_list
print(anagram_group_list)
# [['eat', 'ate', 'tea'], ['tan', 'nat'], ['bat']]
After refactoring your code and stopping it from producing redundant results, it still doesn't give the expected result, because the logic for generating anagrams is not completely correct:
def groupAnagrams(word_list):
    allResults=[]
    results=[]
    for idx,s in enumerate(word_list):
        if s is None:
            pass
        else:
            results = [s] # word s is added to anagram list
            # you were generating only 1 anagram, e.g. tan --> ant, but only nat was present in word_list
            for i in range(1,len(s),1):
                temp = s[i:]+s[:i] # anagram
                # for s = 'tan' it generates only 'ant' and 'nta'
                # when it should generate all six: tna ant nta _nat_ atn tan
                if temp in word_list:
                    results.append(temp)
                    word_list[word_list.index(temp)] = None
            allResults.append(results)
    return allResults

print(groupAnagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'ate', 'tea'], ['tan'], ['nat'], ['bat']]