简体   繁体   中英

Anagrams in Python using lists

Imagine we have following list of strings:

Input: ["eat", "tea", "tan", "ate", "nat", "bat"]

The output of our program should group each set of anagram and return them all together as a list as following:

Output:
[
  ["ate","eat","tea"],
  ["nat","tan"],
  ["bat"]
]

My current solution finds the first set of anagrams but fails to detect the other two and instead, duplicates the first groups into the list:

class Solution(object):
    def groupAnagrams(self, strs):
        allResults=[]
        results=[]
        temp=''
        for s in strs:  
          temp=s[1:]+s[:1]
          for i in range(0,len(strs)):
              if temp==strs[i]:
                results.append(strs[i])
          allResults.append(results)      
        return allResults 

and the output is:

[["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"],["ate","eat","tea"]]

How to fix this issue?

EDIT: I have fixed the duplication in appending by appending the results into allResults outside of second loop:

class Solution(object):
def groupAnagrams(self, strs):
    allResults=[]
    results=[]
    temp=''
    for s in strs:  
      temp=s[1:]+s[:1]
      for i in range(0,len(strs)):
          if temp==strs[i]:
            results.append(strs[i])
    allResults.append(results) 
    print(results)
    return allResults  

Yet, it does not detect the other two sets of anagrams.

you can do it using defaultdict of python in-built collections library and sorted :

In [1]: l = ["eat", "tea", "tan", "ate", "nat", "bat"]

In [2]: from collections import defaultdict

In [3]: d = defaultdict(list)

In [4]: for x in l:
   ...:     d[str(sorted(x))].append(x)

In [5]: d.values()
Out[5]: dict_values([['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']])

to fix your the solution you need add the variable to check is allready added, for exanmple(and the while walk through the strs i use enumerate for little performance in the search of the anagrams):

class Solution(object):
    def groupAnagrams(self, strs):
        allResults = []
        
        temp=''
        for i, s in enumerate(strs):
            results = []
            
            for x in strs[i:]:
              if unique_s=="".join(sorted(x)):
                results.append(strs[i])
            allResults.append(results)
print(added)
return allResults

Use itertools.groupby

>>> lst =  ["eat", "tea", "tan", "ate", "nat", "bat"]
>>> 
>>> from itertools import groupby
>>> f = lambda w: sorted(w)
>>> [list(v) for k,v in groupby(sorted(lst, key=f), f)]
[['bat'], ['eat', 'tea', 'ate'], ['tan', 'nat']]

The way you implemented your function, you are only looking at rotations of the strings (that is you shift a letter from the beginning to the end, eg ate -> tea -> eat). What your algorithm cannot detect is single permutations were you only switch two letters (nat -> tan). In mathematical language you only consider the even permutations of the three letter strings and not the odd permutations.

A modification of your code could for example be:

def get_list_of_permutations(input_string):
  list_out = []
  if len(input_string) > 1:
    first_char = input_string[0]
    remaining_string = input_string[1:]
    remaining_string_permutations = get_list_of_permutations(remaining_string)
    for i in range(len(remaining_string)+1):
      for permutation in remaining_string_permutations:
        list_out.append(permutation[0:i]+first_char+permutation[i:])
  else:
    return [input_string]
  return list_out

def groupAnagrams(strs):
  allResults=[]
  for s in strs:  
    results = []
    list_of_permutations = get_list_of_permutations(s)
    for i in range(0,len(strs)):
      if strs[i] in list_of_permutations:
        results.append(strs[i])
    if results not in allResults:
      allResults.append(results)     
  return allResults 

The output is

Out[218]: [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

Edit: modified the code to work with all lengths of strings.

Using only lists, as requested in the title of the question:

The second line s_words takes all the letters of each word in words , sorts them, and recreates a string composed of the sorted letters of the word; it creates a list of all the these sorted letters strings, in the same order as the original sequence of words --> this will be used to compare the possible anagrams (the letters of anagrams produce the same string when sorted)

The 3rd line indices hold True or False values, to indicate if the corresponding word has been extracted already, and avoid duplicates.

The following code is a double loop that for each s_word, determines which other s_word is identical, and uses its index to retrieve the corresponding word in the original list of words; it also updates the truth value of the indices.

words = ["eat", "tea", "tan", "ate", "nat", "bat"]
s_words = [''.join(sorted(list(word))) for word in words]
indices = [False for _ in range(len(words))]
anagrams = []
for idx, s_word in enumerate(s_words):
    if indices[idx]:
        continue
    ana = [words[idx]]
    for jdx, word in enumerate(words):
        if idx != jdx and not indices[jdx] and s_word == s_words[jdx]:
            ana.append(words[jdx])
            indices[jdx] = True
    anagrams.append(ana)

print(anagrams)

output:

[['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]

https://docs.python.org/3/library/itertools.html#itertools.permutations

from itertools import permutations

word_list = ["eat", "tea", "tan", "ate", "nat", "bat"]
anagram_group_list = []

for word in word_list:

    if word == None:
        pass
    else:
        anagram_group_list.append([])

        for anagram in permutations(word):
            anagram = ''.join(anagram)

            try:
                idx = word_list.index(anagram)
                word_list[idx] = None 

                anagram_group_list[-1].append(anagram)

            except ValueError:
                pass # this anagram is not present in word_list

print(anagram_group_list)
# [['eat', 'ate', 'tea'], ['tan', 'nat'], ['bat']]

after refactoring code and stopping it from producing redundant result your code still doesn't give expected result as logic for producing anagram is not completely correct

def groupAnagrams(word_list):
    allResults=[]
    results=[]

    for idx,s in enumerate(word_list):
        if s == None:
            pass
        else:
            results = [s] # word s is added to anagram list

            # you were generating only 1 anagram like for tan --> ant but in word_list only nat was present
            for i in range(1,len(s),1):
                temp = s[i:]+s[:i] #anagram 
                    # for s = 'tan' it generates only 'ant and 'nta'
                    # when it should generate all six tna ant nta _nat_ atn tan

                if temp in word_list:
                  results.append(temp)
                  word_list[word_list.index(temp)] = None

            allResults.append(results) 

    return allResults

print(groupAnagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'ate', 'tea'], ['tan'], ['nat'], ['bat']]

The technical post webpages of this site follow the CC BY-SA 4.0 protocol. If you need to reprint, please indicate the site URL or the original address.Any question please contact:yoyou2525@163.com.

 
粤ICP备18138465号  © 2020-2024 STACKOOM.COM