How do I compare and group equivalent items in the same list in Python?

Note: I am using Python 3.4.

I currently have a list of lists containing the following objects:

class word(object): #object class

    #each word object has 3 attributes (self explanatory)
    def __init__(self, originalWord=None, azWord=None, wLength=None):
        self.originalWord = originalWord
        self.azWord = azWord    #the originalWord alphabetized
        self.wLength = wLength

I want to iterate through the list to see whether 2 consecutive items have the same azWord attribute. E.g. bat and tab would both have the azWord "abt", so they are anagrams. The end goal is to group anagrams and print them to a file. The lists are grouped by word length, and each list is sorted alphabetically by each object's azWord.

I want to do this by comparing the current item to the next one. If their azWord values are identical, I want to add them to a temporary list. When I encounter an item that is no longer identical, I would like to print my collection of anagrams to a file and begin a new temporary list to continue checking for anagrams. This is what I have thus far:

for row in results:
    for item in row:
        if <<current item is identical to next item>>:
            tempList = []
            <<add to tempList>>
        else:
            tempList[:] = []

I'm not quite sure how to structure this so that items don't get written twice (e.g. cat, tab, tab, abt) or erased before they are printed to the file.

You're probably looking for something like this:

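from collections import defaultdict

anagrams = defaultdict(list)
for row in results:        # results is a list of lists (one list per word length)
    for word in row:
        # words sharing an azWord end up in the same bucket
        anagrams[word.azWord].append(word)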

This is slightly different from your original implementation because, in the above case, it doesn't matter whether the anagrams are out of order (that is, the anagrams need not be right next to each other).
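As a rough, self-contained sketch of that approach, the snippet below builds a small list of lists, groups the words by azWord, and writes each group of two or more anagrams on its own line. The helper names (make_word, group_anagrams, write_groups), the sample words, and the use of io.StringIO in place of a real file are assumptions made for illustration, not part of the original question.

```python
from collections import defaultdict
import io


class word(object):
    # Same three attributes as the class in the question.
    def __init__(self, originalWord=None, azWord=None, wLength=None):
        self.originalWord = originalWord
        self.azWord = azWord
        self.wLength = wLength


def make_word(text):
    # Hypothetical helper: fills in the alphabetized form and the length.
    return word(text, ''.join(sorted(text)), len(text))


def group_anagrams(rows):
    # Map each azWord to the original words that share it.
    anagrams = defaultdict(list)
    for row in rows:
        for w in row:
            anagrams[w.azWord].append(w.originalWord)
    return anagrams


def write_groups(anagrams, out):
    # Write one line per group, skipping words that have no anagram.
    for azword in sorted(anagrams):
        group = anagrams[azword]
        if len(group) > 1:
            out.write(' '.join(group) + '\n')


# Deliberately not sorted by azWord, to show that order does not matter here.
results = [
    [make_word(t) for t in ('bat', 'cat', 'tab', 'act')],
    [make_word(t) for t in ('listen', 'silent', 'python')],
]

groups = group_anagrams(results)
buffer = io.StringIO()          # stands in for an open file object
write_groups(groups, buffer)
print(buffer.getvalue())
```

Because every word is appended to exactly one bucket, nothing is written twice, and nothing is cleared before it has been written. To write to disk instead, pass a file opened with open(path, 'w') as the second argument to write_groups.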

On a side note, you could probably structure your word class more efficiently, like so:

# As a convention in python, class names are capitalized
class Word(str):
    def az(self):
        return ''.join(sorted(self))

Then your code would look like:

from collections import defaultdict

anagrams = defaultdict(list)
for row in results:        # results is a list of lists of Word objects
    for word in row:
        anagrams[word.az()].append(word)
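A short usage sketch of the str subclass, assuming a flat list of sample words chosen for illustration. Since Word is itself a string, the original text and its length come for free, so the originalWord and wLength attributes are no longer needed.

```python
from collections import defaultdict


class Word(str):
    def az(self):
        # The letters of the word in alphabetical order.
        return ''.join(sorted(self))


# Sample data for illustration only.
words = [Word(t) for t in ('bat', 'cat', 'tab', 'act', 'dog')]

anagrams = defaultdict(list)
for w in words:
    anagrams[w.az()].append(w)

print(Word('tab').az())   # abt
print(len(Word('bat')))   # 3, the length comes from str itself
print(anagrams['abt'])    # ['bat', 'tab']
```

One trade-off of this design is that az() re-sorts the letters on every call; for large word lists it may be worth computing the key once per word.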

To elaborate on Adam Smith's comment, you probably want something like this:

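import itertools

# x is the flat list of word objects; groupby only merges consecutive
# items, so the list must be sorted by the same key first
x.sort(key=lambda i: i.azWord)
groups = [list(items) for azword, items in itertools.groupby(x, key=lambda i: i.azWord)]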

For example, if you had the following:

x = [ x1, x2, x3, x4 ]  # where x1 & x4 have the same azWords

Then you'd get the desired grouping (sorted by azWord):

[ [x1,x4], [x2], [x3] ]
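A runnable sketch of the same idea, assuming a namedtuple called Entry as a stand-in for the word class and four sample words standing in for x1 to x4. It also shows what happens if the sort is skipped, since itertools.groupby only merges items that are next to each other.

```python
import itertools
from collections import namedtuple

# Hypothetical stand-in for the word class in the question.
Entry = namedtuple('Entry', ['originalWord', 'azWord'])


def make_entry(text):
    return Entry(text, ''.join(sorted(text)))


def az_key(entry):
    return entry.azWord


# 'tab' and 'bat' play the roles of x1 and x4.
x = [make_entry(t) for t in ('tab', 'cat', 'dog', 'bat')]

# Without sorting, the two anagrams are not adjacent and stay separate.
unsorted_groups = [
    [e.originalWord for e in items]
    for _, items in itertools.groupby(x, key=az_key)
]

# After sorting by the same key, equal azWords sit next to each other.
x.sort(key=az_key)
groups = [
    [e.originalWord for e in items]
    for _, items in itertools.groupby(x, key=az_key)
]

print(unsorted_groups)  # [['tab'], ['cat'], ['dog'], ['bat']]
print(groups)           # [['tab', 'bat'], ['cat'], ['dog']]
```

This is effectively the "compare the current item to the next one" loop from the question, with groupby handling the bookkeeping of when one run ends and the next begins.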
