
Getting array of the longest elements of an array using dictionary of arrays

I'm trying to write a function that returns an array of the elements of the longest length. I'm not looking for the single longest element, but for all of the elements that share the greatest length.

The approach I've taken is to create a dictionary of arrays where the key is the length and the value is an array of elements of the length indicated by the key.

This is the code I've come up with:

#initialise the dictionary
longest = {}
#this keeps track of the greatest length
longestNum = 0
for seq in proteinSeq:
    if len(seq) >= longestNum:
        longestNum = len(seq)
        #check to see if the dic key exists
        #if not initialise it
        try:
            longest[longestNum].append(seq)
        except NameError:
            longest[longestNum] = []
            longest[longestNum].append(seq)

return longest[longestNum]

It gives me KeyError: 6 at the first longest[longestNum].append(seq) call.

Can someone help me find what the problem here is?

If you try to read a key that doesn't exist, you get a KeyError, not a NameError, as your error message says. So you're catching the wrong exception.
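The difference between the two exceptions can be seen in a small sketch. The names lengths and undefined_dict are made up for illustration and are not part of the question's code:

```python
# Looking up a missing key in an existing dict raises KeyError.
lengths = {}
try:
    lengths[6].append("MKVLAA")
except KeyError as exc:
    missing_key_error = type(exc).__name__

# Referring to a name that was never assigned raises NameError.
try:
    undefined_dict[6]  # hypothetical name, intentionally never defined
except NameError as exc:
    undefined_name_error = type(exc).__name__

print(missing_key_error, undefined_name_error)  # KeyError NameError
```

Because longest is already defined as an empty dict in the question, only the first case can occur there.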

You could use

except KeyError:

but I might use

longest.setdefault(longestNum, []).append(seq)

instead, or make longest a collections.defaultdict(list), in which case it would simply be

longest[longestNum].append(seq)

See this article for a quick comparison of defaultdict vs setdefault.
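As a rough sketch, the two suggestions could be dropped into the question's loop like this. The function names, the sequences parameter, and the sample data are assumptions made for illustration; the loop logic is kept the same as in the question:

```python
from collections import defaultdict


def longest_with_setdefault(sequences):
    """Group sequences by length using dict.setdefault."""
    by_length = {}
    longest_len = 0
    for seq in sequences:
        if len(seq) >= longest_len:
            longest_len = len(seq)
            # setdefault creates the empty list the first time a key is seen
            by_length.setdefault(longest_len, []).append(seq)
    # .get avoids a KeyError when the input is empty
    return by_length.get(longest_len, [])


def longest_with_defaultdict(sequences):
    """Same idea, but missing keys are created automatically."""
    by_length = defaultdict(list)
    longest_len = 0
    for seq in sequences:
        if len(seq) >= longest_len:
            longest_len = len(seq)
            by_length[longest_len].append(seq)
    return by_length[longest_len]


sample = ["MKV", "MKVLAA", "MK", "GHTRPL"]  # made-up example data
print(longest_with_setdefault(sample))   # ['MKVLAA', 'GHTRPL']
print(longest_with_defaultdict(sample))  # ['MKVLAA', 'GHTRPL']
```

Neither version needs a try/except block, since the missing-key case is handled by the dictionary itself.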

Change the NameError to KeyError, because if the key does not exist in your dictionary, a KeyError is raised, as you have seen in the traceback.

However, I'm not sure you need a dictionary in this case. What about something like:

longestwords = []
longestlength = 0

for word in all_words:
    if len(word) > longestlength:
        # a strictly longer word starts a new list
        longestwords = [word]
        longestlength = len(word)
    elif len(word) == longestlength:
        # a tie with the current maximum joins the list
        longestwords.append(word)
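Wrapped in a function, the single-pass idea above might look like the following sketch. The function name longest_words and the sample input are assumptions, not part of the original answer:

```python
def longest_words(words):
    """Return every element of `words` that has the maximum length."""
    longest = []
    longest_len = 0
    for word in words:
        if len(word) > longest_len:
            # a strictly longer word replaces everything collected so far
            longest = [word]
            longest_len = len(word)
        elif len(word) == longest_len:
            # a tie with the current maximum is kept as well
            longest.append(word)
    return longest


print(longest_words(["ab", "abcd", "a", "wxyz"]))  # ['abcd', 'wxyz']
```

This makes one pass over the input and only ever holds the current candidates in memory.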

Here's a shorter and more declarative version, assuming I've understood your question properly. It also has the advantage of not constructing an entire dictionary only to subsequently discard all the key-value pairs corresponding to sequences shorter than those you are interested in.

>>> from itertools import takewhile
>>> # sort the protein sequences by length and then reverse the new
>>> # list so that the longest sequences come first.    
>>> longest_first = sorted(proteinSeq, key=len, reverse=True) 
>>> longestNum = len(longest_first[0])
>>> # take only those sequences whose length is equal to longestNum
>>> seqs = list(takewhile(lambda x: len(x)==longestNum, longest_first))
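The same steps can be sketched as a reusable function. The name longest_by_sorting and the empty-input guard are additions for illustration; the interactive session above would raise an IndexError on an empty list because of longest_first[0]:

```python
from itertools import takewhile


def longest_by_sorting(sequences):
    """Return the longest elements by sorting on length, longest first."""
    if not sequences:
        # nothing to sort; avoids indexing into an empty list below
        return []
    longest_first = sorted(sequences, key=len, reverse=True)
    longest_len = len(longest_first[0])
    # stop at the first element that is shorter than the maximum
    return list(takewhile(lambda s: len(s) == longest_len, longest_first))


print(longest_by_sorting(["ab", "abcd", "a", "wxyz"]))  # ['abcd', 'wxyz']
```

Sorting costs O(n log n) where the loop-based approaches are O(n), which should only matter for very large inputs.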
