
Getting array of the longest elements of an array using dictionary of arrays

I'm trying to write a function that returns an array of the elements with the greatest length. I'm not looking for the single longest element, but for all of the longest elements.

The approach I've taken is to create a dictionary of arrays where the key is the length and the value is an array of elements of the length indicated by the key.

This is the code I've come up with:

#initialise the dictionary
longest = {}
#this keeps track of the greatest length
longestNum = 0
for seq in proteinSeq:
    if len(seq) >= longestNum:
        longestNum = len(seq)
        #check to see if the dic key exists
        #if not initialise it
        try:
            longest[longestNum].append(seq)
        except NameError:
            longest[longestNum] = []
            longest[longestNum].append(seq)

return longest[longestNum]

It gives me a KeyError: 6 at the first longest[longestNum].append(seq) ...

Can someone help me find what the problem here is?

If you try to read a key that doesn't exist, you get a KeyError, not a NameError, as your error message says. So you're catching the wrong exception.

You could use

except KeyError:

but I might use

longest.setdefault(longestNum, []).append(seq)

instead, or make longest a collections.defaultdict(list), in which case it would simply be

longest[longestNum].append(seq)

See this article for a quick comparison of defaultdict vs setdefault.
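To make the two suggestions above concrete, here is a minimal sketch of the asker's loop rewritten both ways. The function names and the sample data are made up for illustration, and wrapping the loop in a function is an assumption, since the original snippet only shows a bare return.

```python
from collections import defaultdict


def longest_with_setdefault(protein_seq):
    """Group sequences by length using dict.setdefault; return the longest group."""
    longest = {}
    longest_num = 0
    for seq in protein_seq:
        if len(seq) >= longest_num:
            longest_num = len(seq)
            # setdefault inserts an empty list the first time a length is seen
            longest.setdefault(longest_num, []).append(seq)
    # .get avoids a KeyError when the input is empty
    return longest.get(longest_num, [])


def longest_with_defaultdict(protein_seq):
    """Same loop, but a defaultdict creates the missing list on first access."""
    longest = defaultdict(list)
    longest_num = 0
    for seq in protein_seq:
        if len(seq) >= longest_num:
            longest_num = len(seq)
            longest[longest_num].append(seq)
    return longest[longest_num]


# Hypothetical sample data, not from the original question
sample = ["MKV", "MKVLAA", "MK", "GATTAC"]
print(longest_with_setdefault(sample))
print(longest_with_defaultdict(sample))
```

Both versions should print the two six-character sequences in their original order. Neither needs a try/except, which removes the chance of catching the wrong exception in the first place.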

Change the NameError to KeyError, because if the key does not exist in your dictionary, a KeyError is raised, as you have seen in the traceback.

However, I'm not sure you need a dictionary in this case. What about something like:

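# words seen so far that share the greatest length
longestwords = []
longestlength = 0

for word in all_words:
    if len(word) > longestlength:
        # a strictly longer word starts a new list
        longestwords = [word]
        longestlength = len(word)
    elif len(word) == longestlength:
        # a tie with the current maximum joins the list
        longestwords.append(word)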
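For reference, the same single-pass idea can be wrapped in a function so that it is easy to try out. This is only a sketch: the function name longest_words and the sample list are assumptions, not part of the original answer.

```python
def longest_words(all_words):
    """Return every word whose length equals the maximum length, in input order."""
    longest = []
    longest_length = 0
    for word in all_words:
        if len(word) > longest_length:
            # strictly longer: discard the shorter candidates
            longest = [word]
            longest_length = len(word)
        elif len(word) == longest_length:
            # same length as the current maximum: keep it too
            longest.append(word)
    return longest


# Hypothetical sample data
print(longest_words(["a", "bbb", "cc", "ddd"]))
```

This keeps only one list in memory and walks the input once, which is why no dictionary is needed.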

Here's a shorter and more declarative version, assuming I've understood your question properly. It also has the advantage of not constructing an entire dictionary only to subsequently discard all the key-value pairs corresponding to sequences shorter than those you are interested in.

>>> from itertools import takewhile
>>> # sort the protein sequences by length in descending order,
>>> # so that the longest sequences come first.
>>> longest_first = sorted(proteinSeq, key=len, reverse=True)
>>> longestNum = len(longest_first[0])
>>> # take only those sequences whose length is equal to longestNum
>>> seqs = list(takewhile(lambda x: len(x)==longestNum, longest_first))
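The session above raises an IndexError at longest_first[0] if proteinSeq is empty. A possible function form with a guard for that case might look like the following; the name longest_sequences and the sample data are hypothetical.

```python
from itertools import takewhile


def longest_sequences(protein_seq):
    """Return all sequences of maximal length using sorted() and takewhile()."""
    # sorted() is stable even with reverse=True, so ties keep their input order
    longest_first = sorted(protein_seq, key=len, reverse=True)
    if not longest_first:
        # nothing to index into, so return an empty result instead of failing
        return []
    longest_num = len(longest_first[0])
    # stop at the first sequence that is shorter than the maximum
    return list(takewhile(lambda s: len(s) == longest_num, longest_first))


# Hypothetical sample data
print(longest_sequences(["MKV", "MKVLAA", "MK", "GATTAC"]))
```

Note that sorting costs O(n log n), whereas the loop-based answers scan the input once; for short lists the difference is unlikely to matter.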


 