
Getting the third largest word in a string/list in Python

For the code below, I can't seem to get the third largest word. I split the string I get from user input and put it in the "words" variable, then I make two lists, one of which holds the words sorted in terms of length.

Then I get the length of the longest word (in maxlist) and of the second longest word (in maxlist2) and remove them. All that's left should be the third longest word from the original list and any shorter words. But I find it doesn't quite work right.

The second and third "for" statements below don't seem to remove all instances of the word length represented by "maxlist".

For example, if I represent words with just the letter "e" and use different numbers of e's for different word lengths (i.e. ee, eeee, eeeee), some of these instances are removed by the "for" statement and some are not. For this input: "e ee eee eeee eeee eeee eeee eeee eeee eeee eeee" I would expect all "eeee" words to be removed by the code:

if len(word) == maxlist:
    sort2.remove(word)

If I repeat the code for the next longest word (which is done by the third "for" statement), I should also remove the "eee" instance. They are not removed though, and the final list remains "'e', 'ee', 'eee', 'eeee', 'eeee'".

The second "for" statement seems to remove 6 instances of "eeee" but not all 8 instances. What is wrong here? Please help!!

My final output should be the third longest word of the original list plus any shorter words.

def ThirdGreatest(strArr):

    words = strArr.split()
    sort=[] # length of words
    sort2=[] # actual words
    for word in words:
        sort2.append(word)
        sort.append(len(word))
        sort2.sort()

    maxlist= len(max(sort2, key=len)) 
    for word in sort2:
        if len(word) == maxlist:
            sort2.remove(word)

    maxlist2 = len(max(sort2, key=len))
    for word in sort2:
        if len(word) == maxlist2:
            sort2.remove(word)

    maxlist3 = (max(sort2, key=len))

    print 
    print "biggest word is {} char long ".format(maxlist) 
    print sort
    print "3rd biggest word is {}: ".format(maxlist3)
    print "3rd biggest word is {}: ".format(sort2) # list of words remaining       
    #after the first 2 longest have been removed


ThirdGreatest(raw_input("Enter String: ")) 

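You should use heapq to find the third largest. Pass key=len so that words are ranked by length rather than alphabetically. Note that this ranks distinct words, so words tied in length each take their own slot:

import heapq
third_largest = heapq.nlargest(3, set(words), key=len)[-1]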

After that, you can use all sorts of things, e.g. a list comprehension:

[word for word in words if word != third_largest]
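A minimal Python 3 sketch of the heapq idea, assuming that "third largest" means the third-greatest distinct word length, as the question describes. The helper name `third_longest` and the one-word-per-length dictionary are illustrative assumptions, not part of the answer above.

```python
import heapq


def third_longest(words):
    """Return a word of the third-greatest distinct length, or None."""
    # Keep the first word seen for each distinct length so that ties
    # in length do not occupy more than one of the top three slots.
    by_length = {}
    for word in words:
        by_length.setdefault(len(word), word)
    top_three = heapq.nlargest(3, by_length.values(), key=len)
    return top_three[-1] if len(top_three) == 3 else None


words = "e ee eee eeee eeee eeee eeee eeee eeee eeee eeee".split()
third = third_longest(words)
# Keep the third-longest word and anything shorter, as the question asks.
remaining = [w for w in words if len(w) <= len(third)]
print(third, remaining)
```

Collapsing to one word per length first is a design choice: it keeps `nlargest(3, ...)` from returning three words that all share the longest length.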

Your problem is:

for word in sort2:
    if len(word) == maxlist:
        sort2.remove(word)

Don't change the list you're currently iterating over; that's just going to mess things up. It's like reading a book while someone rips out pages as you read.

Iterate over a copy instead:

for word in sort2[:]:
    if len(word) == maxlist:
        sort2.remove(word)

Note the added [:], which gives you a copy.
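To see the skipping in action, here is a small Python 3 sketch that runs the first removal loop on the input from the question, once on the list itself and once on a copy. The variable names `buggy` and `fixed` are made up for illustration.

```python
words = "e ee eee eeee eeee eeee eeee eeee eeee eeee eeee".split()
longest = len(max(words, key=len))

# Removing from the list being iterated: each removal shifts the
# remaining items left, so the loop steps over every other match.
buggy = sorted(words)
for word in buggy:
    if len(word) == longest:
        buggy.remove(word)

# Iterating over a copy: the loop visits every original item,
# while removals only affect the real list.
fixed = sorted(words)
for word in fixed[:]:
    if len(word) == longest:
        fixed.remove(word)

print(buggy)  # ['e', 'ee', 'eee', 'eeee', 'eeee', 'eeee', 'eeee']
print(fixed)  # ['e', 'ee', 'eee']
```

In the buggy loop only four of the eight "eeee" words are removed in one pass. Running the same loop a second time removes two more, which leaves the two stray "eeee" entries reported in the question.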


And an alternative solution:

[next(g) for _, g in groupby(sorted(words, key=len), len)][-3]

Demo:

>>> words = 'This is a test and I try hard to make it good'.split()
>>> from itertools import groupby
>>> [next(g) for _, g in groupby(sorted(words, key=len), len)][-3]
'is'
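The question also asks for the shorter words alongside the third longest one. The following Python 3 sketch extends the groupby approach to return both; the function name `third_longest_and_shorter` and the choice to raise `ValueError` when there are fewer than three distinct lengths are assumptions made for this example.

```python
from itertools import groupby


def third_longest_and_shorter(text):
    """Return (a third-longest word, all words of that length or shorter)."""
    # groupby only merges adjacent items, so sort by length first.
    words = sorted(text.split(), key=len)
    groups = [list(group) for _, group in groupby(words, key=len)]
    if len(groups) < 3:
        raise ValueError("need at least three distinct word lengths")
    # Drop the two longest length groups and flatten what is left.
    kept = [word for group in groups[:-2] for word in group]
    return groups[-3][0], kept


print(third_longest_and_shorter("e ee eee eeee eeee eeee"))
# ('ee', ['e', 'ee'])
```

Because no list is modified while it is being iterated, this version avoids the skipping problem entirely.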

This was my original, long and cumbersome solution, but other users have posted much more concise and clear answers. Thank you guys.

def ThirdGreatest(strArr):
    words = strArr.split()
    sort = []   # length of words
    sort2 = []  # actual words

    for word in words:
        sort2.append(word)
        sort.append(len(word))

    # drop duplicate words, then order the rest by length
    sort3 = set(sort2)
    sorted_set3 = sorted(sort3, key=len)
    sort4 = []
    for n in sorted_set3:
        sort4.append(n)

    maxlist = len(max(sort4, key=len))
    for word in sort4[:]:  # iterate over a copy so removals don't skip items
        if len(word) == maxlist:
            sort4.remove(word)

    maxlist2 = len(max(sort4, key=len))
    for word in sort4[:]:  # same here: copy first, then remove
        if len(word) == maxlist2:
            sort4.remove(word)
    print "_______________________________________________________________"
    print "The third largest word is: {} ".format(max(sort4, key=len))

ThirdGreatest(raw_input("Enter String: "))
