
Top 10 most frequent word lengths in a list of words

I am writing a function that returns the 10 most frequent word lengths in a file called wordlist.txt, which contains words from a to z. I have written a function (named 'value_length') that is meant to return a list of the length of each word in a given list. I also applied Counter (from the collections module) to a dictionary that has word lengths as keys and the frequencies of those lengths as values, to solve the problem.

from collections import Counter

def value_length(seq):
    '''This function takes a sequence and returns a list that contains 
    the length of each element
    '''
    value_l = []
    for i in range(len(seq)):
        length = len(seq[i])
        value_l.append(length)
    print(value_l) 

# open the txt file 
fileobj = open("wordlist.txt", "r")
file_content = []

# create a list with length of every single word   
for line in fileobj:
    file_content.append(line)
    wordlist_lengths = value_length(file_content)

# create a dictionary that has the number of occurrence of each length as key
occurrence = {x:file_content.count(x) for x in file_content}
c = Counter(occurrence)
c.most_common(10)  

But whenever I run this code, I do not get the result I want; I only get the output of the value_length function (i.e. an extremely long list containing the length of each word). In other words, Python does not seem to evaluate the dictionary at all. I do not understand what my mistake is.
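For reference, the symptom follows directly from the code. value_length ends with print(value_l) and has no return statement, so every call prints the list and returns None; and because it is called inside the for loop, a growing list is printed once per line of the file. The dictionary comprehension counts the lines in file_content rather than their lengths, so its keys are words, not lengths. On its own line, c.most_common(10) is evaluated, but when the code runs as a script its result is discarded unless you print it. Finally, each line read from the file keeps its trailing newline, so len() counts one extra character. Below is a minimal sketch of the same approach with those issues fixed. It assumes wordlist.txt has one word per line; top_lengths and top_lengths_in_file are hypothetical helper names, not part of the original code.

```python
from collections import Counter


def value_length(seq):
    """Return a list containing the length of each element of seq."""
    return [len(item) for item in seq]  # return the list instead of printing it


def top_lengths(lines, n=10):
    """Hypothetical helper: return the n most common word lengths as (length, count) pairs."""
    # Strip the trailing newline each file line carries, and skip blank lines.
    words = [line.strip() for line in lines if line.strip()]
    # Count the lengths themselves, not the words.
    c = Counter(value_length(words))
    return c.most_common(n)


def top_lengths_in_file(path="wordlist.txt", n=10):
    """Hypothetical helper: read one word per line from path and rank its lengths."""
    with open(path, "r") as fileobj:
        return top_lengths(fileobj, n)


# Example (assumes wordlist.txt exists in the working directory):
# print(top_lengths_in_file("wordlist.txt"))
```

Note that value_length is now called once, after all the lines have been read, rather than on every pass through the loop.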

There's no need to store the lengths in a list, or to use the list's count method; you've imported Counter already, so just use that to do the counting.

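def value_length(seq):
    # Count how many words have each length, instead of building a list
    c = Counter()
    for word in seq:
        length = len(word)
        c[length] += 1
    return c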
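The counting loop can also be written by passing a generator of lengths straight to Counter, which tallies the items of any iterable; most_common(10) then gives the top 10 directly. The following is a sketch; top_ten_lengths is a hypothetical name, and it assumes the words have already been stripped of newlines.

```python
from collections import Counter


def top_ten_lengths(words):
    """Hypothetical helper: return the 10 most common word lengths as (length, count) pairs."""
    # Counter consumes the generator and adds 1 for each length it yields,
    # which is what the explicit c[length] += 1 loop does.
    return Counter(len(word) for word in words).most_common(10)
```

For example, top_ten_lengths(["Hi", "bye", "hello"]) counts one word each of length 2, 3 and 5.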

This code finds the length of each list item and sorts the lengths. You can then pair each length with the number of times it occurs in the list:

words = ["Hi", "bye", "hello", "what", "no", "crazy", "why", "say", "imaginary"]

lengths = [len(w) for w in words]
print(lengths)
sortedLengths = sorted(lengths)
print(sortedLengths)

countedLengths = [(w, sortedLengths.count(w)) for w in sortedLengths]
print(countedLengths)

This prints:

[2, 3, 5, 4, 2, 5, 3, 3, 9]
[2, 2, 3, 3, 3, 4, 5, 5, 9]
[(2, 2), (2, 2), (3, 3), (3, 3), (3, 3), (4, 1), (5, 2), (5, 2), (9, 1)]
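Note that countedLengths contains one tuple per word, so each (length, count) pair is repeated as many times as that length occurs, and the pairs are ordered by length rather than by frequency. The sketch below keeps the sort-then-count idea but emits each length once, using itertools.groupby on the already-sorted list, and then ranks the pairs by count to keep at most 10. It is one possible variant, not part of the original answer.

```python
from itertools import groupby

words = ["Hi", "bye", "hello", "what", "no", "crazy", "why", "say", "imaginary"]

sortedLengths = sorted(len(w) for w in words)

# groupby yields one group per run of equal values; because the list is
# sorted, each length forms exactly one run.
countedLengths = [(length, len(list(group))) for length, group in groupby(sortedLengths)]
print(countedLengths)

# Rank by count, highest first (sorted is stable, so ties keep length order),
# and keep at most 10 entries.
topTen = sorted(countedLengths, key=lambda pair: pair[1], reverse=True)[:10]
print(topTen)
```

This prints:

[(2, 2), (3, 3), (4, 1), (5, 2), (9, 1)]
[(3, 3), (2, 2), (5, 2), (4, 1), (9, 1)]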
