
Python: take a dictionary and produce a list with (words longer than 1 letter, most common words, longest words)

So I made a function:

def word_count(string):
    my_string = string.lower().split()
    my_dict = {}
    for item in my_string:
        if item in my_dict:
            my_dict[item] += 1
        else:
            my_dict[item] = 1
    print(my_dict)

What this does is take a string, split it, and produce a dictionary in which each key is a word and each value is the number of times that word appears.

Okay, so what I'm trying to do now is make a function that takes the output of that function and produces a list in the following format:

((list of words longer than 1 letter),(list of most frequent words), (list of words with the longest length))

Also, for example, say two words have each appeared 3 times and both are 6 letters long: both words should be included in both the (most frequent) and (longest length) lists.

So, this has been my attempt thus far at tackling this problem

def analyze(x):
    longer_than_one= []
    most_frequent= []
    longest= []
    for key in x.item:
        if len(key) >1:
            key.append(longer_than_one)
    print(longer_than_one)

What I was trying to do here is write a series of for loops and if statements that append to the lists depending on whether the items meet the criteria. However, I have run into the following problems:

1- How do I iterate over a dictionary without getting an error?

2- I can't figure out a way to find the most frequent words (I was thinking of appending the keys with the highest values).

3- I can't figure out a way to append only the longest words in the dictionary (I was thinking of using len(key), but it raised an error).

If it's any help, I'm working in Anaconda's Spyder using Python 3.5.1. Any tips would be appreciated!

You really are trying to re-invent the wheel.

Imagine you have list_of_words which is, well, a list of strings.

To get the most frequent word, use Counter:

from collections import Counter
my_counter = Counter(list_of_words)

To sort the list by the length:

sorted_by_length = sorted(list_of_words, key=len)

To get the list of words longer than one letter you can simply use your sorted list, or create a new list with only these:

longer_than_one_letter = [word for word in list_of_words if len(word) > 1]

To get your output in the required format, simply combine all of the above.
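As a rough sketch of how those pieces might be combined, the function below builds the three lists and returns them as one tuple. The name summarize and the sample sentence are made up for illustration, and it assumes each word should appear only once in every list (so it works on the unique words held by the Counter rather than on the raw list).

```python
from collections import Counter

def summarize(list_of_words):
    # Hypothetical helper: returns (longer than one letter, most frequent, longest).
    if not list_of_words:
        return [], [], []
    my_counter = Counter(list_of_words)
    # Highest count seen, then every word that reaches it (ties included).
    highest = my_counter.most_common(1)[0][1]
    most_frequent = [w for w, n in my_counter.items() if n == highest]
    # Unique words, shortest first; the last one has the maximum length.
    sorted_by_length = sorted(my_counter, key=len)
    max_len = len(sorted_by_length[-1])
    longest = [w for w in sorted_by_length if len(w) == max_len]
    longer_than_one_letter = [w for w in sorted_by_length if len(w) > 1]
    return longer_than_one_letter, most_frequent, longest

result = summarize("aa aa aa xxx xxx xxx b b ccccccc".split())
print(result)
```

Filtering against the maximum value, instead of taking a single item from most_common or from the end of the sorted list, is what keeps tied words in the result.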

Most of your problems are solved or get easier when you use a Counter.

Writing word_count with a Counter:

>>> from collections import Counter
>>> def word_count(string):
...     return Counter(string.split())

Demo:

>>> c = word_count('aa aa aa xxx xxx xxx b b ccccccc')
>>> c
Counter({'aa': 3, 'xxx': 3, 'b': 2, 'ccccccc': 1})
>>> c['aa']
3

The most_common method of a Counter helps with getting the most frequent words:

>>> c.most_common()
[('aa', 3), ('xxx', 3), ('b', 2), ('ccccccc', 1)]
>>> c.most_common(1)
[('aa', 3)]
>>> max_count = c.most_common(1)[0][1]
>>> [word for word, count in c.items() if count == max_count]
['aa', 'xxx']

You can get the words themselves with c.keys() (wrapped in list() on Python 3, where keys() returns a view rather than a list):

>>> list(c.keys())
['aa', 'xxx', 'b', 'ccccccc']

and a list of words with the longest length this way:

>>> max_len = len(max(c, key=len))
>>> [word for word in c if len(word) == max_len]
['ccccccc']

1) To iterate over dictionary you can either use:

for key in my_dict:

or, if you want the key and value at the same time, use items() (Python 2 also offered iteritems(), which no longer exists in Python 3):

for key, value in my_dict.items():

2) To find the most frequent words, assume that the first word is the most frequent. Then look at the next word's count: if it's the same, append the word to your list; if it's less, skip it; if it's more, clear your list and treat this word as the most frequent.

3) Pretty much the same as 2. Assume that the first word is the longest, then compare it with the next one: if its length equals your current max, append it to the list; if it's less, skip it; if it's more, clear your list and treat this length as your max.

I didn't add any code, since it's better to write it on your own in order to learn something.
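For anyone who wants something to compare their own attempt against, the single-pass idea from 2) and 3) could look roughly like the sketch below. The function names are made up, and it starts from "nothing seen yet" (None) rather than from the first item, which amounts to the same thing.

```python
def most_frequent_words(my_dict):
    # my_dict is assumed to map word -> count.
    best = []
    best_count = None  # no word seen yet
    for word, count in my_dict.items():
        if best_count is None or count > best_count:
            # New maximum: start the list over with this word.
            best = [word]
            best_count = count
        elif count == best_count:
            # Tie with the current maximum: keep both.
            best.append(word)
    return best

def longest_words(my_dict):
    # Same pattern, comparing word lengths instead of counts.
    best = []
    best_len = None
    for word in my_dict:
        if best_len is None or len(word) > best_len:
            best = [word]
            best_len = len(word)
        elif len(word) == best_len:
            best.append(word)
    return best

sample = {'aa': 3, 'xxx': 3, 'b': 2, 'ccccccc': 1}
print(most_frequent_words(sample))
print(longest_words(sample))
```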

There are other nice answers to your question, but I would like to help with your own attempt. I have made a few modifications to your code to get it working:

def analyze(x):
    longer_than_one = []
    most_frequent = []
    longest = []
    for key in x:
        if len(key) > 1:
            longer_than_one.append(key)
    print(longer_than_one)

It seems you haven't attempted the 2nd and 3rd use cases yet.
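One possible way to fill in those two remaining lists, using only built-ins, is sketched below. It assumes x is the word-to-count dictionary built by word_count, and it returns the three lists as a tuple instead of printing them; both choices are assumptions, not part of the original attempt.

```python
def analyze(x):
    # x is assumed to be a dict mapping each word to its count.
    longer_than_one = []
    most_frequent = []
    longest = []
    if not x:
        return longer_than_one, most_frequent, longest
    max_count = max(x.values())            # highest count in the dict
    max_len = max(len(key) for key in x)   # length of the longest word
    for key, value in x.items():
        if len(key) > 1:
            longer_than_one.append(key)
        if value == max_count:
            most_frequent.append(key)      # ties are all kept
        if len(key) == max_len:
            longest.append(key)
    return longer_than_one, most_frequent, longest

print(analyze({'aa': 3, 'xxx': 3, 'b': 2, 'ccccccc': 1}))
```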

First, check out collections.Counter:

import collections

word_counts = collections.Counter(your_text.split())

Given that, you can use its .most_common method for the most common words. It produces a list of (word, its_count) tuples.

To discover the longest words in the dictionary, you can do:

import heapq

largest_words = heapq.nlargest(N, word_counts, key=len)

N being the count of largest words you want. This works because by default the iteration over a dict produces only the keys, so it sorts them according to the word length (key=len) and returns only the N largest ones.
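A small demonstration of both calls may help; the sample text and the value of N below are made up. One thing to keep in mind is that nlargest returns exactly N items (or fewer if the dict is smaller), so if several words tie for the longest length it will not necessarily return all of them.

```python
import collections
import heapq

your_text = "aa aa aa xxx xxx xxx b b ccccccc"  # sample text, invented for this demo
word_counts = collections.Counter(your_text.split())

N = 2  # how many of the longest words to fetch
largest_words = heapq.nlargest(N, word_counts, key=len)
print(largest_words)  # ['ccccccc', 'xxx']

# (word, count) tuples for the two most common words.
print(word_counts.most_common(2))
```

To collect every word tied for the maximum length, one option is to take the length of heapq.nlargest(1, ...)[0] and then filter the keys against that length.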

But you seem to have fallen deep into Python without going over the tutorial. Is it homework?


 