Python: take a dictionary and produce a list of (words longer than 1 letter, most common words, longest words)

So I made a function:

def word_count(string):
    my_string = string.lower().split()
    my_dict = {}
    for item in my_string:
        if item in my_dict:
            my_dict[item] += 1
        else:
            my_dict[item] = 1
    print(my_dict)

What this does is take a string, split it, and produce a dictionary where each key is a word and each value is how many times that word appears.

What I'm trying to do now is write a function that takes the output of that function and produces a list in the following format:

((list of words longer than 1 letter), (list of most frequent words), (list of words with the longest length))

Also, if for example two words have each appeared 3 times and both are 6 letters long, both words should be included in both the (most frequent) and (longest length) lists.

This has been my attempt at tackling the problem so far:

def analyze(x):
    longer_than_one= []
    most_frequent= []
    longest= []
    for key in x.item:
        if len(key) >1:
            key.append(longer_than_one)
    print(longer_than_one)

What I was trying to do here is write a series of for loops and if statements that append to the lists depending on whether the items meet the criteria. However, I have run into the following problems:

1- How do I iterate over a dictionary without getting an error?

2- I can't figure out a way to find the most frequent words (I was thinking of appending the keys with the highest values).

3- I can't figure out a way to append only the longest words in the dictionary (I was thinking of using len(key), but it raised an error).

If it's any help, I'm working in Anaconda's Spyder using Python 3.5.1. Any tips would be appreciated!

You really are trying to re-invent the wheel.

Imagine you have list_of_words, which is, well, a list of strings.

To get the most frequent word, use Counter:

from collections import Counter
my_counter = Counter(list_of_words)

To sort the list by length:

sorted_by_length = sorted(list_of_words, key=len)

To get the list of words longer than one letter, you can simply use your sorted list, or create a new list with only these:

longer_than_one_letter = [word for word in list_of_words if len(word) > 1]

To get your output in your required format, simply use all of the above.
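Putting those pieces together might look like the following minimal sketch. The function name summarize and the sample sentence are made up for illustration, and the distinct words are sorted alphabetically only so that the result is stable on older Python versions where dictionary order is arbitrary. Ties for "most frequent" and "longest" are all kept, as the question asks.

```python
from collections import Counter


def summarize(list_of_words):
    """Return (words longer than one letter, most frequent words, longest words)."""
    counts = Counter(list_of_words)
    if not counts:
        # Nothing to analyze; avoid indexing into empty results below.
        return [], [], []
    # Distinct words, alphabetical so the output order is predictable.
    words = sorted(counts)
    longer_than_one_letter = [word for word in words if len(word) > 1]
    # most_common(1) gives [(word, count)]; we only need the top count.
    top_count = counts.most_common(1)[0][1]
    most_frequent = [word for word in words if counts[word] == top_count]
    # The last element of a length-sorted list is one of the longest words.
    sorted_by_length = sorted(words, key=len)
    max_len = len(sorted_by_length[-1])
    longest = [word for word in words if len(word) == max_len]
    return longer_than_one_letter, most_frequent, longest


list_of_words = "aa aa aa xxx xxx xxx b b ccccccc".split()
result = summarize(list_of_words)
print(result)
```

The function returns a tuple of three lists rather than printing them, so the result can be passed on to other code.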

Most of your problems are solved or get easier when you use a Counter.

Writing word_count with a Counter:

>>> from collections import Counter
>>> def word_count(string):
...     return Counter(string.split())

Demo:

>>> c = word_count('aa aa aa xxx xxx xxx b b ccccccc')
>>> c
Counter({'aa': 3, 'xxx': 3, 'b': 2, 'ccccccc': 1})
>>> c['aa']
3

The most_common method of a Counter helps with getting the most frequent words:

>>> c.most_common()
[('aa', 3), ('xxx', 3), ('b', 2), ('ccccccc', 1)]
>>> c.most_common(1)
[('aa', 3)]
>>> max_count = c.most_common(1)[0][1]
>>> [word for word, count in c.items() if count == max_count]
['aa', 'xxx']

You can get the words themselves with c.keys() (on Python 3 it returns a view, so wrap it in list() to get a list):

>>> list(c.keys())
['aa', 'xxx', 'b', 'ccccccc']

and a list of the words with the longest length this way:

>>> max_len = len(max(c, key=len))
>>> [word for word in c if len(word) == max_len]
['ccccccc']

1) To iterate over a dictionary you can use:

for key in my_dict:

or, if you want the key and the value at the same time, use items() (on Python 2 the equivalent was iteritems(), which no longer exists on Python 3):

for key, value in my_dict.items():

2) To find the most frequent words, start by assuming the first word is the most frequent. Then look at each following word's count: if it is the same as the current maximum, append the word to your list; if it is less, skip it; if it is more, clear your list and treat this word as the new most frequent.

3) Pretty much the same as 2. Assume that your first word is the longest, then compare it with the next one: if its length equals your current max, append it to the list; if it is less, skip it; if it is more, clear your list and treat this length as your new max.

I didn't add any code since it's better to write it on your own in order to learn something.
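For readers who want something to compare their own attempt against, here is one possible sketch of the two single-pass loops described in points 2 and 3. The function names and the sample dictionary are made up for illustration. Instead of assuming the first word is the maximum, this version starts from a maximum of 0, which amounts to the same thing for word counts and lengths.

```python
def most_frequent_words(counts):
    """One pass over a {word: count} dict, keeping every word tied for the top count."""
    best = []
    best_count = 0
    for word, count in counts.items():
        if count > best_count:
            # New maximum: clear the list and start over with this word.
            best = [word]
            best_count = count
        elif count == best_count:
            # Tie with the current maximum: keep this word as well.
            best.append(word)
        # A smaller count is simply skipped.
    return best


def longest_words(counts):
    """Same idea, but comparing word lengths instead of counts."""
    best = []
    best_len = 0
    for word in counts:
        if len(word) > best_len:
            best = [word]
            best_len = len(word)
        elif len(word) == best_len:
            best.append(word)
    return best


counts = {"apple": 3, "banana": 3, "kiwi": 1, "cherry": 3}
print(sorted(most_frequent_words(counts)))  # ['apple', 'banana', 'cherry']
print(sorted(longest_words(counts)))        # ['banana', 'cherry']
```

The results are sorted before printing only because dictionary iteration order is not guaranteed on Python versions before 3.7.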

There are other nice answers to your question, but I would like to help you with your own attempt. I have made a few modifications to your code to make it work:

def analyze(x):
    longer_than_one = []
    most_frequent = []
    longest = []
    for key in x:
        if len(key) > 1:
            longer_than_one.append(key)
    print(longer_than_one)

It seems you haven't attempted the 2nd and 3rd use cases.
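One possible way to fill in those two cases while keeping the same loop structure is sketched below. It assumes that x is the word-to-count dictionary, which means word_count would need to return my_dict rather than only print it. The sample dictionary is made up for illustration.

```python
def analyze(x):
    """x is a {word: count} dict; returns the three lists as a tuple."""
    longer_than_one = []
    most_frequent = []
    longest = []
    if not x:
        # max() raises ValueError on an empty sequence, so stop early.
        return longer_than_one, most_frequent, longest
    top_count = max(x.values())            # highest number of occurrences
    max_len = max(len(key) for key in x)   # length of the longest word
    for key, value in x.items():
        if len(key) > 1:
            longer_than_one.append(key)
        if value == top_count:
            most_frequent.append(key)      # every word tied for the top count
        if len(key) == max_len:
            longest.append(key)            # every word tied for the top length
    return longer_than_one, most_frequent, longest


counts = {"to": 2, "be": 2, "or": 1, "not": 1, "a": 1}
longer_than_one, most_frequent, longest = analyze(counts)
print(longer_than_one, most_frequent, longest)
```

Computing the two maxima before the loop means each word only has to be checked once against fixed targets, so ties are picked up without any list clearing.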

First, check out collections.Counter:

import collections

word_counts = collections.Counter(your_text.split())

Given that, you can use its .most_common method for the most common words. It produces a list of (word, its_count) tuples.

To discover the longest words in the dictionary, you can do:

import heapq

largest_words= heapq.nlargest(N, word_counts, key=len)

N being the number of longest words you want. This works because iterating over a dict yields only its keys by default, so nlargest ranks them by word length (key=len) and returns only the N longest.
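As a small illustration, here is a sketch with a made-up sample text and N set to 3. Note that nlargest returns exactly N items, so it does not by itself return every word tied for the longest length; the last two lines show one way to get all ties, which is an addition to the approach above rather than part of it.

```python
import collections
import heapq

your_text = "the quick brown fox jumps over the lazy dog"  # made-up sample text
word_counts = collections.Counter(your_text.split())

N = 3  # how many of the longest words to keep
largest_words = heapq.nlargest(N, word_counts, key=len)
print(largest_words)

# To keep every word tied for the maximum length, regardless of N:
max_len = len(heapq.nlargest(1, word_counts, key=len)[0])
all_longest = [word for word in word_counts if len(word) == max_len]
print(all_longest)
```

With this sample text the three 5-letter words happen to fill N exactly; with N = 2 one of them would be dropped, which is why the length filter is needed for the question's tie requirement.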

But you seem to have fallen deep into Python without going over the tutorial. Is it homework?
