
Creating a dictionary of the words in a string, sorting by the number of occurrences, and only displaying words with 4 or more letters in them

Directions: The key of the dictionary should be the word, and the value should be the number of times that word appears in the paragraph. Sort the dictionary in descending order of count. Only display words that have 4 or more letters in them.

text = """The goal is to turn data into information and information into insight . 
You can have data without information but you cannot have information without data ."""

I've been working on this problem but can't seem to only display words that have 4 or more letters in them. Any help would be greatly appreciated, thank you.

Output is supposed to look like this: [image of the expected output]

This is what I've done so far

# clean up string 'text'
for char in '-.,\n':
    text=text.replace(char,' ')
text = text.lower()
words_list = text.split(' ')

#define dictionary
words_dict = {}

#Count number of times each word comes up in list of words (in dictionary)
for word in words_list:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1

#sort dictionary by number of occurrences
sorted(words_dict.items(), key = lambda x: x[1], reverse = True)

You can use collections.Counter with filter here.

from collections import Counter
text = """The goal is to turn data into information and information into insight . 
You can have data without information but you cannot have information without data ."""

count = Counter(filter(lambda x:len(x)>=4, text.split()))

sorted(count.items(),key = lambda x:x[1],reverse=True)
[('information', 4),
 ('data', 3),
 ('into', 2),
 ('have', 2),
 ('without', 2),
 ('goal', 1),
 ('turn', 1),
 ('insight', 1),
 ('cannot', 1)]
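
As a variation on the answer above, Counter also provides a most_common() method that returns the (word, count) pairs from most to least frequent, so the explicit sorted call is optional. The sketch below assumes the same sample text. Lower-casing before splitting is an addition that is not in the answer above; it does not change the result for this text, but it would merge words such as "Data" and "data" in other input. The names words and ranked are only illustrative.

```python
from collections import Counter

text = """The goal is to turn data into information and information into insight .
You can have data without information but you cannot have information without data ."""

# Lower-case first so differently capitalised copies of a word are counted
# together (an added assumption; not part of the answer above).
words = [w for w in text.lower().split() if len(w) >= 4]

# most_common() with no argument returns every (word, count) pair,
# highest count first.
ranked = Counter(words).most_common()
print(ranked)
```

Note that this still relies on the punctuation in the sample text being separated from the words by spaces, as split() does not strip it.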

EDIT: Without using collections, you can mimic Counter's behaviour using dict.setdefault.

text = text.split()
new = dict()

for t in filter(lambda x:len(x)>=4,text):
    new[t] = new.setdefault(t,0) + 1

sorted(new.items(),key=lambda x:x[1],reverse=True)
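
The sorted call returns a list of tuples, while the directions ask for a sorted dictionary. One possible way to get a dictionary back is to pass the sorted pairs to dict(). This is a minimal sketch that assumes Python 3.7 or later, where dictionaries keep insertion order. It uses dict.get in place of setdefault, which is a substitution and not what the answer above does, and the names counts and sorted_counts are only illustrative.

```python
text = """The goal is to turn data into information and information into insight .
You can have data without information but you cannot have information without data ."""

# Count only words with 4 or more letters.
counts = {}
for word in text.split():
    if len(word) >= 4:
        # get() returns 0 for a word that has not been seen yet.
        counts[word] = counts.get(word, 0) + 1

# Rebuild a dict from the sorted pairs; insertion order is preserved
# (assumes Python 3.7+), so iteration goes from most to least frequent.
sorted_counts = dict(sorted(counts.items(), key=lambda item: item[1], reverse=True))
print(sorted_counts)
```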

Minimal Change to Posted Code

The only change to the original code is an added filter by word length.

text = """The goal is to turn data into information and information into insight . 
You can have data without information but you cannot have information without data ."""

# clean up string 'text'
for char in '-.,\n':
    text=text.replace(char,' ')
text = text.lower()
words_list = text.split(' ')

#define dictionary
words_dict = {}

#Count number of times each word comes up in list of words (in dictionary)
for word in words_list:
    if word not in words_dict:
        words_dict[word] = 0
    words_dict[word] += 1

# Filter words
filtered = {word:count for word,count in words_dict.items() if len(word) >= 4}

#sort dictionary by number of occurrences
result = sorted(filtered.items(), key = lambda x: x[1], reverse = True)

print('{:<15} {:<4}'.format('Word', 'Count'))
for word, count in result:
    print(f'{word:15} {count:4}')

Output

Word            Count
information        4
data               3
into               2
have               2
without            2
goal               1
turn               1
insight            1
cannot             1
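
The replace loop in the code above handles only the four characters '-', '.', ',' and newline. If the text might contain other punctuation, one option is to extract the words with a regular expression instead. The sketch below is an assumption-laden variant, not part of the original answer: count_long_words and its min_len parameter are hypothetical names, and the pattern [a-z]+ assumes plain ASCII words, so it would split contractions such as "can't" into two pieces.

```python
import re

def count_long_words(text, min_len=4):
    """Return (word, count) pairs for words of at least min_len letters,
    most frequent first. Hypothetical helper for illustration."""
    # Lower-case, then keep only runs of ASCII letters; punctuation,
    # digits and whitespace all act as separators.
    words = re.findall(r"[a-z]+", text.lower())
    counts = {}
    for word in words:
        if len(word) >= min_len:
            counts[word] = counts.get(word, 0) + 1
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)

sample = "Data, data! Information; and (more) information? Yes: data."
for word, count in count_long_words(sample):
    print(f"{word:<15} {count:>4}")
```

Keeping the logic in a function makes the minimum length easy to change, for example count_long_words(sample, min_len=3) would also include "and" and "yes".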
