
How do I make my code differentiate between words and single characters? (Python)

(Python) My task is to write a program that reads text with input() and stores its words in a dictionary. For each word in the text, it should print how many times that word has already occurred earlier in the text. My code:

text = input()

words = {}

for word in text:
    if word not in words:
        words[word] = 0
        print(words[word])

    elif word in words:
        words[word] = words[word] + 1
        print(words[word])

An example input could be:

one two one two three two four three

The correct output should be:

0
0
1
1
0
2
0
1

However, my code counts the occurrences of every character instead of every word, which makes the output far too long. How do I make it distinguish between words and characters?

That is because text is a string, and iterating over a string yields its characters one at a time. Use for word in text.split() instead: split() returns a list of substrings, and when called with no arguments it splits on runs of whitespace, so here it gives you a list of words.
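As a minimal sketch of that fix applied to the code in the question: the helper name prior_counts is made up for illustration, and the sample string stands in for input() so the snippet runs without typing anything.

```python
def prior_counts(text):
    """Return, for each word in text, how many times it appeared earlier."""
    seen = {}     # word -> number of occurrences so far
    result = []
    for word in text.split():   # no arguments: split on runs of whitespace
        count = seen.get(word, 0)   # 0 if the word has not been seen yet
        result.append(count)
        seen[word] = count + 1
    return result


# Sample input from the question; replace the literal with input() to read from the user.
for n in prior_counts("one two one two three two four three"):
    print(n)
```

For the sample input this prints 0, 0, 1, 1, 0, 2, 0, 1 on separate lines, which is the output the question asks for.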

Given your example input, you need to split text on whitespace to get the words. In general, splitting arbitrary text into words/tokens is non-trivial (punctuation, capitalization and hyphenation all get in the way), and there are many natural language processing libraries purpose-built for it.

Also, for counting things, the Counter class from the collections module in the standard library is very useful.

from collections import Counter

text = input()
word_counts = Counter(text.split())
print(word_counts.most_common())

Output for the example input from the question:

[('two', 3), ('one', 2), ('three', 2), ('four', 1)]
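To illustrate the remark about tokenization being non-trivial, here is a rough sketch of what goes wrong with plain split() once punctuation and capitalization appear. The tokenize helper and its regular expression are assumptions chosen for this example, not a general-purpose tokenizer.

```python
import re
from collections import Counter


def tokenize(text):
    # Lowercase the text, then keep runs of letters, digits and apostrophes.
    # This is a deliberately simple rule; real text needs more care.
    return re.findall(r"[a-z0-9']+", text.lower())


sample = "One, two. One two? Three!"

# Plain split() keeps punctuation attached, so 'One,' and 'One' count as different words.
print(sample.split())

# The regex-based tokens merge them.
print(Counter(tokenize(sample)).most_common())
```

With split() alone, 'One,' and 'One' are counted separately; after tokenizing, both count toward 'one'.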

You are looking for the split method of the str type: https://docs.python.org/3/library/stdtypes.html?highlight=str%20split#str.split

Use it to create a list of words:

splitted_text = text.split()

The full example will look like:

text = 'this is an example and this is nice'

splitted_text = text.split()

words = {}

for word in splitted_text:
    if word not in words:
        # First occurrence: no earlier occurrences yet
        words[word] = 0
    else:
        # Seen before: one more earlier occurrence
        words[word] = words[word] + 1
print(words)

Which will output:

{'this': 1, 'is': 1, 'an': 0, 'example': 0, 'and': 0, 'nice': 0}
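Note that the values in the dictionary above are the number of earlier occurrences, so each is one less than the total count. If total counts are wanted instead, a small variation using dict.get would do it. This is only a sketch, and the function name total_counts is hypothetical.

```python
def total_counts(text):
    """Map each word in text to the total number of times it occurs."""
    words = {}
    for word in text.split():
        # get() returns 0 for a word not seen before, so the first occurrence counts as 1
        words[word] = words.get(word, 0) + 1
    return words


print(total_counts('this is an example and this is nice'))
# {'this': 2, 'is': 2, 'an': 1, 'example': 1, 'and': 1, 'nice': 1}
```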
