How do I make my code differentiate between words and singular characters? (Python)

Question

(Python) My task is to write a program that reads a line of text with input() and stores its words in a dictionary. For each word of the text, it should count how many times that word has already occurred before it. My code:

text = input()

words = {}

for word in text:
    if word not in words:
        words[word] = 0
        print(words[word])

    elif word in words:
        words[word] = words[word] + 1
        print(words[word])

An example input could be:

one two one two three two four three

The correct output should be:

My code, however, counts the occurrences of every character instead of every word, which makes the output far too long. How do I make it distinguish between words and characters?

Answer 1

That is because text is a string, and iterating over a string yields its characters one at a time. Use for word in text.split() instead: split() returns a list of substrings and, by default, splits on whitespace, so here it gives you a list of words.
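As a minimal sketch, here is that fix applied to the loop from the question. The example input is hard-coded rather than read with input() so the snippet runs as-is, and the function name prior_counts is made up for illustration:

```python
def prior_counts(text):
    """For each word in text, return how many times it appeared earlier."""
    seen = {}    # word -> number of occurrences so far
    result = []
    # split() with no arguments splits on runs of whitespace
    for word in text.split():
        result.append(seen.get(word, 0))
        seen[word] = seen.get(word, 0) + 1
    return result

print(*prior_counts("one two one two three two four three"))
# prints: 0 0 1 1 0 2 0 1
```

Iterating over text directly would instead visit 'o', 'n', 'e', ' ', and so on, which is why the original output was so long.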

Answer 2

Given your example input, you need to split text on whitespace to get the words. In general, splitting arbitrary text into words/tokens is non-trivial, and there are many natural language processing libraries purpose-built for it.

Also, for counting things, the Counter class from the standard library's collections module is very useful.

from collections import Counter

text = input()
word_counts = Counter(text.split())
print(word_counts.most_common())

Output

[('two', 3), ('one', 2), ('three', 2), ('four', 1)]
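To illustrate the caveat about tokenization: with plain split(), 'one,' and 'one' would be counted as different words. The sketch below is one rough way to handle punctuation and case with a regular expression. The pattern is an assumption about what counts as a word (ASCII letters, digits, and apostrophes only), and count_words is a made-up name; it is not a substitute for a real tokenizer.

```python
import re
from collections import Counter

def count_words(text):
    # Lowercase first, then keep runs of letters, digits, and apostrophes.
    # Assumption: this ASCII-only pattern is good enough for the input.
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)

print(count_words("One, two. One two!").most_common())
# prints: [('one', 2), ('two', 2)]
```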

Answer 3

You are looking for the split method of the str type: https://docs.python.org/3/library/stdtypes.html?highlight=str%20split#str.split

Use it to create a list of words:

splitted_text = text.split()

The full example will look like:

text = 'this is an example and this is nice'

splitted_text = text.split()

words = {}

for word in splitted_text:
    if word not in words:
        # first time we see this word: no earlier occurrences
        words[word] = 0
    else:
        words[word] = words[word] + 1

print(words)

Which will output:

{'this': 1, 'is': 1, 'an': 0, 'example': 0, 'and': 0, 'nice': 0}
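Note that these values are the number of earlier occurrences, so each is one less than the total count, because a word's first appearance stores 0. If total occurrences are wanted instead, a possible variant (a sketch only; total_counts is a made-up name) uses dict.get with a default of 0:

```python
def total_counts(text):
    """Return a dict mapping each word to its total number of occurrences."""
    words = {}
    for word in text.split():
        # get() returns 0 for a word that has not been seen yet
        words[word] = words.get(word, 0) + 1
    return words

print(total_counts('this is an example and this is nice'))
# prints: {'this': 2, 'is': 2, 'an': 1, 'example': 1, 'and': 1, 'nice': 1}
```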

3 answers

solution1
1 2021-10-05 15:07:34

solution2
1 2021-10-05 15:10:15

solution3
0 ACCEPTED 2021-10-05 15:14:17
