Counting words in a string using regex

Question

I'm trying to count the occurrences of each word in a long string of text after filtering out characters such as "!"#$%&'()*,-./:;?@[]_", which I did using a regex.

I then reformat the output so that each word and its count are separated by a tab.

I ran into two problems:

1- Using return in the function only produced the first line, yet when I replaced it with print it worked fine. I'm wary of using print instead of return, though.

2- On the TMC server, the test fails with an error that I don't understand.

Error:-

Failed: test.test_word_frequencies.WordFrequencies.test_first
        'NoneType' object is not subscriptable

Test results: 1/2 tests passed
 50%

Here is my progress so far:-

def word_frequencies(file):
    import re
    from collections import Counter

    counts = []
    inputfile = open(file,"r")
    textfile = inputfile.read()
    pattern = re.compile(r'([\!\"\#\$\%\&\'\(\)\*\,\-\\.\\\/\:\;\?\@\[\]\_])')
    file_clean = re.sub(pattern,"",textfile)
    words = file_clean.split()
    for word in words:
        counts.append(Counter(words)[f'{word}'])
    for word,count in zip(words,counts):
        print(f"{word}\t{count}")

I'm not sure whether using return in the last for loop was wise, as it returns:-

'The\t64'

Instead of:-

The 64
Project 83
Gutenberg   27
EBook   3
of  303
Alice   166
etc...

I'm not sure where this error is coming from.

Answer 1

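A return inside the for loop exits the function during its first iteration, so only the first line ('The\t64') ever comes back. Append each line to a list inside the loop and return the list once the loop has finished.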
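To make the difference concrete, here is a small sketch. The three function names are hypothetical and the count is hard-coded to 1 to keep it short. It contrasts a return inside the loop, a print-only function, and a function that collects its lines and returns them after the loop. The print-only variant returns None, which would explain a "'NoneType' object is not subscriptable" error if the test indexes the return value (that is an assumption about the TMC test, which is not shown in the question).

```python
def first_only(words):
    for word in words:
        return f"{word}\t1"   # leaves the function during the first iteration


def prints_only(words):
    for word in words:
        print(f"{word}\t1")   # output appears, but the function returns None


def collects(words):
    result = []
    for word in words:
        result.append(f"{word}\t1")
    return result             # returned once, after the loop has finished


words = ["The", "Project"]
a = first_only(words)    # a single string
b = prints_only(words)   # None
c = collects(words)      # a list with one entry per word
```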

In your example,

result = []  # collect the formatted lines here
for word,count in zip(words,counts):
    result.append(f"{word}\t{count}")
print(result)  # prints the whole list at once

The problem there is the print statement: you are printing the list instead of its elements, so each string is shown with quotes and a literal \t. Try the following:

for data in result:
    print(data)
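Putting it together, the whole function might look like the sketch below. It rests on two assumptions that the question does not confirm: that the test expects a list of "word<TAB>count" strings it can index, and that each distinct word should appear once, in order of first appearance. The helper name count_words is hypothetical. The Counter is built once rather than once per word.

```python
import re
from collections import Counter

# The characters the question filters out, written as one character class.
PUNCTUATION = re.compile(r"""[!"#$%&'()*,\-./:;?@\[\]_]""")


def count_words(text):
    """Return 'word<TAB>count' strings, one per distinct word."""
    words = PUNCTUATION.sub("", text).split()
    counts = Counter(words)  # keeps first-appearance order on Python 3.7+
    return [f"{word}\t{count}" for word, count in counts.items()]


def word_frequencies(filename):
    # Assumption: the caller wants a list back, not printed output.
    with open(filename, "r") as f:
        return count_words(f.read())


lines = count_words("The cat, the hat. The end!")
for line in lines:
    print(line)  # each element printed on its own, so the tab is rendered
```

Returning the list and leaving the printing to the caller keeps both uses available: a test can index the result, and a script can loop over it and print.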
