
Counting words in a string using regex

I'm trying to count the occurrences of each word in a long string of text after filtering out characters such as "!"#$%&'()*,-./:;?@[]_", which I did using a regex.

I then reformat the output so that each word and its count are separated by a tab.

I ran into two problems:

1- Using return in the function produced only the first line, whereas substituting print for it worked fine. However, I am wary of using print instead of return.

2- On the TMC server, the test fails with an error that I don't understand.

Error:-

Failed: test.test_word_frequencies.WordFrequencies.test_first
        'NoneType' object is not subscriptable

Test results: 1/2 tests passed
 50%

Here is my progress so far:-

def word_frequencies(file):
    import re
    from collections import Counter

    counts = []
    inputfile = open(file,"r")
    textfile = inputfile.read()
    pattern = re.compile(r'([\!\"\#\$\%\&\'\(\)\*\,\-\\.\\\/\:\;\?\@\[\]\_])')
    file_clean = re.sub(pattern,"",textfile)
    words = file_clean.split()
    for word in words:
        counts.append(Counter(words)[f'{word}'])
    for word,count in zip(words,counts):
        print(f"{word}\t{count}")

I'm not sure whether putting a return statement in the last for loop was wise, because it returns:-

'The\t64'

Instead of:-

The 64
Project 83
Gutenberg   27
EBook   3
of  303
Alice   166
etc...

I'm not sure where this error is coming from.

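A return statement hands back exactly the object you give it and ends the function immediately. With return inside the last for loop, the function exits on the first iteration, so you only get the first formatted string ('The\t64'). With print instead, the function returns None, which is most likely why the test fails with "'NoneType' object is not subscriptable". Append each line to a list and return the list after the loop.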
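To make the difference between returning and printing concrete, here is a minimal sketch. The three function names are hypothetical and exist only for illustration; they are not part of the original code or of the TMC exercise.

```python
def first_only(lines):
    for line in lines:
        return line  # leaves the function on the very first iteration


def prints_only(lines):
    for line in lines:
        print(line)  # shows every line, but the function itself returns None


def collect_all(lines):
    result = []
    for line in lines:
        result.append(line)
    return result  # returned once, after the loop has finished


lines = ["The\t64", "Project\t83"]
first = first_only(lines)      # a single string, not the whole output
nothing = prints_only(lines)   # None
everything = collect_all(lines)
```

Indexing the value from prints_only (for example nothing[0]) raises TypeError: 'NoneType' object is not subscriptable, which matches the message in the failed test.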

In your example,

result = []
for word,count in zip(words,counts):
    result.append(f"{word}\t{count}")
print(result)

The problem is the print statement: it prints the whole list instead of the elements themselves. Try this instead:

for data in result:
    print(data)
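Putting the pieces together, one possible version of the function returns the lines instead of printing them. This is only a sketch under assumptions: the question does not show what return format the TMC test expects, so the list of "word<TAB>count" strings (one per distinct word, in first-seen order) is a guess, and count_words is a hypothetical helper name. The character class follows the list of characters quoted in the question.

```python
import re
from collections import Counter

# The characters listed in the question, written as one character class.
PUNCTUATION = re.compile(r"""[!"#$%&'()*,\-./:;?@\[\]_]""")


def count_words(text):
    """Return 'word<TAB>count' strings, one per distinct word (hypothetical helper)."""
    words = PUNCTUATION.sub("", text).split()
    counts = Counter(words)  # count everything once instead of once per word
    return [f"{word}\t{count}" for word, count in counts.items()]


def word_frequencies(file):
    # The with-block closes the file even if reading fails.
    with open(file, "r") as inputfile:
        return count_words(inputfile.read())


sample = count_words("The cat, the hat. The end!")
```

The caller can then print the result line by line with the loop above, while a test can index into the returned list.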
