
.split python word count

I need to count the words in a sentence. For example, "I walk my dog." would be 4 words, and "I walk my 3 dogs" would also be only 4 words, because numbers are not words. The code should count only alphabetic words. I understand how to count words simply by using the following:

len(string.split())

but this doesn't account for numbers. Is there a simple way (for a beginner) to account for numbers, symbols, etc.? Thank you.
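To see the problem concretely, here is a quick check using the two sentences from the question. It only illustrates what plain split() does; it is not a fix.

```python
# split() with no arguments splits on runs of whitespace,
# so the digit "3" becomes a token just like "walk" or "dogs".
sentences = ["I walk my dog.", "I walk my 3 dogs"]

for sentence in sentences:
    tokens = sentence.split()
    print(sentence, "->", len(tokens), tokens)
```

The second sentence comes out as 5 tokens instead of the desired 4, because "3" is kept as a token.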

totalWords = sum(1 for word in line.split() if word.isalpha())

You can use the split function to split the line on whitespace, then check whether each word consists only of letters using the isalpha function. Each word for which that is true contributes 1, and sum adds them all up.
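One caveat worth checking: isalpha() is False for any token that carries punctuation, so "dog." in "I walk my dog." is not counted. A minimal sketch, assuming you want to ignore punctuation at the start or end of a word, strips it with string.punctuation before testing (the function name count_alpha_words is just illustrative):

```python
import string

def count_alpha_words(line):
    """Count tokens made only of letters, ignoring punctuation at either end."""
    count = 0
    for word in line.split():
        # strip() removes the given characters from both ends only,
        # so "dog." becomes "dog" but "3" stays "3".
        cleaned = word.strip(string.punctuation)
        if cleaned.isalpha():
            count += 1
    return count

print(count_alpha_words("I walk my dog."))    # 4
print(count_alpha_words("I walk my 3 dogs"))  # 4
```

Hyphenated words such as "Beagle-Harrier" still fail isalpha() because of the inner hyphen; the regex-based answer handles that case.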

Here's another option:

import re

lines = [
    'I walk my dog',
    'I walk my 3 dogs',
    'I walk my Beagle-Harrier'  # DSM's example
]

for line in lines:
    # [a-z-]+ matches runs of letters and hyphens; re.I makes it case-insensitive
    words = re.findall(r'[a-z-]+', line, flags=re.I)
    print(line, '->', len(words), words)

# I walk my dog -> 4 ['I', 'walk', 'my', 'dog']
# I walk my 3 dogs -> 4 ['I', 'walk', 'my', 'dogs']
# I walk my Beagle-Harrier -> 4 ['I', 'walk', 'my', 'Beagle-Harrier']
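One thing to be aware of with '[a-z-]+' is that a stray hyphen on its own (as in "I walk - slowly") is also counted as a word. A possible tightening, offered as a sketch rather than the answerer's version, only allows hyphens between runs of letters (WORD_RE and count_words are illustrative names):

```python
import re

# One or more letters, optionally followed by "-letters" groups,
# so "Beagle-Harrier" matches as one word but a lone "-" does not.
WORD_RE = re.compile(r"[a-z]+(?:-[a-z]+)*", flags=re.IGNORECASE)

def count_words(line):
    return len(WORD_RE.findall(line))

for line in ["I walk my 3 dogs", "I walk my Beagle-Harrier", "I walk - slowly"]:
    print(line, "->", count_words(line), WORD_RE.findall(line))
```

Because the group is non-capturing, findall() still returns whole matches. Note that both patterns split contractions: "don't" yields two matches, "don" and "t".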

You can use .isalpha() on each word of the string:

len([word for word in sentence.split() if word.isalpha()])

If you don't want to use .isalpha():

sum(not word.isdigit() for word in line.split())

This yields True for each word that is not a number and False for each word that is. It takes advantage of the fact that in Python True == 1 and False == 0, so the sum is the number of non-number words.


If you are uncomfortable relying on bools behaving as ints, you can make the conversion explicit for readers of your code by wrapping it in int() (this is not needed at all, but some people find it clearer):

sum(int(not word.isdigit()) for word in line.split())
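The bool-as-int trick, and one limit of isdigit(), can be checked directly. This sketch uses a made-up sentence containing a decimal number to show that isdigit() only recognises strings made entirely of digit characters:

```python
# bool is a subclass of int, so True behaves as 1 and False as 0.
print(True + True + False)    # 2
print(isinstance(True, int))  # True

# isdigit() is True only when every character is a digit,
# so signed or decimal numbers are NOT treated as numbers here.
for token in ["3", "-3", "2.5"]:
    print(repr(token), token.isdigit())

line = "I have 2.5 dogs"
count = sum(not word.isdigit() for word in line.split())
print(count)  # 4, because "2.5" is counted as a word
```

If inputs may contain signs or decimals, one of the other approaches in this thread may suit better.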

Since the comments suggest the asker wants something that doesn't use .isalpha, we could use a try/except instead:

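line = "I walk my 3 dogs"

count = 0
for word in line.split():
    try:
        int(word)  # succeeds only if the word parses as an integer
    except ValueError:
        count += 1  # not an integer, so count it as a word

print(count)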

I know it's not pretty, but it handles it correctly.
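Since the answers here disagree on edge cases, a quick side-by-side check can help pick one. This sketch simply wraps three of the approaches from this thread in helper functions (the function names are made up for the comparison):

```python
import re

def by_isalpha(line):
    # Counts tokens made only of letters.
    return sum(1 for word in line.split() if word.isalpha())

def by_regex(line):
    # Counts runs of letters and hyphens, case-insensitively.
    return len(re.findall(r"[a-z-]+", line, flags=re.I))

def by_int_check(line):
    # Counts tokens that int() cannot parse.
    count = 0
    for word in line.split():
        try:
            int(word)
        except ValueError:
            count += 1
    return count

for line in ["I walk my 3 dogs", "I walk my dog.",
             "I walk my Beagle-Harrier", "I have 2.5 dogs"]:
    print(line, "->", by_isalpha(line), by_regex(line), by_int_check(line))
```

All three agree on "I walk my 3 dogs" (4). isalpha() drops "dog." and "Beagle-Harrier" because of the punctuation, while the try/except version counts "2.5" as a word, since int() rejects it.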
