
Creating a function that counts words and characters (including punctuation, but excluding white space)

I need to write a function that counts the number of characters (including punctuation, excluding white space) and the number of words in a given phrase. So far I have a function that counts characters, but it includes white space as well and does not count words. How can I exclude white space and also count words?

text = " If I compare myself to someone else, then I am playing a game I will never win. "
def count_chars_words(txt):
    chars = len(txt.replace(' ',''))
    words = len(txt.split(' '))
    return [words,chars]

print(count_chars_words(text))


Output: [19, 63]

Count the characters by removing the spaces from the text with replace(' ','') and then taking the length of the resulting string. Note that this removes only space characters; tabs and newlines would still be counted.

Count the words by splitting the sentence into a list of words and taking the length of that list.

Then return both in a list.

text = "If I compare myself to someone else, then I am playing a game I will never win."
def count_chars_words(txt):
    chars = len(txt.replace(' ',''))
    words = len(txt.split(' '))
    return [words,chars]

print(count_chars_words(text))

Output:

[17, 63]

To get an idea of what replace() and split() do:

>>> text.replace(' ','')
'IfIcomparemyselftosomeoneelse,thenIamplayingagameIwillneverwin.'
>>> text.split(' ')
['If', 'I', 'compare', 'myself', 'to', 'someone', 'else,', 'then', 'I', 'am', 'playing', 'a', 'game', 'I', 'will', 'never', 'win.']
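
One caveat with split(' '): it produces empty strings when the text has leading, trailing, or repeated spaces, which is why the padded string in the question reports 19 words instead of 17. Below is a minimal sketch of a variant that avoids this, assuming a "word" is simply a whitespace-separated token. The name count_chars_words_robust is made up for illustration and is not part of the original answer.

```python
def count_chars_words_robust(txt):
    # split() with no argument splits on any run of whitespace and
    # drops empty strings, so padding and newlines do not inflate the count.
    tokens = txt.split()
    words = len(tokens)
    # Every token is whitespace-free, so summing their lengths counts
    # all non-whitespace characters, punctuation included.
    chars = sum(len(token) for token in tokens)
    return [words, chars]


padded = " If I compare myself to someone else, then I am playing a game I will never win. "
print(count_chars_words_robust(padded))  # [17, 63]
```

Because the character count is derived from the same tokens, tabs and newlines are excluded as well, not just spaces.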

The string method str.split() might be useful for you! It takes a string, finds every instance of the separator you pass in (such as " "), and splits the string into a list of the chunks of characters between those separators (pretty much word by word). With this you should be able to continue!

"If I compare myself to someone else, then I am playing a game I will never win.".split(" ")

gives

['If', 'I', 'compare', 'myself', 'to', 'someone', 'else,', 'then', 'I', 'am', 'playing', 'a', 'game', 'I', 'will', 'never', 'win.']

To avoid counting whitespace, have you considered using an if statement? You might find string.whitespace and the in operator useful here!

As for counting words, str.split is your friend. In fact, if you split the words up first, is there a simple way to avoid even the if mentioned above?
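
Here is one way those hints might fit together, as a rough sketch and not the answerer's own code. The function names are hypothetical. Note that string.whitespace covers only ASCII whitespace, while str.split() with no argument also splits on other Unicode whitespace, so the two approaches can differ on unusual input.

```python
import string


def count_non_whitespace(txt):
    # Keep a character only if it is not one of the ASCII whitespace
    # characters listed in string.whitespace (space, tab, newline, ...).
    return sum(1 for ch in txt if ch not in string.whitespace)


def count_non_whitespace_via_split(txt):
    # Splitting first discards the whitespace, so no `if` is needed:
    # joining the tokens back together leaves only the characters to count.
    return len("".join(txt.split()))


sample = "If I compare myself to someone else, then I am playing a game I will never win."
print(count_non_whitespace(sample))            # 63
print(count_non_whitespace_via_split(sample))  # 63
print(len(sample.split()))                     # 17
```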

This is just one idea and not the most efficient way; if you want a better approach, use regex:

text = "If I compare myself to someone else, then I am playing a game I will never win."

total_num = len(text)      # every character, spaces included
spaces = text.count(' ')   # number of space characters
words = len(text.split())  # split() with no argument splits on any whitespace

print('total characters = ', total_num)
print('words = ', words)
print('spaces=', spaces)
print('characters w/o spaces = ', total_num - spaces)

Output:

total characters =  79
words =  17
spaces= 16
characters w/o spaces =  63

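Edit: using regex, a more efficient version would be:

import re

# text is the string defined in the snippet above
chars_without_spaces = re.findall(r'\S', text)  # list of non-whitespace characters
words = re.findall(r'\b\w+', text)  # list of runs of word characters

print(len(chars_without_spaces), len(words))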
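
re.findall returns a list of matches, so the counts come from taking len() of each result. The sketch below also shows how the word pattern affects the result; the comparison with r"\S+" and the sample strings are additions for illustration, not part of the original answer.

```python
import re

sample = "If I compare myself to someone else, then I am playing a game I will never win."

char_count = len(re.findall(r"\S", sample))     # non-whitespace characters
word_count = len(re.findall(r"\b\w+", sample))  # runs of word characters
print(char_count, word_count)  # 63 17

# The two word patterns differ on contractions and punctuation:
tricky = "I don't know."
print(re.findall(r"\b\w+", tricky))  # ['I', 'don', 't', 'know']
print(re.findall(r"\S+", tricky))    # ['I', "don't", 'know.']
```

If contractions should count as one word, matching whitespace-separated tokens with r"\S+" (or simply using split()) is probably the closer fit.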

