Counting number of words between punctuation characters in Python

I want to use Python to count the numbers of words that occur between certain punctuation characters in a block of text input. For example, such an analysis of everything written up to this point might be represented as:

[23, 2, 14]

...because the first sentence, which has no punctuation except the period at the end, has 23 words, the "For example" phrase that comes next has two, and the rest, ending with the colon, has 14.

This probably wouldn't be too hard to make, but (to go along with the "don't reinvent the wheel" philosophy that seems especially Pythonic) is there anything already out there that would be especially suitable for the task?

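import re

punctuation_i_care_about = "?.!"
# some_big_block_of_text is the input string; split it on any of the chosen characters
split_by_punc = re.split("[%s]" % re.escape(punctuation_i_care_about), some_big_block_of_text)
words_by_punc = [len(x.split()) for x in split_by_punc]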
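A minimal sketch of the same split-and-count idea applied to the text from the question. The function name `count_words_between`, the default delimiter set `".,:"`, and the choice to drop empty pieces are assumptions made for illustration, not part of the answer above. Because the sample text ends with a colon, `re.split` returns a trailing empty string that would otherwise be counted as 0 words.

```python
import re

def count_words_between(text, delimiters=".,:"):
    """Return the word count of each run of text between delimiter characters."""
    # re.escape keeps characters such as ']' or '^' literal inside the character class
    pattern = "[" + re.escape(delimiters) + "]"
    pieces = re.split(pattern, text)
    counts = [len(piece.split()) for piece in pieces]
    # Drop pieces with no words, e.g. the empty string after a trailing delimiter
    return [n for n in counts if n > 0]

question_text = (
    "I want to use Python to count the numbers of words that occur between "
    "certain punctuation characters in a block of text input. For example, "
    "such an analysis of everything written up to this point might be "
    "represented as:"
)
print(count_words_between(question_text))  # [23, 2, 14]
```

Whether zero counts should be dropped depends on the use case: keeping them preserves a one-to-one mapping between delimiters and counts.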

Joran beat me to it, but I'll add my approach:

from string import punctuation
import re

s = 'I want to use Python to count the numbers of words that occur between certain punctuation characters in a block of text input. For example, such an analysis of everything written up to this point might be represented as'

# Split on any ASCII punctuation character (escaped so ']', '\' and '^' stay literal),
# then split each piece into words
gen = (x.split() for x in re.split('[' + re.escape(punctuation) + ']', s))

list(map(len, gen))
Out[32]: [23, 2, 14]

(I love map)
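One caveat when splitting on all of `string.punctuation`: it includes the apostrophe and the hyphen, so a contraction such as "wouldn't" is cut into two pieces. The sketch below illustrates the difference. The helper name `counts` and the decision to exclude `'` and `-` from the delimiters are assumptions for illustration only.

```python
import re
from string import punctuation

def counts(text, delimiters):
    """Word counts between any of the given delimiter characters."""
    pattern = "[" + re.escape(delimiters) + "]"
    return [len(piece.split()) for piece in re.split(pattern, text)]

text = "This probably wouldn't be too hard"

# Every ASCII punctuation character is a delimiter: the apostrophe splits the sentence
all_punct = counts(text, punctuation)

# Same set minus apostrophe and hyphen, so contractions stay in one piece
reduced = punctuation.replace("'", "").replace("-", "")
without_inner = counts(text, reduced)

print(all_punct)      # [3, 4]
print(without_inner)  # [6]
```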
