
Regex to match a sentence and the words of a string

I want to make a regex that matches a sentence and each word of the matched sentence. If '?', '!' or '.' is matched, it is treated as the end of the sentence, and the regex should also match each and every word of that sentence.

My regex to match a sentence: [^?!.]+

My regex to match each word separately: [^\s]+

But I can't combine these two regexes to do that.
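One way to combine the two patterns is to apply them in two stages: match each sentence first, then run the word pattern over the text of that sentence only. The following Python sketch assumes '?', '!' and '.' are the only sentence terminators; the helper name split_sentences is hypothetical.

```python
import re

# Sentence: a run of non-terminator characters, optionally followed by one terminator.
SENTENCE = re.compile(r"[^?!.]+[?!.]?")
# Word: a run of non-whitespace characters (equivalent to [^\s]+).
WORD = re.compile(r"\S+")


def split_sentences(text):
    """Return a list of (sentence, words) pairs (hypothetical helper)."""
    result = []
    for m in SENTENCE.finditer(text):
        sentence = m.group().strip()
        if not sentence:
            continue  # skip whitespace-only runs, e.g. a trailing space
        result.append((sentence, WORD.findall(sentence)))
    return result


pairs = split_sentences("I am Raktim Banerjee. I love to code.")
print(len(pairs), "sentence", sum(len(w) for _, w in pairs), "words")
# prints: 2 sentence 8 words
```

The word pattern keeps trailing punctuation (for example 'Banerjee.'), which does not change the count.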

Test strings:

I am Raktim Banerjee. I love to code.

should return

2 sentence 8 words

And

 Stackoverflow is the best coding forum. I love stackoverflow!

should return

2 sentence 9 words.

Thanks in advance for your help.

Are you looking for something like this?

import re

s1 = "I am Raktim Banerjee. I love to code. "
s2 = "Stackoverflow is the best coding forum. I love stackoverflow! "

for s in (s1, s2):
    # Sentences: runs of characters other than '?', '!' or '.', ignoring whitespace-only runs
    sentences = [p for p in re.findall(r"[^?!.]+", s) if p.strip()]
    # Words: runs of non-whitespace characters
    words = re.findall(r"[^\s]+", s)
    print(len(sentences), "sentence", len(words), "words")

Running the above outputs:

2 sentence 8 words
2 sentence 9 words

I believe you said you wanted this in JavaScript:

var s = 'I am Raktim Banerjee. I love to code.';
// a word, then zero or more (spaces + word), then a sentence terminator
var regex = /\b([^?!. ]+)(?:(?: +)([^?!. ]+))*\b([?!.])/g;
var m, numSentences = 0, numWords = 0;
do {
  m = regex.exec(s);
  if (m) {
    numSentences++;
    numWords += m[0].split(' ').length;
  }
} while (m);
console.log(numSentences + ' sentences, ' + numWords + ' words');
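For comparison, the JavaScript loop above can be sketched in Python, assuming '?', '!' and '.' are the terminators. re.finditer takes the place of calling regex.exec repeatedly on a /g regex, and each match is split on single spaces as in the JavaScript version. The function name count_js_style is hypothetical.

```python
import re

# A word, then zero or more (spaces + word), then a word boundary and a terminator.
SENTENCE = re.compile(r"\b[^?!. ]+(?: +[^?!. ]+)*\b[?!.]")


def count_js_style(text):
    num_sentences = 0
    num_words = 0
    # finditer yields every non-overlapping match, like repeated exec() on a /g regex
    for m in SENTENCE.finditer(text):
        num_sentences += 1
        num_words += len(m.group().split(" "))
    return num_sentences, num_words


n_sent, n_words = count_js_style("I am Raktim Banerjee. I love to code.")
print(f"{n_sent} sentences, {n_words} words")
# prints: 2 sentences, 8 words
```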

Here is a second iteration. I modified the regex to recognize a few salutations, Mr., Mrs. and Dr. (you can add additional ones), and added a primitive sub-expression to recognize an email address. I also simplified the original regex a bit. I hope this helps (no guarantees, because the email check is overly simplified):

var s = 'Mr. Raktim Banerjee. My email address is xyz@nowhere.com.';
// tokens: Mr./Mrs./Dr., a naive email address, or an ordinary word
var regex = /\b((Mrs?\.|Dr\.|\S+@\S+|[^?!. ]+)\s*)+([?!.])/g;
var m, numSentences = 0, numWords = 0;
do {
  m = regex.exec(s);
  if (m) {
    numSentences++;
    numWords += m[0].split(' ').length;
  }
} while (m);
console.log(numSentences + ' sentences, ' + numWords + ' words');
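The salutation-aware idea can also be sketched in Python. This is a variant, not a literal translation, built on two assumptions. First, tokens are joined with \s+ instead of \s*: with \s* an ordinary word can be split across several repetitions of the group, so on text that never reaches a terminator the engine may try exponentially many splits. Second, the e-mail token must end in a word character, so a '.' right after an address still ends the sentence. The names TOKEN, SENTENCE and count_sentences_and_words are hypothetical, and the e-mail check remains naive.

```python
import re

# One token: a salutation whose '.' must not end the sentence,
# a naive e-mail address ending in a word character, or an ordinary word.
TOKEN = r"(?:Mrs?\.|Dr\.|\S+@\S*\w|[^?!.\s]+)"
# Tokens separated by whitespace; the last token is followed by a terminator.
SENTENCE = re.compile(rf"\b(?:{TOKEN}\s+)*{TOKEN}[?!.]")


def count_sentences_and_words(text):
    sentences = [m.group() for m in SENTENCE.finditer(text)]
    words = sum(len(s.split()) for s in sentences)
    return len(sentences), words


print(count_sentences_and_words(
    "Mr. Raktim Banerjee. My email address is xyz@nowhere.com."))
# prints: (2, 8)
```

With this token, "Mail me at a@b.com. Thanks." is counted as two sentences rather than one.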
