
Regex to match a sentence and word of a string

I want to write a regex that matches each sentence of a string and also every word of each matched sentence. A '!', '?' or '.' should be treated as the end of a sentence.

My regex to match a sentence: [^?!.]+

My regex to match each and every word separately: [^\s]+

But I can't combine these two regexes to do that.

...Tested string...

I am Raktim Banerjee. I love to code.

should return

2 sentence 8 words

And

 Stackoverflow is the best coding forum. I love stackoverflow!

should return

2 sentence 9 words.

Thanks in advance for your help.

Are you looking for something like this?

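import re

# The trailing space in each string yields one extra whitespace-only match,
# which is why 1 is subtracted from the sentence count below.
s1 = "I am Raktim Banerjee. I love to code. "
s2 = "Stackoverflow is the best coding forum. I love stackoverflow! "

sentence_re = re.compile(r"[^?!.]+")  # run of characters up to a terminator
word_re = re.compile(r"[^\s]+")       # run of non-whitespace characters

for s in (s1, s2):
    print(len(sentence_re.findall(s)) - 1, "sentence", len(word_re.findall(s)), "words")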

Running the above outputs:

2 sentence 8 words
2 sentence 9 words
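
A rough sketch of one way to join the two patterns without depending on trailing whitespace: match each sentence together with its terminator, then count the words inside each match. The function name `count_sentences_and_words` is made up for illustration, and the sketch assumes every sentence ends in '.', '?' or '!' (text after the last terminator is ignored):

```python
import re

# A sentence: a run of non-terminator characters followed by one or more terminators.
SENTENCE = re.compile(r"[^?!.]+[?!.]+")
# A word: a run of non-whitespace characters.
WORD = re.compile(r"\S+")


def count_sentences_and_words(text):
    """Return (sentence_count, word_count) for text (hypothetical helper)."""
    sentences = SENTENCE.findall(text)
    # Count words only inside the matched sentences.
    words = sum(len(WORD.findall(sentence)) for sentence in sentences)
    return len(sentences), words


for text in ("I am Raktim Banerjee. I love to code.",
             "Stackoverflow is the best coding forum. I love stackoverflow!"):
    n_sentences, n_words = count_sentences_and_words(text)
    print(n_sentences, "sentence", n_words, "words")
```

Because the terminator is part of the sentence pattern, the result is the same whether or not the string ends with a space.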

I believe you said you wanted this in JavaScript:

var s = 'I am Raktim Banerjee. I love to code.';
// One or more space-separated words followed by a sentence terminator.
var regex = /\b([^!?. ]+)(?:(?: +)([^!?. ]+))*\b([!?.])/g;
var m, numSentences = 0, numWords = 0;
do {
    m = regex.exec(s);
    if (m) {
        numSentences++;
        numWords += m[0].split(' ').length;
    }
} while (m);
console.log(numSentences + ' sentences, ' + numWords + ' words');

Here is a second iteration. I modified the regex to recognize a few salutations, Mr., Mrs. and Dr. (you can add additional ones), and to add a primitive sub regular expression to recognize an email address. And I also simplified the original regex a bit. I hope this helps (no guarantees because the email check is overly simplified):

var s = 'Mr. Raktim Banerjee. My email address is xyz@nowhere.com.';
// Each token is a known title, a crude email match, or a plain word.
var regex = /\b((Mrs?\.|Dr\.|\S+@\S+|[^!?. ]+)\s*)+([!?.])/g;
var m, numSentences = 0, numWords = 0;
do {
    m = regex.exec(s);
    if (m) {
        numSentences++;
        numWords += m[0].split(' ').length;
    }
} while (m);
console.log(numSentences + ' sentences, ' + numWords + ' words');
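
For anyone following the Python answer instead, the same idea might look roughly like this. It is only a sketch under the same simplifications: the title list and the email pattern are as crude as in the JavaScript version, and the name `count_with_titles` is invented for this example.

```python
import re

# One token is a known title (Mr., Mrs., Dr.), a crude email-like string,
# or a run of characters that are neither whitespace nor . ? !
TOKEN = r"(?:Mrs?\.|Dr\.|\S+@\S+|[^\s?!.]+)"
# A sentence is whitespace-separated tokens followed by one terminator.
SENTENCE = re.compile(r"\b" + TOKEN + r"(?:\s+" + TOKEN + r")*[.?!]")


def count_with_titles(text):
    """Return (sentence_count, word_count) for text (hypothetical helper)."""
    sentences = [m.group(0) for m in SENTENCE.finditer(text)]
    # Words are the whitespace-separated pieces of each matched sentence.
    words = sum(len(sentence.split()) for sentence in sentences)
    return len(sentences), words


print(count_with_titles("Mr. Raktim Banerjee. My email address is xyz@nowhere.com."))
# prints (2, 8)
```

Requiring whitespace between tokens, rather than an optional `\s*` inside a repeated group, keeps the pattern from backtracking heavily on long input that has no terminator.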
