
Regex two given words in one sentence

I want a regex that can tell whether two given words occur in the same sentence (word order matters). The problem is that a sentence can contain an abbreviation, so a period does not necessarily mark the end of the sentence. The part of the regex that detects the end of a sentence is:

\.(\s+[A-Z]|\s*$)

What would the pattern look like?

You could use this:

(\b\w+\b)(?:[^.]|\.\s)*(\b\w+\b)

This basically says: match and capture a word, then match any number of characters that are either not a period or are a period followed by whitespace, and finally match and capture another word.
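To see where this pattern stops, here is a small Python sketch (the thread is about C#, but the regex syntax used here is the same in both engines). The helper name `first_and_last_word` is made up for illustration. Note that a period followed by whitespace is consumed by the middle group, so the match can continue past ". "; only a period followed by a non-whitespace character, or by the end of the text, stops it.

```python
import re

# The pattern from the answer, unchanged.
PATTERN = re.compile(r"(\b\w+\b)(?:[^.]|\.\s)*(\b\w+\b)")

def first_and_last_word(text):
    """Return the two captured words of the first match, or None."""
    match = PATTERN.search(text)
    return match.groups() if match else None

# Period followed by a space: the match runs on past it.
print(first_and_last_word("alpha beta. gamma delta"))
# Period followed by a letter: the match has to stop before the period.
print(first_and_last_word("alpha beta.gamma delta"))
```

Because the middle group is greedy, the second capture is the last word the match can reach, not the word right after the first one.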

EDIT: For given words in either order, use:

(\bWord1\b)(?:[^.]|\.\s)*(\bWord2\b)|(\bWord2\b)(?:[^.]|\.\s)*(\bWord1\b)
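One caveat: `\.\s` lets the match continue past any period followed by whitespace, including one followed by a capital letter, which the question treats as the end of a sentence. A possible variant, sketched in Python and assuming the boundary `\.(\s+[A-Z]|\s*$)` from the question, uses a negative lookahead so the gap between the two words never consumes a period that starts a sentence boundary. The names `SENTENCE_END`, `ordered_pattern` and `in_same_sentence` are made up for this example.

```python
import re

# Sentence boundary taken from the question: a period followed by
# whitespace and a capital letter, or by the end of the text.
SENTENCE_END = r"\.(?:\s+[A-Z]|\s*$)"

def ordered_pattern(first, second):
    """Build a regex for `first` followed by `second` in one sentence."""
    # Consume one character at a time, but never a period that
    # starts a sentence boundary.
    gap = rf"(?:(?!{SENTENCE_END}).)*?"
    return rf"\b{re.escape(first)}\b{gap}\b{re.escape(second)}\b"

def in_same_sentence(text, word1, word2, any_order=False):
    """True if word1 is followed by word2 within a single sentence."""
    pattern = ordered_pattern(word1, word2)
    if any_order:
        # Same idea as the EDIT above: try both orders.
        pattern += "|" + ordered_pattern(word2, word1)
    return re.search(pattern, text, re.DOTALL) is not None

text = "The cat, e.g. a tabby, chased the dog. The bird watched."
print(in_same_sentence(text, "cat", "dog"))
print(in_same_sentence(text, "dog", "bird"))
```

With this boundary, an abbreviation followed by a capitalized word (for example "Mr. Smith") still counts as a sentence end, so the boundary regex itself may need refining for real text.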

Not C#, but you should get the idea:

for sentence in split_text_with_regex(text):
    index_word1 = sentence.find(word1)
    index_word2 = sentence.find(word2)
    # do your thing
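
The snippet above leaves `split_text_with_regex` undefined. A runnable sketch of the same idea might look like the following; the splitting rule (whitespace after a period and before a capital letter) is an assumption borrowed from the question's sentence-end regex, and `words_in_order` is a made-up helper name. It uses word-boundary searches instead of `str.find` so that "cat" does not match inside a longer word.

```python
import re

def split_text_with_regex(text):
    # Hypothetical implementation: split at whitespace that follows a
    # period and precedes a capital letter.
    return re.split(r"(?<=\.)\s+(?=[A-Z])", text.strip())

def words_in_order(text, word1, word2):
    """True if some sentence contains word1 followed later by word2."""
    second_re = re.compile(rf"\b{re.escape(word2)}\b")
    for sentence in split_text_with_regex(text):
        first = re.search(rf"\b{re.escape(word1)}\b", sentence)
        if first is None:
            continue
        # Search for word2 only after the end of word1, so order matters.
        if second_re.search(sentence, first.end()):
            return True
    return False

text = "The cat, e.g. a tabby, chased the dog. The bird watched the cat."
print(words_in_order(text, "cat", "dog"))
print(words_in_order(text, "dog", "cat"))
```

Splitting first keeps each regex simple, at the cost of scanning the text more than once.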

There is a very good set of options here: http://www.regular-expressions.info/near.html

You can also construct the regular expression in Visual Studio itself. See the first paragraph of http://msdn.microsoft.com/en-us/library/2k3te2cs(VS.80).aspx

So I think it's something like this (untested):

(([\w\s]*\s)?Word1\s([\w\s]*)?\sWord2(\s[\w\s]*)?\.)(?=(\s+[A-Z]|\s*$))

Edit: Thinking about it, that won't match punctuation (commas, apostrophes). Perhaps each [\w\s] should be [^.] or a list of allowed characters.
