A regex pattern that matches all words starting from a word with an s and stopping before a word that starts with an s

Question

I'm trying to capture a run of words in a string: the match should begin at a word that starts with an s and stop just before the next word that also starts with an s.

For example, I have the string " Stack, Code and StackOverflow". I want to capture only " Stack, Code and " and exclude "StackOverflow" from the match.

This is what I am thinking:

Start with a space followed by an s.
Then match everything up to, but not including, a space followed by an s (I'm using a negative lookahead for this).

The regex I have tried:

(?<=\s)S[a-z -,]*(?!(\sS))

I don't know how to make it work.

Answer 1

I think this should work. I adapted the regex from this thread. You can also test it out here. I have also included a non-regex solution: it finds the first word starting with an 's', then the next word starting with an 's', and takes the words in that range.

import re

teststring = " Stack, Code and StackOverflow"
extractText = re.search(r"(\s)[sS][^*\s]*[^sS]*", teststring)

print(extractText[0])

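# Non-regex solution
listwords = teststring.split(' ')

# Find the first word starting with 's'/'S' and the next one after it.
# None (rather than 0) marks "not found yet", so an s-word at index 0 also works.
start = None
end = None
for i, word in enumerate(listwords):
    if word.lower().startswith('s'):
        if start is None:
            start = i
        else:
            end = i
            break

# If end is still None (no second s-word), the slice runs to the last word.
newstring = " " + " ".join(listwords[start:end])
print(newstring)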

Output

 Stack, Code and
 Stack, Code and
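
One caveat worth checking: the trailing `[^sS]*` in the regex above stops at any `s`, not only at one that begins a word, so an input such as " Stack, test and StackOverflow" is cut off inside "test". A minimal sketch of a variant that stops only at a word-initial s, assuming words are separated by whitespace; the helper name `first_s_span` is hypothetical and not part of the original answer:

```python
import re

def first_s_span(text):
    """Return the text from the first s-word up to (not including) the next s-word.

    Hypothetical helper for illustration; returns None when no such span exists.
    """
    # (?<!\S) requires start-of-string or whitespace directly before the s-word.
    # The lazy .*? stops as soon as the lookahead sees whitespace followed by s/S.
    m = re.search(r"(?<!\S)[sS].*?(?=\s+[sS])", text)
    return m.group(0) if m else None

# The original regex stops at the 's' inside "test":
print(repr(re.search(r"(\s)[sS][^*\s]*[^sS]*", " Stack, test and StackOverflow")[0]))
# ' Stack, te'

# The variant stops only before a word that starts with s/S:
print(repr(first_s_span(" Stack, test and StackOverflow")))
# 'Stack, test and'
```

Note that this variant does not include the leading space in the match; prepend one if the exact output format above is needed.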

Answer 2

You could, for example, use a capture group:

(S(?<!\S.).*?)\s*S(?<!\S.)

Explanation

( Capture group 1
- S(?<!\S.) Match S and assert a whitespace boundary to its left (no non-whitespace character directly before the S)
- .*? Match any character, as few as possible
) Close group
\s* Match optional whitespace chars
S(?<!\S.) Match S and assert a whitespace boundary to its left (no non-whitespace character directly before the S)
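
To see what the `(?<!\S.)` lookbehind does on its own, here is a small sketch that lists the positions where `S(?<!\S.)` matches. The test string (with an extra "xSy" word) is an assumption added for illustration, and the second pattern `(?<!\S)S` is an alternative spelling that should behave the same way here:

```python
import re

# Illustrative input: "xSy" contains an S that does NOT start a word.
text = "Stack, Code and StackOverflow xSy"

# Pattern form used in the answer: match S, then look back over
# "previous char + S" and reject if the previous char is non-whitespace.
with_answer_form = [m.start() for m in re.finditer(r"S(?<!\S.)", text)]

# Alternative spelling: check the previous char before matching S.
with_leading_form = [m.start() for m in re.finditer(r"(?<!\S)S", text)]

print(with_answer_form)   # [0, 16]
print(with_leading_form)  # [0, 16]
```

Only the S at the start of the string and the S after a space are matched; the S inside "xSy" is skipped because a non-whitespace character precedes it.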

See a regex demo and a Python demo.

Example code:

import re

pattern = r"(S(?<!\S.).*?)\s*S(?<!\S.)"
s = "Stack, Code and StackOverflow"
m = re.search(pattern, s)
if m:
    print(m.group(1))

Output

Stack, Code and

Another option uses a lookahead to assert the next word-initial S to the right without consuming it, which allows multiple matches one after another:

S(?<!\S.).*?(?=\s*S(?<!\S.))

Regex demo

import re

pattern = r"S(?<!\S.).*?(?=\s*S(?<!\S.))"
s = "Stack, Code and StackOverflow test Stack"
print(re.findall(pattern, s))

Output

['Stack, Code and', 'StackOverflow test']
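
The patterns above match an uppercase S only. If words starting with a lowercase s should count as well (the question says "starts with an s"), one possible sketch is to pass `re.IGNORECASE` to the same pattern. The wrapper name `s_word_spans` and the lowercase test string are assumptions for illustration:

```python
import re

def s_word_spans(text):
    """Return every span that runs from one s-word up to the next s-word.

    Hypothetical wrapper: same pattern as above, plus re.IGNORECASE so that
    both 's' and 'S' count as the starting letter.
    """
    pattern = r"S(?<!\S.).*?(?=\s*S(?<!\S.))"
    return re.findall(pattern, text, flags=re.IGNORECASE)

print(s_word_spans("stack, Code and StackOverflow test stack"))
# ['stack, Code and', 'StackOverflow test']
```

The s inside "test" does not end a span, because the lookbehind still requires a whitespace boundary before it.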
