
Given a string, how do I find the start and end index of every non-whitespace substring in Python?

Given a string:

?           ^^^^    ^^^  --

How can I find the start and end index of every non-whitespace substring after the first character (the ?), where the end index is the position of the substring's last character?

Expected output: (12,15), (20,22), (25,26)

I tried the following, but it only works for the first substring, not the rest:

string = '?           ^^^^    ^^^  --'
index = len(string) - len(string.lstrip())

Use a regular expression that matches each run of non-whitespace characters (\S+) and iterate over the matches with re.finditer. Each Match object it yields records the start and end index of that run.

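import re
string = '?           ^^^^    ^^^  --'
# One (start, inclusive end) pair per run of non-whitespace, minus the leading '?'
result = [(m.start(), m.end()-1) for m in re.finditer(r'\S+', string)][1:]
print(result)  # [(12, 15), (20, 22), (25, 26)]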

The [1:] removes the match of ? at the beginning.
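Printing the list before the [1:] slice makes the dropped entry visible. The ? is itself a one-character run of non-whitespace, so it comes back as (0, 0). This is a minimal sketch using the question's example string. The extra m.group() field is added only for display.

```python
import re

string = '?           ^^^^    ^^^  --'

# Every non-whitespace run: (start, inclusive end, matched text)
all_runs = [(m.start(), m.end() - 1, m.group()) for m in re.finditer(r'\S+', string)]
print(all_runs)
# [(0, 0, '?'), (12, 15, '^^^^'), (20, 22, '^^^'), (25, 26, '--')]

# Dropping the first entry leaves only the runs after the '?'
print([(start, end) for start, end, _ in all_runs[1:]])
# [(12, 15), (20, 22), (25, 26)]
```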

I subtract 1 from m.end() because end() returns the index just after the match, so that string[m.start():m.end()] gives the whole match. The expected output, however, uses the index of the last character.
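Here is a short check of the half-open convention, again using the example string. m.span() returns (start(), end()), and the exclusive end reproduces the match when used in a slice. Subtracting 1 gives the index of the match's last character.

```python
import re

string = '?           ^^^^    ^^^  --'

for m in re.finditer(r'\S+', string):
    start, stop = m.span()     # stop is exclusive, like range() and slicing
    last = stop - 1            # inclusive end, as in the expected output
    assert string[start:stop] == m.group()
    assert string[last] == m.group()[-1]
    print(m.group(), (start, stop), (start, last))
```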

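If a substring can start immediately after the ?, then \S+ merges it with the ? into a single match, and [1:] would throw it away. In that case, slice off the first character before matching. Every index in the sliced string is one less than in the original, so add 1 to the start index. The inclusive end then becomes m.end() - 1 + 1, which is simply m.end().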

result = [(m.start()+1, m.end()) for m in re.finditer(r'\S+', string[1:])]
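To see why the sliced version matters, compare both approaches on an input where a substring touches the ?. The string '?^^ --' is a hypothetical example made up for illustration.

```python
import re

text = '?^^ --'  # hypothetical input: '^^' starts right after the '?'

# Version 1: match the whole string, then drop the first match
v1 = [(m.start(), m.end() - 1) for m in re.finditer(r'\S+', text)][1:]

# Version 2: skip the first character, then shift start indices back by 1
v2 = [(m.start() + 1, m.end()) for m in re.finditer(r'\S+', text[1:])]

print(v1)  # [(4, 5)]  '?^^' was a single match, so '^^' is lost
print(v2)  # [(1, 2), (4, 5)]
```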

The technical posts on this site are licensed under CC BY-SA 4.0. If you reprint one, please credit the site URL or the original address. For any questions, contact: yoyou2525@163.com.

 
Guangdong ICP filing No. 18138465  © 2020-2024 STACKOOM.COM