I'm trying to find all instances of a specific substring (a.b2, as an example) and return each one together with the 4 characters that follow the match. These 4 characters are always dynamic and can be any letter, digit, or symbol.
I've tried searching, but the similar questions I found all rely on fixed characters that can be used to split the string. Since the characters I'm looking for are dynamic, I'm not sure how to write the regex.
When using regex, you can use "." to dynamically match any character. Use {number} to specify how many characters to match, and use parentheses, as in (.{number}), to specify that the match should be captured for later use.
>>> import re
>>> s = "a!b2foobar a!b2bazqux a!b2spam and eggs"
>>> print(re.findall("a!b2(.{4})", s))
['foob', 'bazq', 'spam']
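If the substring itself can contain regex metacharacters (for example the "." in a.b2), it may be worth escaping it before building the pattern. A minimal sketch, assuming a hypothetical helper named following_chars and made-up sample text:

```python
import re

def following_chars(text, sub, n=4):
    # re.escape makes metacharacters in `sub` (such as ".") match literally
    pattern = re.escape(sub) + "(.{%d})" % n
    return re.findall(pattern, text)

s = "a.b2foobar a.b2bazqux aXb2nope"
print(following_chars(s, "a.b2"))    # ['foob', 'bazq']
# Without escaping, "." also matches the "X" in "aXb2"
print(re.findall("a.b2(.{4})", s))   # ['foob', 'bazq', 'nope']
```

Whether escaping is needed depends on where the substring comes from; for a fixed literal like a!b2 the plain pattern works as is.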
If you're only looking for how to grab the following 4 characters using regex, what you probably want is the curly-brace quantifier: '{}'.
They go into more detail in the post here, but essentially you would use [a-zA-Z][0-9]{X,Y} or (.{X,Y}), where X and Y are the minimum and maximum number of characters to match (in your case, you would only need {4}).
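To make the quantifier forms concrete, here is a small sketch. The sample string and the variable names are made up for illustration; only re.findall and the quantifier syntax come from the re module.

```python
import re

s = "a!b2foobar a!b2#1?x a!b2xy"

# {4}: exactly four characters of any kind must follow the substring
exact = re.findall(r"a!b2(.{4})", s)
# {1,4}: between one and four characters (greedy, so as many as possible)
ranged = re.findall(r"a!b2(.{1,4})", s)
# A character class restricts which characters are allowed to follow
alnum = re.findall(r"a!b2([a-zA-Z0-9]{4})", s)

print(exact)   # ['foob', '#1?x']
print(ranged)  # ['foob', '#1?x', 'xy']
print(alnum)   # ['foob']
```

Note that with {4} the last occurrence is skipped entirely, because fewer than four characters follow it.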
A more Pythonic way to solve this problem, however, would be to use string slicing and the index method.
E.g., given an input_string, once you find the substring at index i using index, you could use input_string[i+len(sub_str):i+len(sub_str)+4] to grab those characters.
As an example,
input_string = 'abcdefg'
sub_str = 'abcd'
found_index = input_string.index(sub_str)
start_index = found_index + len(sub_str)
symbol = input_string[start_index:start_index + 4]
print(symbol)
Outputs (to show it works when fewer than 4 characters remain as well): efg
index also accepts optional start and end positions for the search, so you could use it in a loop to find every occurrence of the substring, with each search starting at the previously found index + 1. Note that index raises ValueError when the substring is not found.
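The loop described above could look roughly like this. It is only a sketch: the function name find_all_following is hypothetical, and it uses str.find instead of index, since find returns -1 rather than raising ValueError when nothing more is found.

```python
def find_all_following(text, sub, n=4):
    results = []
    start = 0
    while True:
        # Search from `start`; find returns -1 once there are no more matches
        i = text.find(sub, start)
        if i == -1:
            break
        begin = i + len(sub)
        # Slicing past the end of the string is safe and just returns less
        results.append(text[begin:begin + n])
        # Resume the search one position after the previous hit
        start = i + 1
    return results

print(find_all_following("a!b2foobar a!b2bazqux a!b2sp", "a!b2"))
# ['foob', 'bazq', 'sp']
```

Unlike the regex with {4}, this version also returns a shorter result when fewer than 4 characters follow the last occurrence.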
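import re
s = "a!b2foobar a!b2bazqux a!b2spam and eggs"
print(re.search(r'a!b2(.{4})', s).group(1))
.{4} matches any 4 characters except newlines. .group(1) returns the text captured by the first pair of parentheses, while .group(0) is the complete match of the searched pattern. You can read about group ids here.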
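A short sketch of how the groups differ, with made-up sample strings. It also guards against re.search returning None, and shows that "." skips newlines unless the re.DOTALL flag is passed:

```python
import re

s = "a!b2foobar a!b2bazqux"
m = re.search(r"a!b2(.{4})", s)
if m is not None:      # re.search returns None when nothing matches
    print(m.group(0))  # a!b2foob  (the whole match)
    print(m.group(1))  # foob      (only the captured part)

multiline = "a!b2ab\ncd"
# By default "." does not match a newline, so there is no match here
no_dotall = re.search(r"a!b2(.{4})", multiline)
# With re.DOTALL, "." matches newlines as well
with_dotall = re.search(r"a!b2(.{4})", multiline, re.DOTALL)
print(no_dotall)                   # None
print(repr(with_dotall.group(1)))  # 'ab\nc'
```

re.search only returns the first occurrence; re.findall or re.finditer would be the usual choice for collecting all of them.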