
How do you find all instances of a substring, followed by a certain number of dynamic characters?

I'm trying to find all instances of a specific substring (a.b2 as an example) and return them with the 4 characters that follow the substring match. These 4 following characters are always dynamic and can be any letter, digit, or symbol.

I've tried searching, but the similar questions I found ask about splitting on fixed characters. Since the characters I'm looking for are dynamic, I'm not sure how to write the regex.

When using regex, you can use "." to dynamically match any character. Use {number} to specify how many characters to match, and use parentheses as in (.{number}) to specify that the match should be captured for later use.

>>> import re
>>> s = "a!b2foobar a!b2bazqux a!b2spam and eggs"
>>> print(re.findall("a!b2(.{4})", s))
['foob', 'bazq', 'spam']
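Two caveats apply to the pattern above: if the substring itself contains regex metacharacters (such as the "." in a.b2), it needs escaping, and "." does not match a newline unless re.DOTALL is set. A minimal sketch under those assumptions follows; find_following is a hypothetical helper name, not part of the original answer.

```python
import re

def find_following(text, sub, n=4):
    """Return the n characters that follow each occurrence of sub in text."""
    # re.escape makes metacharacters in sub (like ".") match literally.
    # re.DOTALL lets "." also match newline characters.
    pattern = re.escape(sub) + "(.{%d})" % n
    return re.findall(pattern, text, flags=re.DOTALL)

s = "a.b2foobar a.b2bazqux aXb2nope a.b2spam and eggs"
print(find_following(s, "a.b2"))  # ['foob', 'bazq', 'spam']
```

Without re.escape, the unescaped "." would also match the "X" in aXb2nope.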

If you're only looking for how to grab the following 4 characters using regex, what you probably want is the curly-brace quantifier, '{}', which specifies how many characters to match.

They go into more detail in the post here, but essentially you would use [a-zA-Z0-9]{X,Y} or (.{X,Y}), where X to Y is the number of characters you're looking for (in your case, you would only need {4}). Since your characters can also be symbols, the "." form is the one that fits.
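To make the difference between a fixed count and a range concrete, here is a small sketch. The sample string is made up for illustration.

```python
import re

s = "a!b2foobar, a!b2ab"

# {4}: exactly four characters must follow the substring,
# so the second occurrence (only "ab" follows) is skipped.
exact = re.findall(r"a!b2(.{4})", s)
print(exact)   # ['foob']

# {1,4}: between one and four characters, taking as many as possible.
ranged = re.findall(r"a!b2(.{1,4})", s)
print(ranged)  # ['foob', 'ab']
```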


A more Pythonic way to solve this problem, however, would be to use string slicing and the index method.

E.g., given an input_string, once you find the substring at index i using index, you can use input_string[i+len(sub_str):i+len(sub_str)+4] to grab the characters that follow.

As an example,

input_string = 'abcdefg'
sub_str = 'abcd'
found_index = input_string.index(sub_str)
start_index = found_index + len(sub_str)
symbol = input_string[start_index:start_index + 4]
print(symbol)

Output (showing that it also works when fewer than 4 characters remain): efg

index also accepts optional start and end positions for the search, so you could call it in a loop to find every occurrence of the substring, starting each search at the previous found index + 1. Note that index raises ValueError when the substring is not found.
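The loop described above might look roughly like this. It is only a sketch: following_chars is a hypothetical name, and it uses str.find rather than index so that a failed search returns -1 instead of raising ValueError.

```python
def following_chars(text, sub, n=4):
    """Collect up to n characters after every occurrence of sub."""
    results = []
    start = 0
    while True:
        # str.find returns -1 when there is no further match,
        # whereas str.index would raise ValueError.
        found = text.find(sub, start)
        if found == -1:
            break
        begin = found + len(sub)
        results.append(text[begin:begin + n])
        # Resume the search one position past the previous hit.
        start = found + 1
    return results

print(following_chars("a!b2foobar a!b2bazqux a!b2spam and eggs", "a!b2"))
# ['foob', 'bazq', 'spam']
```

Unlike the regex version with {4}, slicing returns a shorter string when fewer than n characters remain after the last occurrence.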

import re
s = "a!b2foobar a!b2bazqux a!b2spam and eggs"  # sample input string
print(re.search(r'a!b2(.{4})', s).group(1))  # first match only: foob

.{4} matches any 4 characters except newlines. group(0) is the complete match of the searched string, while group(1) is the part captured by the first pair of parentheses. You can read about group ids here.
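As a short illustration of the two group numbers, and of the fact that re.search returns None when nothing matches, consider the sketch below. The sample string is assumed for the example.

```python
import re

s = "a!b2foobar a!b2bazqux"
match = re.search(r"a!b2(.{4})", s)

# re.search returns None when nothing matches, so check before using it.
if match is not None:
    print(match.group(0))  # whole match: a!b2foob
    print(match.group(1))  # captured characters only: foob
```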

