
Exact search of a string that has parentheses using regex

I am new to regexes.

I have the following string: \n(941)\n364\nShackle\n(941)\nRivet\n105\nTop

Out of this string, I want to extract Rivet and I already have (941) as a string in a variable.

My thought process was like this:

  1. Find all occurrences of (941).
  2. Filter the results by checking whether (941) is followed by \n, then a word, then another \n.
  3. I wrote a regex for the 2nd step: \n[\w\s\'\d\-\/\.]+$\n .

The problem I am facing is that, because of the parentheses in (941), the regex treats 941 as a group. The regex in the 3rd step may be wrong, which I can fix later, but first I need help finding the 2nd (941) so that I can then apply the 3rd step to it.

PS.

  1. I know I can use Python string methods like find and then loop over the results, but I wanted to see if this can be done using regex only.
  2. I have tried the following: (?:...) , (941){1} , and escaping the parentheses with \ to make them literal characters, like this: \(941\) , with no useful results. Maybe I am using them wrong.

I just wanted to know whether this can be done using regex. I thought it might also be useful to others and to future viewers.

Thanks

Assuming:

  • You want to skip lines that consist only of digits;
  • You want to match a substring made of word characters (which may include digits);

Try escaping the variable with re.escape() and using it in the regular expression through an f-string:

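import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
var1 = '(941)'
# Escape the parentheses so they match literally instead of forming a group
var2 = re.escape(var1)
# (?!\d+\n) rejects a following line made only of digits; (\w+) captures the word
m = re.findall(fr'{var2}\n(?!\d+\n)(\w+)', s)[0]
print(m)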

Prints:

Rivet
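Note that indexing the findall result with [0] raises an IndexError when nothing matches. A minimal sketch of the same pattern wrapped in a function that returns None instead; the name first_word_after is hypothetical and not part of the answer above:

```python
import re

def first_word_after(marker, text):
    """Return the first word on the line right after `marker`, skipping
    digit-only lines, or None if there is no such line (hypothetical helper)."""
    # re.escape makes the parentheses in the marker match literally.
    pattern = rf'{re.escape(marker)}\n(?!\d+\n)(\w+)'
    match = re.search(pattern, text)
    return match.group(1) if match else None

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
print(first_word_after('(941)', s))  # Rivet
print(first_word_after('(999)', s))  # None
```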

If you have text in a variable that should be matched exactly, use re.escape() to escape it when substituting into the regexp.
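To illustrate why the escaping matters, here is a small sketch comparing the unescaped and escaped variable; the names unescaped, escaped_pattern and escaped are only for illustration:

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'

# Unescaped, the parentheses form a capturing group around 941, so the
# pattern matches only the digits and findall returns the group contents.
unescaped = re.findall(num, s)

# re.escape() backslash-escapes the parentheses so they match literally.
escaped_pattern = re.escape(num)
escaped = re.findall(escaped_pattern, s)

print(unescaped)        # ['941', '941']
print(escaped_pattern)  # \(941\)
print(escaped)          # ['(941)', '(941)']
```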

import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'
# Lookbehind for \n(941)\n and lookahead for \n keep both out of the match
re.findall(rf'(?<=\n{re.escape(num)}\n)[\w\s\'\d\-\/\.]+(?=\n)', s)

This puts \n(941)\n in a lookbehind and the trailing \n in a lookahead, so neither is included in the match. This avoids a problem with the \n at the end of one match overlapping with the \n at the beginning of the next.
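One caveat with the character class taken from the question: \s also matches \n, so a match can run across two lines. The sketch below shows what the pattern returns on the sample string, next to an assumed variant (not from the answer above) that allows only a literal space and skips digit-only lines:

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'
marker = re.escape(num)

# Same character class as in the question: \s matches '\n',
# so each match can cover more than one line.
multi_line = re.findall(rf"(?<=\n{marker}\n)[\w\s'\-/.]+(?=\n)", s)

# Assumed variant: a literal space instead of \s, plus a negative
# lookahead that rejects lines made only of digits.
single_line = re.findall(rf"(?<=\n{marker}\n)(?!\d+\n)[\w '\-/.]+(?=\n)", s)

print(multi_line)   # ['364\nShackle', 'Rivet\n105']
print(single_line)  # ['Rivet']
```

Whether spaces, apostrophes, hyphens, slashes and dots should be allowed in the matched line depends on the real data; the class here simply mirrors the one in the question.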
