
Exact search of a string that has parentheses using regex

I am new to regexes.

I have the following string: \n(941)\n364\nShackle\n(941)\nRivet\n105\nTop

From this string I want to extract Rivet, and I already have (941) stored as a string in a variable.

My thought process was this:

  1. Find all occurrences of (941).
  2. Filter the results by checking whether (941) is followed by \n, then a word, then \n.
  3. I made a regex for the second step: \n[\w\s\'\d\-\/\.]+$\n
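The three steps above can be sketched with re.finditer and a second anchored match, as a minimal illustration under two assumptions: the label is matched literally (hence re.escape), and a "word" means a line containing at least one ASCII letter, so digit-only lines like 364 are skipped. The names lines_after_label and NEXT_LINE are hypothetical.

```python
import re

# The "\n, then one line, then \n" check from step 2
NEXT_LINE = re.compile(r'\n([^\n]+)\n')

def lines_after_label(text, label):
    """Return each line that directly follows `label` and contains a letter."""
    results = []
    # Step 1: re.escape makes '(' and ')' literal instead of a capturing group
    for m in re.finditer(re.escape(label), text):
        # Step 2: try the next-line pattern starting right where this occurrence ends
        nxt = NEXT_LINE.match(text, m.end())
        # Skip lines without letters, such as the 364 after the first (941)
        if nxt and re.search(r'[A-Za-z]', nxt.group(1)):
            results.append(nxt.group(1))
    return results

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
print(lines_after_label(s, '(941)'))  # ['Rivet']
```

Passing the start position to Pattern.match avoids slicing the string, and the match is still anchored at that position.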

The problem I am facing is that, because of the parentheses in (941), the regex treats 941 as a capturing group. The regex in step 3 may also be wrong, which I can fix later, but first I need help finding the second (941) so that I can apply step 3 to it.
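A short sketch of why the parentheses cause trouble and how the second occurrence can be located, using only the standard re module and the string from the question:

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'

# Unescaped: ( and ) create a capturing group, so only the digits are matched
# and findall returns the group's text
print(re.findall('(941)', s))         # ['941', '941']
print(re.search('(941)', s).span())   # (2, 5): the literal '(' at index 1 is not part of the match

# Escaped: \( and \) match the literal parentheses
matches = list(re.finditer(r'\(941\)', s))
print([m.group() for m in matches])   # ['(941)', '(941)']
print(matches[1].span())              # (19, 24): the second occurrence
```

The end of the second span (24) is where the check from step 3 would start.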

PS.

  1. I know I can use Python string methods like find and then loop over the results, but I wanted to see whether this can be done directly with a regex alone.
  2. I have tried the following: (?:...), (941){1}, and escaping the parentheses to make them literal characters, like \(941\), with no useful results. Maybe I am using them wrong.
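The attempts listed above behave differently: (?:...) only makes a group non-capturing and {1} repeats it once, so neither matches the literal parentheses, while escaping does. If \(941\) seemed to fail, one possible culprit is the $ in the step-3 pattern: without re.MULTILINE, $ matches only at the end of the string or just before a final newline, so $\n cannot match in the middle of this string. A quick comparison, with var standing in for the variable from the question:

```python
import re

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
var = '(941)'

# (?:...) only makes the group non-capturing; the parentheses are still not literal
print(re.findall(r'(?:941)', s))    # ['941', '941']
# {1} repeats the group exactly once, which changes nothing
print(re.findall(r'(941){1}', s))   # ['941', '941']
# Backslash-escaped parentheses match the literal characters
print(re.findall(r'\(941\)', s))    # ['(941)', '(941)']
# re.escape() builds the same escaped pattern from a variable
print(re.escape(var) == r'\(941\)')  # True
# Without re.MULTILINE, $ cannot sit before a \n in the middle of s
print(re.search(r'\n[\w\s\'\d\-\/\.]+$\n', s))  # None
```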

I just wanted to know whether this can be done with a regex. I thought it might also be useful for others, or a good reference for future viewers.

Thanks

Assuming:

  • You want to avoid matching a line made only of digits;
  • You want to match a substring made of word characters (which may therefore include digits);

Try escaping the variable with re.escape() and inserting it into the regular expression through an f-string:

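import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
var1 = '(941)'
var2 = re.escape(var1)  # '\(941\)': the parentheses now match literally
# (?!\d+\n) skips a following line made only of digits; (\w+) captures the word
m = re.findall(fr'{var2}\n(?!\d+\n)(\w+)', s)[0]
print(m)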

Prints:

Rivet
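The [0] at the end of the answer's code raises IndexError when nothing matches. A variant that returns None instead, assuming that is the behaviour you want; word_after is a hypothetical name and the pattern is the answer's own, run through re.search:

```python
import re

def word_after(label, text):
    """Return the word characters starting the line right after `label`, or None."""
    # (?!\d+\n) rejects a following line made only of digits and ended by \n, such as 364
    m = re.search(rf'{re.escape(label)}\n(?!\d+\n)(\w+)', text)
    return m.group(1) if m else None

s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
print(word_after('(941)', s))  # Rivet
print(word_after('(123)', s))  # None
```

re.search stops at the first successful match, so there is no list to index.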

If you have text in a variable that should be matched exactly, use re.escape() to escape it when substituting it into the regexp.

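import re
s = '\n(941)\n364\nShackle\n(941)\nRivet\n105\nTop'
num = '(941)'
# A space instead of \s in the class keeps each match on a single line
print(re.findall(rf'(?<=\n{re.escape(num)}\n)[\w \'\d\-\/\.]+(?=\n)', s))  # ['364', 'Rivet']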

This puts \n(941)\n in a lookbehind, so it's not included in the match, and the closing \n is only checked by the lookahead (?=\n). This avoids the problem of the \n at the end of one match overlapping with the \n at the beginning of the next.
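To see the overlap problem that the lookarounds avoid, compare a pattern that consumes both newlines with the lookaround version. The string s3 is made up for illustration, chosen so that two labelled lines sit back to back, and [^\n]+ stands in for the line to keep the example short:

```python
import re

# Hypothetical input where two labelled lines are adjacent
s3 = '\n(941)\nRivet\n(941)\nBolt\n'
label = re.escape('(941)')

# Consuming both newlines: the \n after Rivet is used up, so the second label cannot match
print(re.findall(rf'\n{label}\n([^\n]+)\n', s3))         # ['Rivet']

# Lookbehind/lookahead only check the newlines without consuming them
print(re.findall(rf'(?<=\n{label}\n)[^\n]+(?=\n)', s3))  # ['Rivet', 'Bolt']
```

The lookbehind is allowed here because the escaped label has a fixed width.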

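Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you repost, please cite this site's URL or the original address. For any questions, contact: yoyou2525@163.com.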

 
Guangdong ICP No. 18138465  © 2020-2024 STACKOOM.COM