
Regular expression to return all characters between two special characters

How would I go about using a regex to return all characters between two brackets? Here is an example:

foobar['infoNeededHere']ddd
needs to return infoNeededHere

I found a regex to do it between curly brackets, but all my attempts at making it work with square brackets have failed. Here is that regex: (?<={)[^}]*(?=}) and here is my attempt to hack it:

(?<=[)[^}]*(?=])
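The attempt above fails because an unescaped [ opens a character class. That class swallows the ) that should close the look-behind, so the look-behind group is never closed and Python's re module rejects the pattern. The negated class [^}] also still excludes } rather than ]. Here is a minimal sketch of the square-bracket version. It assumes Python 3 and the standard re module, and it escapes both brackets:

```python
import re

s = "foobar['infoNeededHere']ddd"

# The original attempt: the unescaped "[" starts a character class,
# which leaves the look-behind "(?<=" unclosed, so compilation fails.
try:
    re.compile(r"(?<=[)[^}]*(?=])")
except re.error as exc:
    print("invalid pattern:", exc)

# Square-bracket version of the curly-bracket regex:
# look-behind for "[", any run of non-"]" characters, look-ahead for "]".
pat = r"(?<=\[)[^\]]*(?=\])"
print(re.findall(pat, s))  # ["'infoNeededHere'"]
```

The quotes are still part of the result here, because they sit between the brackets.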

Final Solution:

import re

s = "foobar['InfoNeeded'],"  # renamed from str to avoid shadowing the built-in
match = re.match(r"^.*\['(.*)'\].*$", s)
print(match.group(1))  # Python 3 print(); prints InfoNeeded

If you're new to REGular EXpressions, you can learn about them in the Python docs. Or, if you want a gentler introduction, you can check out the HOWTO. They use Perl-style syntax.

Regex

The expression that you need is .*?\[(.*)\].* and the group that you want will be \1 (group 1).
- .*? : . matches any character except a newline. * is a meta-character and means "repeat this 0 or more times". ? makes the * non-greedy, i.e., . will match as few characters as possible before hitting a '['.
- \[ : \ escapes special meta-characters, which in this case is [. If we didn't do that, [ would start a character class instead of matching a literal bracket.
- (.*) : Parentheses 'group' whatever is inside them, and you can later retrieve the groups by their numeric IDs or by name (if they're given one).
- \].* : You should know enough by now to know what this means.
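The (.*) bullet mentions retrieving groups by number or by name. Below is a short sketch of both. It uses Python's (?P<name>...) syntax for a named group, and the group name info is an arbitrary choice for illustration:

```python
import re

s = "foobar['infoNeededHere']ddd"

# Numbered group: (.*) is group 1, so \1 in a replacement string refers to it.
numbered = re.search(r".*?\[(.*)\].*", s)
print(numbered.group(1))                   # 'infoNeededHere' (quotes included)
print(re.sub(r".*?\[(.*)\].*", r"\1", s))  # same text, via a \1 backreference

# Named group: "info" is a hypothetical name chosen for this example.
named = re.search(r"\[(?P<info>.*)\]", s)
print(named.group("info"))
print(named.group(1))  # a named group is still numbered as well
```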

Implementation

First, import the re module (it's not a built-in) wherever you want to use the expression.

Then, use re.search(regex_pattern, string_to_be_tested) to search for the pattern in the string to be tested. This returns a match object (or None if there is no match), which you can store in a temporary variable. You should then call its group() method and pass 1 as an argument (to see the 'Group 1' we captured using parentheses earlier). It should now look like:

>>> import re
>>> pat = r'.*?\[(.*)].*'             #See Note at the bottom of the answer
>>> s = "foobar['infoNeededHere']ddd"
>>> match = re.search(pat, s)
>>> match.group(1)
"'infoNeededHere'"
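The captured group above still contains the single quotes. Also, re.search returns None when there is no match, so calling .group(1) directly raises AttributeError on a string without brackets. Here is a minimal sketch that moves the quotes outside the group and checks for a match first. The helper name extract_info is hypothetical:

```python
import re

# The quotes sit outside the capturing group, so only the inner text is captured.
PATTERN = re.compile(r"\['(.*?)'\]")

def extract_info(text):
    """Return the text inside ['...'], or None if there is none (hypothetical helper)."""
    match = PATTERN.search(text)
    if match is None:  # re.search returns None when nothing matches
        return None
    return match.group(1)

print(extract_info("foobar['infoNeededHere']ddd"))  # infoNeededHere
print(extract_info("no brackets here"))             # None
```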

An Alternative

You can also use findall() to find all the non-overlapping matches by modifying the regex to (?<=\[).+?(?=\]).
- (?<=\[) : (?<=) is called a look-behind assertion and checks for an expression preceding the actual match.
- .+? : + is just like * except that it matches one or more repetitions. It is made non-greedy by ?.
- (?=\]) : (?=) is a look-ahead assertion and checks for an expression following the match without capturing it.
Your code should now look like:

>>> import re
>>> pat = r'(?<=\[).+?(?=\])'  #See Note at the bottom of the answer
>>> s = "foobar['infoNeededHere']ddd[andHere] [andOverHereToo[]"
>>> re.findall(pat, s)
["'infoNeededHere'", 'andHere', 'andOverHereToo['] 
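The ? after .+ matters here. Below is a quick comparison on the same string with the ? dropped, so that .+ is greedy. The greedy version runs from the first [ to the last ] and returns a single match:

```python
import re

s = "foobar['infoNeededHere']ddd[andHere] [andOverHereToo[]"

lazy = re.findall(r"(?<=\[).+?(?=\])", s)   # stops at the first following "]"
greedy = re.findall(r"(?<=\[).+(?=\])", s)  # grabs as much as possible, backs off to the last "]"

print(lazy)    # ["'infoNeededHere'", 'andHere', 'andOverHereToo[']
print(greedy)  # ["'infoNeededHere']ddd[andHere] [andOverHereToo["]
```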

Note: Always use raw Python strings by adding an 'r' before the string (e.g. r'blah blah blah').
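Raw strings matter because Python processes backslash escapes in ordinary string literals before re ever sees the pattern. \b is the clearest example. In a normal string it becomes a backspace character rather than the regex word-boundary escape:

```python
import re

text = "a cat sat"

# In a normal literal, "\b" is a backspace (\x08), so re looks for backspaces.
print(re.findall("\bcat\b", text))   # []
# In a raw literal, re receives the two characters "\" and "b": a word boundary.
print(re.findall(r"\bcat\b", text))  # ['cat']

# The raw and doubled-backslash spellings produce the same pattern string.
print(r"\[" == "\\[")  # True
```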

Thanks for reading! I wrote this answer when there were no accepted ones yet, but by the time I finished it, 2 more had come up and one got accepted. :( x<

^.*\['(.*)'\].*$ will match a line and capture what you want in a group.

You have to escape the [ with \ (escaping the ] as well is harmless and keeps the pattern symmetric).

The documentation at the rubular.com proof link explains how the expression is formed.
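Be aware that the leading ^.* in this expression is greedy. On a line that contains several ['...'] tokens, it backtracks only as far as the last one, so group 1 holds the last token rather than the first. A quick check, using a made-up sample string for the two-token case:

```python
import re

pat = r"^.*\['(.*)'\].*$"

print(re.match(pat, "foobar['infoNeededHere']ddd").group(1))  # infoNeededHere
# Hypothetical line with two tokens: the greedy ^.* skips past the first one.
print(re.match(pat, "a['x']b['y']c").group(1))                # y
```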

If there's only one of these [.....] tokens per line, then you don't need to use regular expressions at all:

In [7]: mystring = "Bacon, [eggs], and spam"

In [8]: mystring[ mystring.find("[")+1 : mystring.find("]") ]
Out[8]: 'eggs'
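One caveat with the slicing approach: str.find returns -1 when the character is missing, so a line without brackets silently yields a wrong slice instead of an error. Below is a sketch of a safer variant that uses str.partition. The function name between_brackets is hypothetical:

```python
def between_brackets(line):
    """Return the text between the first "[" and the next "]", or None (hypothetical helper)."""
    _, found_open, rest = line.partition("[")
    if not found_open:  # no "[" at all
        return None
    inner, found_close, _ = rest.partition("]")
    return inner if found_close else None  # no "]" after the "["

line = "no brackets here"
print(line[line.find("[") + 1 : line.find("]")])  # no brackets her  (find() returned -1 twice)
print(between_brackets("Bacon, [eggs], and spam"))  # eggs
print(between_brackets(line))                       # None
```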

If there's more than one of these per line, then you'll need to modify Jarrod's regex ^.*\['(.*)'\].*$ to match multiple times per line and to be non-greedy. (Use the .*? quantifier instead of the .* quantifier.)

In [15]: mystring = "[Bacon], [eggs], and [spam]."

In [16]: re.findall(r"\[(.*?)\]",mystring)
Out[16]: ['Bacon', 'eggs', 'spam']

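Disclaimer: The technical posts on this site are licensed under CC BY-SA 4.0. If you need to republish them, please credit this site's URL or the original address. For any questions, contact: yoyou2525@163.com.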

 
Guangdong ICP Filing No. 18138465  © 2020-2024 STACKOOM.COM