Regular expression to return all characters between two special characters
How would I go about using a regex to return all characters between two brackets? Here is an example:
foobar['infoNeededHere']ddd
needs to return infoNeededHere
I found a regex that does this for curly brackets, but all my attempts at making it work with square brackets have failed. Here is that regex:
(?<={)[^}]*(?=})
and here is my attempt to hack it:
(?<=[)[^}]*(?=])
Final Solution:
import re
text = "foobar['InfoNeeded'],"  # renamed so the built-in str is not shadowed
match = re.match(r"^.*\['(.*)'\].*$", text)
print(match.group(1))  # InfoNeeded
If you're new to REG(ular) EX(pressions), you can learn about them in the Python Docs. Or, if you want a gentler introduction, you can check out the HOWTO. They use Perl-style syntax.
The expression that you need is .*?\[(.*)\].* and the group that you want will be \1.
- .*? : . matches any character but a newline. * is a meta-character and means "repeat this 0 or more times". ? makes the * non-greedy, i.e., .*? will match as few characters as possible before hitting a '['.
- \[ : \ escapes special meta-characters, which in this case is [. If we didn't do that, [ would be read as the start of a character class instead.
- (.*) : Parentheses 'group' whatever is inside them, and you can later retrieve the groups by their numeric IDs or names (if they're given one).
- \].* : You should know enough by now to know what this means.
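To make the escaping point concrete, here is a minimal sketch comparing the unescaped attempt from the question with an escaped variant. The helper name `compiles` and the variable names are only for illustration, and the escaped pattern is one possible fix rather than the only one.

```python
import re

broken = r"(?<=[)[^}]*(?=])"     # the attempt from the question, brackets unescaped
fixed = r"(?<=\[)[^\]]*(?=\])"   # same idea with [ and ] escaped

def compiles(pattern):
    """Return True if `pattern` is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False

print(compiles(broken))
print(compiles(fixed))

result = re.search(fixed, "foobar['infoNeededHere']ddd").group(0)
print(result)
```

The unescaped version fails because the first [ opens a character class that swallows the following characters, so the parentheses no longer balance.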
First, import the re module (it's not a built-in, so it needs an explicit import) wherever you want to use the expression.
Then, use re.search(regex_pattern, string_to_be_tested) to search for the pattern in the string to be tested. This will return a MatchObject (or None if nothing matches), which you can store in a temporary variable. You should then call its group() method and pass 1 as an argument (to see the 'Group 1' we captured using parentheses earlier). It should now look like:
>>> import re
>>> pat = r'.*?\[(.*)].*' #See Note at the bottom of the answer
>>> s = "foobar['infoNeededHere']ddd"
>>> match = re.search(pat, s)
>>> match.group(1)
"'infoNeededHere'"
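Note that the result above still carries the single quotes, while the question asked for infoNeededHere alone. A small sketch of one way to drop them, assuming the value is always wrapped in single quotes inside the brackets (the variable name `value` is just for illustration):

```python
import re

s = "foobar['infoNeededHere']ddd"

# \[' and '\] match the literal delimiters; only the text between them is captured.
match = re.search(r"\['(.*?)'\]", s)

# re.search returns None when nothing matches, so guard before calling group().
value = match.group(1) if match else None
print(value)
```

If the quotes are optional in your data, this pattern would need adjusting.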
You can also use findall() to find all the non-overlapping matches by modifying the regex to (?<=\[).+?(?=\]).
- (?<=\[) : (?<=) is called a look-behind assertion and checks for an expression preceding the actual match.
- .+? : + is just like * except that it matches one or more repetitions. It is made non-greedy by ?.
- (?=\]) : (?=) is a look-ahead assertion and checks for an expression following the match without capturing it.
Your code should now look like:
>>> import re
>>> pat = r'(?<=\[).+?(?=\])' #See Note at the bottom of the answer
>>> s = "foobar['infoNeededHere']ddd[andHere] [andOverHereToo[]"
>>> re.findall(pat, s)
["'infoNeededHere'", 'andHere', 'andOverHereToo[']
Note: Always use raw Python strings for regex patterns by adding an 'r' before the string (e.g. r'blah blah blah').
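As a quick illustration of why the raw-string prefix matters, the sketch below compares a raw pattern with its manually escaped equivalent. The variable names and the sample input are made up for the example.

```python
import re

raw = r"\[(.*?)\]"        # raw string: backslashes reach the regex engine untouched
escaped = "\\[(.*?)\\]"   # normal string: every backslash has to be doubled

same = (raw == escaped)
print(same)

# In a normal string "\n" is one newline character; in a raw string it is two characters.
lengths = (len(r"\n"), len("\n"))
print(lengths)

found = re.findall(raw, "a[b]c[d]")
print(found)
```

Both spellings produce the same pattern, but the raw form is easier to read and harder to get wrong.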
Thanks for reading! I wrote this answer when there were no accepted ones yet, but by the time I finished it, 2 more came up and one got accepted. :(
^.*\['(.*)'\].*$ will match a line and capture what you want in a group.
You have to escape the [ and ] with \
The documentation at the rubular.com proof link will explain how the expression is formed.
If there's only one of these [.....] tokens per line, then you don't need to use regular expressions at all:
In [7]: mystring = "Bacon, [eggs], and spam"
In [8]: mystring[ mystring.find("[")+1 : mystring.find("]") ]
Out[8]: 'eggs'
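One caveat with the slicing approach: str.find returns -1 when the character is missing, which silently produces a wrong slice. A hedged sketch of a small wrapper that handles this (the function name `between_brackets` is hypothetical):

```python
def between_brackets(text):
    """Return the text between the first '[' and the next ']', or None if absent."""
    start = text.find("[")
    # Search for ']' only after the '[' so a stray earlier ']' is ignored.
    end = text.find("]", start + 1)
    if start == -1 or end == -1:
        return None
    return text[start + 1:end]

print(between_brackets("Bacon, [eggs], and spam"))
print(between_brackets("no brackets here"))
```

This still only returns the first bracketed token on the line.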
If there's more than one of these per line, then you'll need to modify Jarrod's regex ^.*\['(.*)'\].*$ to match multiple times per line, and to be non-greedy. (Use the .*? quantifier instead of the .* quantifier.)
In [15]: mystring = "[Bacon], [eggs], and [spam]."
In [16]: re.findall(r"\[(.*?)\]",mystring)
Out[16]: ['Bacon', 'eggs', 'spam']
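To see what the non-greedy quantifier changes, here is a short side-by-side sketch on the same input. The variable names `greedy` and `lazy` are just for illustration.

```python
import re

mystring = "[Bacon], [eggs], and [spam]."

# Greedy: .* runs to the last ']' on the line, so everything collapses into one match.
greedy = re.findall(r"\[(.*)\]", mystring)

# Non-greedy: .*? stops at the first ']' it can, giving one match per token.
lazy = re.findall(r"\[(.*?)\]", mystring)

print(greedy)
print(lazy)
```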