What kind of regex would I use to match this?

Question

I have several strings which look like the following:

<some_text> TAG[<some_text>@11.22.33.44] <some_text>

I want to get the IP address, and only the IP address, from this line. (For the sake of this example, assume that the IP address will always be in the format xx.xx.xx.xx.)

Edit: I'm afraid I wasn't clear.

The strings will look something like this:

<some_text> TAG1[<some_text>@xx.xx.xx.xx] <some_text> TAG2[<some_text>@yy.yy.yy.yy] <some_text>

Note that the 'some_text' parts can be of variable length. I need to associate a different regex with each tag so that when r.group() is called, the IP address is returned. In the above case the regexes would not differ, but it is a bad example.

The regexes I have tried so far have been inadequate.

Ideally, I would like something like this:

r = re.search('(?<=TAG.*@)(\d\d.\d\d.\d\d.\d\d)', line)

where line is in the format specified above. However, this does not work because a look-behind assertion must be fixed width.

Additionally, I have tried non-capturing groups, like this:
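For reference, the failure can be reproduced with a minimal sketch like the one below. The sample line is a made-up value in the format described above, and the exact wording of the error message may differ between Python versions.

```python
import re

# Hypothetical sample line in the format described above.
line = "foo TAG[user@11.22.33.44] bar"

try:
    re.search(r'(?<=TAG.*@)(\d\d.\d\d.\d\d.\d\d)', line)
    error = None
except re.error as exc:
    # The pattern is rejected when it is compiled, before any matching
    # happens, because ".*" inside the look-behind has no fixed width.
    error = exc

print(error)
```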

r = re.search('(?<=TAG\[)(?:.*@)(\d\d.\d\d.\d\d.\d\d)', line)

However, I cannot use this because r.group() will return some_text@xx.xx.xx.xx.

I understand that r.group(1) will return just the IP address. Unfortunately, the script I am writing requires that every regex return the correct result from a plain r.group() call.

What kind of regex could I use for this situation?

The code is in Python.

Note: all of the some_text parts can be of variable length.

Answer 1

Try re.search('(?<=@)\\d\\d\\.\\d\\d\\.\\d\\d\\.\\d\\d(?=\\])', line).
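As a quick check, here is a minimal sketch of that pattern written as a raw string, run against a made-up sample line. Because the look-behind and look-ahead consume nothing, a plain r.group() should return only the address.

```python
import re

# Hypothetical sample line; only the "@...]" part matters for this pattern.
line = "foo TAG[user@11.22.33.44] bar"

# (?<=@) requires an "@" just before the match and (?=\]) a "]" just after,
# but neither character becomes part of the matched text.
r = re.search(r'(?<=@)\d\d\.\d\d\.\d\d\.\d\d(?=\])', line)
ip = r.group() if r else None
print(ip)  # 11.22.33.44
```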

In fact, re.search('\\d\\d\\.\\d\\d\\.\\d\\d\\.\\d\\d', line) may get you what you need if the only occurrence of the xx.xx.xx.xx format in the strings being checked is in those IP address sections.

EDIT: As stated in my comment, to find all occurrences of the wanted pattern in a string, you just do re.findall(pattern_to_match, line). So in this case, that is re.findall('\\d\\d\\.\\d\\d\\.\\d\\d\\.\\d\\d', line) (or, more generally, re.findall('\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}', line)).

EDIT 2: From your comment, this should work (with tagname being the tag of the IP address you currently want).
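A minimal sketch of the findall approach, using a made-up line with two tags:

```python
import re

# Hypothetical sample line with two tagged addresses.
line = "foo TAG1[alice@11.22.33.44] bar TAG2[bob@55.66.77.88] baz"

# With no capturing groups in the pattern, findall returns the full
# text of every non-overlapping match, in order.
ips = re.findall(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', line)
print(ips)  # ['11.22.33.44', '55.66.77.88']
```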

r = re.search(tagname + r'\[.+?@(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})', line)

And then you'd just refer to it with r.group("ip"), as psmears said.

...In fact, there's an easy way to make the regex a bit more concise, at the cost of making it looser (it also accepts a trailing dot, or digit groups with no dot between them):
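Wrapped in a function, the per-tag lookup might look like the sketch below. The helper name ip_for_tag and the sample line are made up for illustration, and the re.escape call is an addition that only matters if a tag name can contain regex metacharacters.

```python
import re

def ip_for_tag(tagname, line):
    """Return the IP address attached to tagname in line, or None (hypothetical helper)."""
    # re.escape is an added precaution, not part of the pattern given above.
    pattern = re.escape(tagname) + r'\[.+?@(?P<ip>\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'
    r = re.search(pattern, line)
    return r.group("ip") if r else None

# Hypothetical sample line with two tagged addresses.
line = "foo TAG1[alice@11.22.33.44] bar TAG2[bob@55.66.77.88] baz"
print(ip_for_tag("TAG2", line))  # 55.66.77.88
```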

r = re.search(tagname + r'\[.+?@(?P<ip>(?:\d{1,3}\.?){4})', line)

In fact, you could even do this:

r = re.findall(r'(?P<tag>\S+)\[.+?@(?P<ip>(?:\d{1,3}\.?){4})', line)

This returns a list of (tag, IP address) tuples, so if you later want the IP address of a different tag from the same string, you don't have to search that string again.

...In fact, going two steps further, you could do the following:

r = dict((m.group("tag"), m.group("ip")) for m in re.finditer(r'(?P<tag>\S+)\[.+?@(?P<ip>(?:\d{1,3}\.?){4})', line))

Or, in Python 3, with a dict comprehension:

r = {m.group("tag"): m.group("ip") for m in re.finditer(r'(?P<tag>\S+)\[.+?@(?P<ip>(?:\d{1,3}\.?){4})', line)}

And then r would be a dict with the tags as keys and the IP addresses as the respective values.
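Put together, a minimal runnable sketch of the dictionary approach could look like this. The sample line and the names TAG_IP and tag_to_ip are made up for illustration.

```python
import re

# Same pattern as above, compiled once so it can be reused across lines.
TAG_IP = re.compile(r'(?P<tag>\S+)\[.+?@(?P<ip>(?:\d{1,3}\.?){4})')

# Hypothetical sample line with two tagged addresses.
line = "foo TAG1[alice@11.22.33.44] bar TAG2[bob@55.66.77.88] baz"

# One pass over the line; each match contributes one tag -> IP entry.
tag_to_ip = {m.group("tag"): m.group("ip") for m in TAG_IP.finditer(line)}
print(tag_to_ip)  # {'TAG1': '11.22.33.44', 'TAG2': '55.66.77.88'}
```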

Answer 2

Why do you want to use groups or look-behinds at all? What is wrong with re.search('TAG\\[.*@(\\d{1,3}\\.\\d{1,3}\\.\\d{1,3}\\.\\d{1,3})\\]', line)?

Answer 3

I don't think it's possible to do that - r.group() will always return the whole string that matched, so you're forced to use lookbehind, which as you say must be fixed width.

Instead, I'd suggest modifying the script that you're writing. I'm guessing that you have a whole load of regexps that it matches, and you don't want to have to specify for each one "this one uses r.group(0)", "this one uses r.group(3)", etc.

In that case, you could use Python's named groups facility: you can name a group in a regular expression like this:

(?P<name>CONTENTS)

then retrieve what matched with r.group("name").

What I suggest doing in your script is: match the regular expression, then test if r.group("usethis") is set. If so - use that; if not - then use r.group() as before.

That way you can cope with awkward situations like this by specifying the group name usethis in the regexp - but your other regexps don't have to know or care.
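A minimal sketch of that fallback logic follows. The function name extract and the sample inputs are made up for illustration. One detail to be aware of: calling r.group("usethis") on a pattern that does not define that group raises IndexError, so this sketch checks the compiled pattern's groupindex mapping first.

```python
import re

def extract(pattern, line, preferred="usethis"):
    """Return the preferred named group if the pattern defines it, else the whole match."""
    r = re.search(pattern, line)
    if r is None:
        return None
    # groupindex maps the group names defined in the pattern to group numbers.
    if preferred in r.re.groupindex and r.group(preferred) is not None:
        return r.group(preferred)
    return r.group()

# Hypothetical sample line.
line = "foo TAG[user@11.22.33.44] bar"

# A regexp that needs the special group, and one that does not.
print(extract(r'TAG\[.+?@(?P<usethis>\d{1,3}(?:\.\d{1,3}){3})\]', line))  # 11.22.33.44
print(extract(r'TAG', line))  # TAG
```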

Answer 4

Almost, but I think that you need to change the .* at the start to .*? since you may have multiple TAGs on a single line (I believe so, as there are in the example).

re.search(r'TAG(\d+)\[.*?@(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]', line)

The tag ID will be in the first capture group and the IP address will be in the second.
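The difference between the greedy and lazy forms can be seen in a small sketch like this one, using a made-up line with two tags. With the greedy .* the first match stretches to the last "@" on the line, so the first tag gets paired with the second tag's address.

```python
import re

# Hypothetical sample line with two tagged addresses.
line = "foo TAG1[alice@11.22.33.44] bar TAG2[bob@55.66.77.88] baz"

IP = r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'

# Greedy: ".*" runs to the end of the line and backtracks to the LAST "@".
greedy = re.search(r'TAG(\d+)\[.*@' + IP + r'\]', line)
print(greedy.group(1), greedy.group(2))  # 1 55.66.77.88

# Lazy: ".*?" stops at the first "@" after the tag.
lazy = re.search(r'TAG(\d+)\[.*?@' + IP + r'\]', line)
print(lazy.group(1), lazy.group(2))  # 1 11.22.33.44

# finditer with the lazy form yields one (tag ID, IP) pair per tag.
pairs = [(m.group(1), m.group(2))
         for m in re.finditer(r'TAG(\d+)\[.*?@' + IP + r'\]', line)]
print(pairs)  # [('1', '11.22.33.44'), ('2', '55.66.77.88')]
```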

4 answers

Answer 1
2 2010-06-30 18:30:55

Answer 2
1 2010-06-30 18:22:23

Answer 3
1 accepted 2010-06-30 18:27:59

Answer 4
0 2010-07-01 17:24:55