IP regex, without subnets, for all sites

Question

As part of a project at my company, I need to extract IP addresses from some websites, but only addresses that do not carry a subnet suffix (an address such as 196.82.1.12/24 should be excluded).

If an address has a subnet suffix, I don't want to capture the part preceding the suffix; I want to skip the address entirely.

For example, in the following case:

<td>212.179.35.154</td>
<td>200.139.97.126/24</td>
<td>"201.139.97.126"</td>
<td>F5 BIG-IP</td>
<td>unknown</td>
<td class="date">26-Feb-2011</td>

The desired output would be:

212.179.35.154

201.139.97.126

Please note that some lines have quotes around the IP address; since no /NUMBER follows the address, they are still valid.

I have been trying for days to find an appropriate regex, such as:

(<td>(\d+\.){3}\d+<\/td>)
\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[^\/]

However, each of them seems to have a flaw.

Thanks in advance!

Answer 1

To me this looks like a task where a negative lookahead is useful. I would do:

import re
txt = '''<td>212.179.35.154</td>
<td>200.139.97.126/24</td>
<td>"201.139.97.126"</td>
<td>F5 BIG-IP</td>
<td>unknown</td>
<td class="date">26-Feb-2011</td>'''
pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?![0-9/])"
found = re.findall(pattern, txt)
print(found)

Output:

['212.179.35.154', '201.139.97.126']

By using the negative lookahead (?![0-9/]) we say: exclude a match if it is followed by a digit (0-9) or by /. Note that including the digits is crucial here, because if you specified only /, the regex engine would backtrack and one of the matches would be:

200.139.97.12

(note the missing 6 at the end)
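
The backtracking effect described above can be seen in a minimal sketch. The names base, slash_only and digit_or_slash are illustrative and not part of the original answer; the input is the single subnetted row from the question.

```python
import re

txt = '<td>200.139.97.126/24</td>'
base = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Lookahead that rejects only a slash: the last \d{1,3} backtracks
# from "126" to "12", which is followed by "6" rather than "/".
slash_only = re.findall(base + r"(?!/)", txt)

# Lookahead that rejects a digit or a slash: no shorter match survives.
digit_or_slash = re.findall(base + r"(?![0-9/])", txt)

print(slash_only)      # ['200.139.97.12']
print(digit_or_slash)  # []
```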

Answer 2

You can use a negative lookahead assertion, written with the pattern syntax (?!...), like this:

import re

s = """
<td>212.179.35.154</td>
<td>200.139.97.126/24</td>
<td>"201.139.97.126"</td>
<td>F5 BIG-IP</td>
<td>unknown</td>
<td class="date">26-Feb-2011</td>
"""

pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d*\/)"

print(re.findall(pattern,s))

Output:

['212.179.35.154', '201.139.97.126']

The (?!\d*\/) part tells it "don't match the previous pattern if it is followed by any number of digits and then a forward slash".
(The \d* part is needed because otherwise the pattern would match 200.139.97.12, without the 6, out of 200.139.97.126/24.)
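
As a side comparison, this lookahead is not equivalent to one of the form (?![0-9/]) on every input. The sketch below uses a made-up string with an overlong last group (not taken from the question) to show where the two differ; the variable names are illustrative.

```python
import re

base = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"
sample = "1.2.3.4567"  # hypothetical input: last group has four digits

# (?!\d*\/) only rejects when a slash follows the trailing digits,
# so the first three digits of "4567" are accepted.
with_optional_digits = re.findall(base + r"(?!\d*\/)", sample)

# (?![0-9/]) rejects any match that is directly followed by a digit.
with_digit_class = re.findall(base + r"(?![0-9/])", sample)

print(with_optional_digits)  # ['1.2.3.456']
print(with_digit_class)      # []
```

Which behaviour is preferable depends on whether such strings can appear in the scraped pages.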

Small note: your original pattern matches more than just legal IP addresses (for example, it accepts groups larger than 255), but I kept your approach.
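
If only legal addresses are wanted, one possible approach (a sketch, not part of the original answer) is to keep the permissive regex and filter its matches with the standard-library ipaddress module. The helper is_ipv4 and the sample string are assumptions made for illustration.

```python
import re
import ipaddress

s = '<td>212.179.35.154</td> <td>999.1.1.1</td> <td>200.139.97.126/24</td>'
pattern = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}(?!\d*\/)"

def is_ipv4(text):
    """Return True if text parses as an IPv4 address (hypothetical helper)."""
    try:
        ipaddress.IPv4Address(text)
        return True
    except ValueError:
        return False

candidates = re.findall(pattern, s)              # regex matches, legal or not
valid = [m for m in candidates if is_ipv4(m)]    # keep only legal addresses

print(candidates)  # ['212.179.35.154', '999.1.1.1']
print(valid)       # ['212.179.35.154']
```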

2 solutions

Solution 1
2 2020-07-28 09:29:49

Solution 2
1 Accepted 2020-07-28 09:29:05
