
How can I find an IP address in a long string with REGEX?

I want to extract only the valid IP addresses from a very long string. The problem is that my code extracts an IP address even if one of its parts has more than 3 digits (which is incorrect).

I tried to learn more about Python REGEX, but I don't know exactly how to make it stop at a maximum of 3 consecutive digits after a dot. What I mean is that if an IP is 1.2.3.4 it finds it, which is correct, but if an IP is 1.2.3.4567 it also finds a match, which is not correct. I don't know how to tell it that if a group has more than 3 digits, then that's not an IP address.

import re

secv = "akmfiawnmgisa gisamgisamgsagr[sao l321r1m r2p4 2342po4k2m4 22.33.4.aer 1.2.3.5344 99.99.99.100 asoifinagf sadgsangidsng sg"

b = re.findall(r"[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.\d{1,3}", secv)

print(b)

It matches part of 1.2.3.5344 (printing 1.2.3.534) and also 99.99.99.100, but 1.2.3.5344 is not an IP address because its last group has more than 3 consecutive digits, so nothing should be matched there.

import re

secv = "90.123.1.100 akmfiawnmgisa gisamgisamgsagr[sao l321r1m r2p4 2342po4k2m4 22.33.4.aer 1.2.3.5344 99.99.99.100 asoifinagf sadgsangidsng sg 13.18.19.100 1.2.3.4"

b = re.findall(r"(?:\s|\A)(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})(?=\s|\Z)",secv)

b = list(filter(lambda x: all([int(y) <= 255 for y in x.split('.')]), b))


print(b)

To make it more interesting, I added IP addresses at the beginning and end of your string. I am assuming that an IP address needs to be separated by white space on both sides unless it is at the beginning or end of the string. So I added a non-capturing group (?:\s|\A) at the beginning of the REGEX that will match either a white space character or the beginning of the string. I have also added a lookahead assertion (?=\s|\Z) at the end of the REGEX that will match a single white space character or the end of the string without consuming any characters. The second line then keeps only the matches in which every group is at most 255. The above prints out:

['90.123.1.100', '99.99.99.100', '13.18.19.100', '1.2.3.4']
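The difference between consuming the trailing white space and only asserting it can be checked with a small experiment. This is a minimal sketch, not part of the original answer: the names IP, text, with_lookahead and with_consuming_group are made up for illustration, and the test string is an assumed example with two addresses separated by a single space.

```python
import re

# Hypothetical helper pattern: four groups of 1 to 3 digits separated by dots.
IP = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# Assumed sample input: two addresses separated by exactly one space.
text = "1.2.3.4 5.6.7.8"

# Lookahead at the end: the space after the first address is only checked,
# so it is still available as the leading white space of the second match.
with_lookahead = re.findall(r"(?:\s|\A)(" + IP + r")(?=\s|\Z)", text)

# Consuming group at the end: the space becomes part of the first match,
# so the second address no longer has white space in front of it.
with_consuming_group = re.findall(r"(?:\s|\A)(" + IP + r")(?:\s|\Z)", text)

print(with_lookahead)        # ['1.2.3.4', '5.6.7.8']
print(with_consuming_group)  # ['1.2.3.4']
```

With the consuming group, the single space is used up as the terminator of the first address, so nothing is left to satisfy (?:\s|\A) in front of the second one. The lookahead leaves that space available.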

Just search for the pattern [1-2]?[0-9]{1,2} occurring 4 times, separated by dots. Optionally anchor your regex pattern with a word boundary at the beginning and end:

>>> re.findall(r'\b(?:[1-2]?[0-9]{1,2}\.){3}[1-2]?[0-9]{1,2}\b', secv)
['99.99.99.100']
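One caveat, as far as I can tell: [1-2]?[0-9]{1,2} also accepts values from 256 to 299, so this pattern alone would report something like 299.1.1.1. A possible way to tighten it, sketched below under the assumption that strict IPv4 validation is wanted, is to pass each regex candidate to the standard library's ipaddress.IPv4Address. The helper names is_ipv4 and find_ipv4 and the sample string are hypothetical.

```python
import ipaddress
import re

# Same pattern as in the answer above, compiled once for reuse.
PATTERN = re.compile(r"\b(?:[1-2]?[0-9]{1,2}\.){3}[1-2]?[0-9]{1,2}\b")


def is_ipv4(candidate):
    """Return True if the standard library accepts candidate as IPv4."""
    try:
        # Raises a ValueError subclass for out-of-range groups such as 256.
        # Depending on the Python version, leading zeros may be rejected too.
        ipaddress.IPv4Address(candidate)
    except ValueError:
        return False
    return True


def find_ipv4(text):
    """Collect regex candidates, then keep only the valid addresses."""
    return [match for match in PATTERN.findall(text) if is_ipv4(match)]


# Assumed sample input for illustration only.
sample = "1.2.3.5344 99.99.99.100 299.1.1.1 10.0.0.256"

print(PATTERN.findall(sample))  # ['99.99.99.100', '299.1.1.1', '10.0.0.256']
print(find_ipv4(sample))        # ['99.99.99.100']
```

Splitting the work this way keeps the regex short and leaves the range check to code that is easier to read than a long alternation for 0 to 255.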
