How to extract an IP address from an HTML string?
I want to extract an IP address from a string (actually a one-line HTML document) using Python.
>>> s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"
-- '165.91.15.131' is what I want!
I tried using regular expressions, but so far I can only get the first number.
>>> import re
>>> ip = re.findall( r'([0-9]+)(?:\.[0-9]+){3}', s )
>>> ip
['165']
But I don't have a firm grasp of regular expressions; the above code was found and modified from elsewhere on the web.
Remove your capturing group:
ip = re.findall( r'[0-9]+(?:\.[0-9]+){3}', s )
Result:
['165.91.15.131']
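The reason this works is that re.findall returns the text of the capturing group when the pattern contains one, and the whole match only when it contains none. A small sketch of that behaviour, using the string from the question (the variable names with_group, whole_match and via_finditer are just illustrative):

```python
import re

s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"

# With a capturing group, findall returns the group's text, not the whole match.
with_group = re.findall(r'([0-9]+)(?:\.[0-9]+){3}', s)

# With only non-capturing groups, findall returns the whole match.
whole_match = re.findall(r'[0-9]+(?:\.[0-9]+){3}', s)

# finditer gives match objects, so group(0) is the whole match either way.
via_finditer = [m.group(0) for m in re.finditer(r'([0-9]+)(?:\.[0-9]+){3}', s)]

print(with_group)    # ['165']
print(whole_match)   # ['165.91.15.131']
print(via_finditer)  # ['165.91.15.131']
```

If you need to keep the capturing group for some other reason, the finditer form is one way to still get the full address.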
Notes:
If the input is not guaranteed to contain only valid IP addresses, this pattern also matches invalid ones such as 0.00.999.9999. This isn't necessarily a problem, but you should be aware of it and possibly handle this situation. Change + to {1,3} for a partial fix without making the regular expression overly complex.
You can use the following regex to capture only valid IP addresses:
re.findall(r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b',s)
returns
['165.91.15.131']
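An alternative to encoding the 0-255 range in the pattern itself is to match loosely and check each octet afterwards. This is only a sketch: CANDIDATE and find_valid_ips are hypothetical names, and the lookarounds are an added assumption so that a candidate is not taken from the middle of a longer run of digits and dots.

```python
import re

# Loose pattern: four dot-separated runs of 1-3 digits,
# not preceded or followed by another digit or dot.
CANDIDATE = re.compile(r'(?<![\d.])\d{1,3}(?:\.\d{1,3}){3}(?![\d.])')

def find_valid_ips(text):
    """Return dotted quads in text whose four octets are all in 0..255."""
    valid = []
    for candidate in CANDIDATE.findall(text):
        if all(int(octet) <= 255 for octet in candidate.split('.')):
            valid.append(candidate)
    return valid

print(find_valid_ips("Current IP Address: 165.91.15.131"))       # ['165.91.15.131']
print(find_valid_ips("bad: 0.00.999.9999 and 123.456.789.111"))  # []
```

The range check in plain Python tends to be easier to read and adjust than a long alternation, at the cost of a second pass over the candidates.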
import re
ipPattern = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')  # raw string keeps the backslashes literal
findIP = re.findall(ipPattern,s)
findIP contains ['165.91.15.131']
The easiest way to find the IP address in the log:
s = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"
info = re.findall(r'[\d.-]+', s)
In [42]: info
Out[42]: ['165.91.15.131']
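This character-class pattern works here because the page contains no other digits, dots or hyphens. On a busier line it also returns dates, times and stray dots, so the tokens may need filtering. A rough sketch, using a made-up log line (log_line is hypothetical, not from the post):

```python
import re

# Hypothetical log line with a date, a time and a file name besides the IP.
log_line = "2019-05-01 12:00 GET /index.html from 10.0.0.1"

# The character class grabs every run of digits, dots and hyphens.
tokens = re.findall(r'[\d.-]+', log_line)
print(tokens)  # ['2019-05-01', '12', '00', '.', '10.0.0.1']

# Keep only tokens that are exactly four dot-separated groups of 1-3 digits.
dotted_quad = re.compile(r'\d{1,3}(?:\.\d{1,3}){3}')
ips = [t for t in tokens if dotted_quad.fullmatch(t)]
print(ips)  # ['10.0.0.1']
```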
You can use the following regex to extract valid IPs while avoiding these errors:
1. Some patterns detect 123.456.789.111 as a valid IP.
2. Some don't detect 127.0.00.1 as a valid IP.
3. Some don't detect IPs that start with zero, like 08.8.8.8.
So here I post a regex that handles all of the above conditions.
Note: I have extracted more than 2 million IPs with the following regex without any problem.
(?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)
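To see how the pattern behaves on the three cases listed above, it can be run against a few sample strings. This is just a quick check; IP_RE, samples and results are illustrative names, and the last sample is an extra case added here to show one caveat.

```python
import re

# The pattern from the answer, compiled unchanged (split over two lines for readability).
IP_RE = re.compile(
    r'(?:(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)\.){3}'
    r'(?:1\d\d|2[0-5][0-5]|2[0-4]\d|0?[1-9]\d|0?0?\d)'
)

samples = ["123.456.789.111", "127.0.00.1", "08.8.8.8", "165.91.15.131", "1.2.3.456"]
results = {sample: IP_RE.findall(sample) for sample in samples}

for sample, found in results.items():
    print(sample, '->', found)
# 123.456.789.111 -> []
# 127.0.00.1 -> ['127.0.00.1']
# 08.8.8.8 -> ['08.8.8.8']
# 165.91.15.131 -> ['165.91.15.131']
# 1.2.3.456 -> ['1.2.3.45']
```

The last line shows that, since the pattern has no boundary assertions, it can match part of a longer digit run. If that matters for your data, wrapping it in lookarounds such as (?<!\d) and (?!\d) is one possible adjustment.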
This is how I've done it. I think it's quite clean.
import re
from urllib.request import urlopen  # Python 3; on Python 2 this was urllib2.urlopen

def getIP():
    ip_checker_url = "http://checkip.dyndns.org/"
    address_regexp = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')
    # read() returns bytes, so decode before searching with a str pattern
    response = urlopen(ip_checker_url).read().decode('utf-8', errors='replace')
    result = address_regexp.search(response)
    if result:
        return result.group()
    else:
        return None

getIP() returns the IP as a string, or None if no address was found.
You can replace address_regexp with another regular expression if you prefer more accurate parsing, or change the web service provider.
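One way to make that substitution easy is to pass the pattern in as a parameter and keep the parsing separate from the network call. A minimal sketch under those assumptions: extract_ip, SIMPLE_IP and strict are hypothetical names, and the stricter pattern is only one possible choice.

```python
import re

# Default: the same simple pattern as address_regexp.
SIMPLE_IP = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def extract_ip(html, pattern=SIMPLE_IP):
    """Return the first match of pattern in html, or None if there is none."""
    result = pattern.search(html)
    return result.group() if result else None

page = "<html><head><title>Current IP Check</title></head><body>Current IP Address: 165.91.15.131</body></html>"
print(extract_ip(page))  # 165.91.15.131

# A stricter pattern that limits each octet to 0-255 (any compiled pattern works here).
strict = re.compile(
    r'\b(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}'
    r'(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\b'
)
print(extract_ip(page, strict))                    # 165.91.15.131
print(extract_ip("<body>no address here</body>"))  # None
```

Keeping the function free of network code also means it can be tested against a saved copy of the page.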