Regex: how to extract only first IP address from string (in Python)

Given the following string (or similar strings, some of which may contain more than one IP address):

from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03

I wish to extract the first and only the first IP address, in Python. A first attempt with something like ([0-9]{2,}\.){3}([0-9]{2,}){1}, when tried out on nregex.com, looks almost OK: it matches the IP address fine, but it also matches the other substring which roughly resembles an IP address (170.2015.06.03.14.12.03). When the same pattern is passed to re.compile/re.findall, though, the result is:

[(u'243.', u'70'), (u'06.', u'03')]

So clearly the regex is no good. How can I improve it so that it's neater and catches all IPv4 addresses, and how can I make it match only the first one?

Many thanks.

Use re.search with the following pattern:

>>> s = 'from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03'
>>> import re
>>> re.search(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}', s).group()
'208.83.243.70'
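One caveat with the one-liner above: re.search returns None when nothing matches, so calling .group() directly would raise AttributeError on a string with no address. A minimal sketch of a guarded version follows; the helper name first_ipv4 is made up for illustration and is not part of the original answer.

```python
import re

# Same pattern as in the answer above, compiled once for reuse.
IPV4_PATTERN = re.compile(r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}')

def first_ipv4(text):
    """Return the first IPv4-looking substring in text, or None if there is none.

    first_ipv4 is a hypothetical helper name used only for this sketch.
    """
    match = IPV4_PATTERN.search(text)
    return match.group() if match else None

s = ('from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) '
     'by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03')
print(first_ipv4(s))                  # 208.83.243.70
print(first_ipv4('no address here'))  # None
```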

The regex you want is r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})'. This catches four 1- to 3-digit numbers separated by dots.
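Note that \d{1,3} accepts any run of one to three digits, so the pattern would also accept something like 999.1.1.1, which is not a valid IPv4 address. If that matters for your input, one possible approach (a sketch only, with a made-up helper name first_valid_ipv4) is to walk the matches in order and keep the first one whose four parts are all at most 255:

```python
import re

# Pattern from the answer: four dot-separated groups of 1 to 3 digits.
PATTERN = re.compile(r'(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})')

def first_valid_ipv4(text):
    """Return the first match whose four parts are all <= 255, else None.

    first_valid_ipv4 is a hypothetical name; the range check is an addition
    for illustration and is not part of the original answer.
    """
    for match in PATTERN.finditer(text):
        candidate = match.group(1)
        if all(int(part) <= 255 for part in candidate.split('.')):
            return candidate
    return None

print(first_valid_ipv4('bad 999.1.1.1 then good 10.0.0.1'))  # 10.0.0.1
```

This only checks the numeric range of each part; it does not try to rule out date-like substrings.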

If the IP address always comes before the other numbers in the string, you can avoid selecting those by using a function that returns only the first match, such as re.search. In contrast, re.findall will catch both 208.83.243.70 and 015.06.03.14.
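To make the difference concrete, here is a short comparison of the two calls on the string from the question (the variable names are just for this example):

```python
import re

s = ('from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) '
     'by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03')
pattern = r'\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}'

# re.search stops at the first match, scanning left to right.
first = re.search(pattern, s).group()

# re.findall keeps scanning and returns every non-overlapping match.
all_matches = re.findall(pattern, s)

print(first)        # 208.83.243.70
print(all_matches)  # ['208.83.243.70', '015.06.03.14']
```

The second hit comes from the tail of the ESMTP id, where 015.06.03.14 happens to fit the same four-number shape.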

Are you OK with using the brackets to single out the IP address? If so, you can change the regex to r'\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]'. It would be safer that way.
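A quick sketch of the bracketed variant, assuming the address always appears inside square brackets as it does in the sample header. The capturing group lets you pull out the address without the brackets:

```python
import re

s = ('from mail2.oknotify2.com (mail2.oknotify2.com. [208.83.243.70]) '
     'by mx.google.com with ESMTP id dp5si2596299pdb.170.2015.06.03.14.12.03')

# Literal [ and ] around the address; group 1 holds the address itself.
bracketed = re.compile(r'\[(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\]')

match = bracketed.search(s)
ip = match.group(1) if match else None
print(ip)  # 208.83.243.70

# The date-like tail of the ESMTP id is not in brackets, so even findall
# returns only the real address here.
print(bracketed.findall(s))  # ['208.83.243.70']
```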
