Python - Parsing IPv4 addresses from a string (even when censored)

Question

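Goal: write Python 2.7 code to extract IPv4 addresses from a string.

Example string content:

The following are IP addresses: 192.168.1.1, 8.8.8.8, 101.099.098.000. These can also appear as 192.168.1[.]1 or 192.168.1(.)1 or 192.168.1[dot]1 or 192.168.1(dot)1 or 192 .168 .1 .1 or 192. 168. 1. 1. and these censorship methods could apply to any of the dots (Ex: 192[.]168[.]1[.]1).

As you can see from the above, I am struggling to find a way to parse a txt file that may contain IPs written in several "censored" forms (to prevent hyperlinking).

I think a regular expression is the way to go. Perhaps something along these lines: any grouping of four integers, 0-255 or 000-255, separated by anything in a "separator list" consisting of periods, brackets, parentheses, or any of the other examples above. That way, the "separator list" can be updated as needed.

I am not sure whether this is the right approach, or even possible, so any help is greatly appreciated.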

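Update: thanks to recursive's answer below, I now have the following code working for the example above. It will...

find the IPs
place them into a list
clean them of spaces/braces/etc.
and replace the uncleaned list entries with the cleaned ones.

Warning: the code below does not account for incorrect/invalid IPs such as 192.168.0.256 or 192.168.1.2.3. Currently, it will drop the trailing 6 and 3 from those. If the first octet is invalid (e.g. 256.10.10.10), it will drop the leading 2 (resulting in 56.10.10.10).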

import re

def extractIPs(fileContent):
    pattern = r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)([ (\[]?(\.|dot)[ )\]]?(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})"
    ips = [each[0] for each in re.findall(pattern, fileContent)]   
    for item in ips:
        location = ips.index(item)
        ip = re.sub("[ ()\[\]]", "", item)
        ip = re.sub("dot", ".", ip)
        ips.remove(item)
        ips.insert(location, ip) 
    return ips

myFile = open('***INSERT FILE PATH HERE***')
fileContent = myFile.read()

IPs = extractIPs(fileContent)
print "Original file content:\n{0}".format(fileContent)
print "--------------------------------"
print "Parsed results:\n{0}".format(IPs)

Answer 1

Here is a regex that works:

import re
pattern = r"((([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])[ (\[]?(\.|dot)[ )\]]?){3}([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5]))"
text = "The following are IP addresses: 192.168.1.1, 8.8.8.8, 101.099.098.000. These can also appear as 192.168.1[.]1 or 192.168.1(.)1 or 192.168.1[dot]1 or 192.168.1(dot)1 or 192 .168 .1 .1 or 192. 168. 1. 1. "
ips = [match[0] for match in re.findall(pattern, text)]
print ips

# output: ['192.168.1.1', '8.8.8.8', '101.099.098.000', '192.168.1[.]1', '192.168.1(.)1', '192.168.1[dot]1', '192.168.1(dot)1', '192 .168 .1 .1', '192. 168. 1. 1']

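The regex has a couple of main parts, which I will explain here:

([01]?[0-9]?[0-9]|2[0-4][0-9]|25[0-5])
This matches the numeric parts of the IP address. | means "or". The first case handles numbers from 0 to 199, with or without leading zeros. The other two cases handle numbers above 199.
[ (\[]?(\.|dot)[ )\]]?
This matches the "dot" parts. There are three sub-components:
- [ (\[]? The "prefix" for the dot: a space, an open paren, or an open square bracket. The trailing ? means this part is optional.
- (\.|dot) Either "dot" or a period.
- [ )\]]? The "suffix". Same logic as the prefix.
{3} means repeat the previous component 3 times.
The final element is another number, the same as the first, except that it is not followed by a dot.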
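
Since the question asks for a "separator list" that can be updated as needed, the same pieces can also be assembled from plain strings. The following is only a minimal sketch, not part of the original answer: `build_pattern`, `OPENERS`, `CLOSERS`, and `DOT_WORDS` are hypothetical names, and the octet alternatives are ordered longest-first (as in the asker's updated code) so that a final octet such as 255 is matched in full.

```python
import re

# Hypothetical building blocks; adjust the three "separator list" values as needed.
OCTET = r"(?:25[0-5]|2[0-4][0-9]|[01]?[0-9]?[0-9])"
OPENERS = " (["            # characters allowed just before the dot
CLOSERS = " )]"            # characters allowed just after the dot
DOT_WORDS = [".", "dot"]   # spellings of the dot itself


def build_pattern(openers=OPENERS, closers=CLOSERS, dot_words=DOT_WORDS):
    """Assemble (octet + separator) x 3 followed by a final octet."""
    prefix = "[" + re.escape(openers) + "]?"
    suffix = "[" + re.escape(closers) + "]?"
    dot = "(?:" + "|".join(re.escape(word) for word in dot_words) + ")"
    separator = prefix + dot + suffix
    return re.compile("(?:" + OCTET + separator + "){3}" + OCTET)


# Only non-capturing groups are used, so findall returns the whole matches.
default_matches = build_pattern().findall("192.168.1[.]1 or 192.168.1(dot)255")

# Extending the separator list, here with curly braces as an assumed extra form.
brace_pattern = build_pattern(openers=" ([{", closers=" )]}")
```

Keeping the separator characters in ordinary strings means a new censoring style only requires changing the arguments, not the regex itself.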

Answer 2

Description

This regex will match each of the four octets of something that looks like an IP address. Each octet is placed into its own capture group for collection.

(2[0-4][0-9]|[01]?[0-9]?[0-9]|25[0-5])\D{1,5}(2[0-4][0-9]|[01]?[0-9]?[0-9]|25[0-5])\D{1,5}(2[0-4][0-9]|[01]?[0-9]?[0-9]|25[0-5])\D{1,5}(2[0-4][0-9]|[01]?[0-9]?[0-9]|25[0-5])

Given the following sample text, this regex will match all 10 embedded IP strings in their entirety, including the first one. Live example: http://www.rubular.com/r/1MbGZOhuj5

The following are IP addresses: 192.168.1.222, 8.8.8.8, 101.099.098.000. These can also appear as 192.168.1[.]1 or 192.168.1(.)1 or 192.168.1[dot]1 or 192.168.1(dot)1 or 192 .168 .1 .1 or 192. 168. 1. 1. and these censorship methods could apply to any of the dots (Ex: 192[.]168[.]1[.]1).

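The resulting matches can be iterated over, and a properly formatted IP string can be constructed by joining the 4 capture groups with dots.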
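
The iterate-and-join step described here could look roughly like the following in Python. This is a sketch rather than the answerer's own code; `OCTET`, `PATTERN`, and `rebuild_ips` are names made up for illustration.

```python
import re

# One capture group per octet, copied from the regex in this answer.
OCTET = r"(2[0-4][0-9]|[01]?[0-9]?[0-9]|25[0-5])"
# Four octets separated by one to five non-digit characters.
PATTERN = re.compile(r"\D{1,5}".join([OCTET] * 4))


def rebuild_ips(text):
    """Join the four captured octets of every match with periods."""
    return [".".join(match.groups()) for match in PATTERN.finditer(text)]


sample = "192.168.1[.]1 or 192 .168 .1 .1 or 10(dot)20(dot)30(dot)40"
print(rebuild_ips(sample))
```

Because \D{1,5} accepts any run of up to five non-digit characters, this pattern is more permissive than an explicit list of separators, so unrelated numbers that sit close together may also be picked up.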

Answer 3

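The code below will...

find IPs in a string, even when censored (e.g. 192.168.1[dot]20 or 10.10.10 .21)
place them into a list
clean the censorship out of them (spaces/braces/parentheses)
and replace the uncleaned list entries with the cleaned ones.

Warning: the code below does not account for incorrect/invalid IPs such as 192.168.0.256 or 192.168.1.2.3. Currently, it will drop the trailing digits (the 6 and 3 mentioned above). If the first octet is invalid (e.g. 256.10.10.10), it will drop the leading digit (resulting in 56.10.10.10).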


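import re

def extractIPs(fileContent):
    # One octet (0-255, leading zeros allowed), then three "separator + octet" groups.
    pattern = r"((25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)([ (\[]?(\.|dot)[ )\]]?(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)){3})"
    # findall returns one tuple per match; element 0 is the whole (possibly censored) IP.
    ips = [each[0] for each in re.findall(pattern, fileContent)]
    # Build the cleaned list directly instead of mutating ips while iterating over it.
    cleaned = []
    for item in ips:
        ip = re.sub(r"[ ()\[\]]", "", item)  # strip spaces, parentheses and brackets
        ip = re.sub("dot", ".", ip)          # turn the word "dot" into a period
        cleaned.append(ip)
    return cleaned


# Replace the placeholder with the path of the text file to scan.
with open('***INSERT FILE PATH HERE***') as myFile:
    fileContent = myFile.read()

IPs = extractIPs(fileContent)
print "Original file content:\n{0}".format(fileContent)
print "--------------------------------"
print "Parsed results:\n{0}".format(IPs)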
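
The warning above notes that out-of-range or over-long inputs are truncated rather than rejected. One possible way around that, sketched here under the assumption that it is acceptable to discard such inputs entirely, is to capture whole runs of digit groups loosely and then validate them in Python. `SEPARATOR`, `CANDIDATE`, and `extract_valid_ips` are hypothetical names and are not part of this answer.

```python
import re

# Same separator forms as the pattern in this answer, without capture groups.
SEPARATOR = r"[ (\[]?(?:\.|dot)[ )\]]?"
# Grab every maximal run of digit groups joined by separators, valid or not.
CANDIDATE = re.compile(r"[0-9]+(?:" + SEPARATOR + r"[0-9]+)*")


def extract_valid_ips(text):
    """Return cleaned IPs, keeping only runs of exactly four octets in 0-255."""
    ips = []
    for match in CANDIDATE.finditer(text):
        parts = re.split(SEPARATOR, match.group(0))
        if len(parts) == 4 and all(len(p) <= 3 and int(p) <= 255 for p in parts):
            ips.append(".".join(parts))
    return ips


print(extract_valid_ips("10.10.10 .21 and 192.168.1[dot]20 but not 192.168.0.256"))
```

A known trade-off of this sketch: a stray number followed by a period directly before an address (for example "3. 10.0.0.1") is merged into the same run and therefore rejected.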

Answer 4

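Extracting and classifying IPv4 addresses (even when censored)

Note: this is just an implementation of a class I wrote for extracting IPv4 addresses. I may update my class with a method for this kind of functionality in the future. You can find it on my GitHub page.

What I demonstrate below is the following:

Clean up the example string content
Put your string data into a list
Use the ExtractIPs() class to parse and classify the IPv4 addresses
- The class returns a dictionary containing 4 lists:
  - Valid IPv4 addresses
  - Public IPv4 addresses
  - Private IPv4 addresses
  - Invalid IPv4 addresses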
The ExtractIPs class

#!/usr/bin/env python
"""Extract and Classify IP Addresses."""

import re  # Use Regular Expressions.

__program__ = "IPAddresses.py"
__author__ = "Johnny C. Wachter"
__copyright__ = "Copyright (C) 2014 Johnny C. Wachter"
__license__ = "MIT"
__version__ = "0.0.1"
__maintainer__ = "Johnny C. Wachter"
__contact__ = "wachter.johnny@gmail.com"
__status__ = "Development"


class ExtractIPs(object):
    """Extract and Classify IP Addresses From Input Data."""

    def __init__(self, input_data):
        """Instantiate the Class."""
        self.input_data = input_data
        self.ipv4_results = {
            'valid_ips': [],  # Store all valid IP Addresses.
            'invalid_ips': [],  # Store all invalid IP Addresses.
            'private_ips': [],  # Store all Private IP Addresses.
            'public_ips': []  # Store all Public IP Addresses.
        }

    def extract_ipv4_like(self):
        """Extract IP-like strings from input data.

        :rtype : list
        """
        ipv4_like_list = []
        # The trailing $ keeps entries such as '1.2.3.4abc' away from int() later on.
        ip_like_pattern = re.compile(r'([0-9]{1,3}\.){3}([0-9]{1,3})$')
        for entry in self.input_data:
            if re.match(ip_like_pattern, entry):
                if len(entry.split('.')) == 4:
                    ipv4_like_list.append(entry)
        return ipv4_like_list

    def validate_ipv4_like(self):
        """Validate that IP-like entries fall within the appropriate range."""
        if self.extract_ipv4_like():
            # We're gonna want to ignore the below two addresses.
            ignore_list = ['0.0.0.0', '255.255.255.255']
            # Separate the Valid from Invalid IP Addresses.
            for ipv4_like in self.extract_ipv4_like():
                # Split the 'IP' into parts so each part can be validated.
                parts = ipv4_like.split('.')
                # All part values should be between 0 and 255.
                if all(0 <= int(part) < 256 for part in parts):
                    if ipv4_like not in ignore_list:
                        self.ipv4_results['valid_ips'].append(ipv4_like)
                else:
                    self.ipv4_results['invalid_ips'].append(ipv4_like)
        else:
            pass

    def classify_ipv4_addresses(self):
        """Classify Valid IP Addresses."""
        if self.ipv4_results['valid_ips']:
            # Now we will classify the Valid IP Addresses.
            for valid_ip in self.ipv4_results['valid_ips']:
                private_ip_pattern = re.findall(
                    r"""
                    (^127\.0\.0\.1)|  # Loopback
                    (^10\.(\d{1,3}\.){2}\d{1,3})|  # 10/8 Range
                    # Matching the 172.16/12 Range takes several matches
                    (^172\.1[6-9]\.\d{1,3}\.\d{1,3})|
                    (^172\.2[0-9]\.\d{1,3}\.\d{1,3})|
                    (^172\.3[0-1]\.\d{1,3}\.\d{1,3})|
                    (^192\.168\.\d{1,3}\.\d{1,3})|  # 192.168/16 Range
                    # Match APIPA Range.
                    (^169\.254\.\d{1,3}\.\d{1,3})
                    # VERBOSE for a clean look of this RegEx.
                    """, valid_ip, re.VERBOSE
                )
                if private_ip_pattern:
                    self.ipv4_results['private_ips'].append(valid_ip)
                else:
                    self.ipv4_results['public_ips'].append(valid_ip)
        else:
            pass

    def get_ipv4_results(self):
        """Extract and classify all valid and invalid IP-like strings.

        :returns : dict
        """
        self.extract_ipv4_like()
        self.validate_ipv4_like()
        self.classify_ipv4_addresses()
        return self.ipv4_results

Censored extraction example

# input_string is assumed to hold the text to scan; it is not defined in this snippet.
# The pattern below is compiled for reference; this snippet does not apply it to the input.
censored = re.compile(
    r"""
    \(\.\)|
    \(dot\)|
    \[\.\]|
    \[dot\]|
    ([ ]\.)  # [ ] keeps the literal space, which re.VERBOSE would otherwise ignore
    """, re.VERBOSE | re.IGNORECASE
)

data_list = input_string.split()  # Bring your input string to a list.
clean_list = []  # List to store the cleaned up input.

for entry in data_list:
    # Remove undesired leading and trailing characters.
    clean_entry = entry.strip(' .,<>?/[]\\{}"\'|`~!@#$%^&*()_+-=')
    clean_list.append(clean_entry)  # Add the entry to the clean list.

clean_unique_list = list(set(clean_list))  # Remove duplicates in list.

# Now we can go ahead and extract IPv4 Addresses. Note that this will be a dict.
results = ExtractIPs(clean_list).get_ipv4_results()

for k, v in results.iteritems():
    # After all that work, make sure the results are nicely presented!
    print("\n%s: %s" % (k, v))

Results:

public_ips: ['8.8.8.8', '101.099.098.000']
valid_ips: ['192.168.1.1', '8.8.8.8', '101.099.098.000']
invalid_ips: []
private_ips: ['192.168.1.1']
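
The example above compiles a pattern for censored dots but, as shown, never applies it to the input. One way it might be applied before splitting the text is sketched below. This is an assumption about the intended use rather than the author's code: `CENSORED_DOT` and `uncensor` are hypothetical names, and the literal space is written as `[ ]` because re.VERBOSE ignores bare whitespace.

```python
import re

# Assumed variant of the censored-dot pattern from the example above.
CENSORED_DOT = re.compile(
    r"""
    \(\.\)  |   # (.)
    \(dot\) |   # (dot)
    \[\.\]  |   # [.]
    \[dot\] |   # [dot]
    [ ]\.       # a space followed by a period
    """,
    re.VERBOSE | re.IGNORECASE,
)


def uncensor(text):
    """Replace every censored dot with a plain period."""
    return CENSORED_DOT.sub(".", text)


print(uncensor("192.168.1[.]1 192.168.1(DOT)1 192 .168 .1 .1"))
```

The cleaned string could then be split and stripped as in the example above before being handed to ExtractIPs. The "192. 168. 1. 1" form (a period followed by a space) is not covered by these alternatives.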
