Python: parse a file for IP addresses
I have a file with several IP addresses. There are about 900 IPs on 4 lines of text. I would like the output to be 1 IP per line. How can I accomplish this? Based on other code, I have come up with this, but it fails because multiple IPs are on single lines:
import sys
import re

try:
    if sys.argv[1:]:
        print "File: %s" % (sys.argv[1])
        logfile = sys.argv[1]
    else:
        logfile = raw_input("Please enter a log file to parse, e.g /var/log/secure: ")
    try:
        file = open(logfile, "r")
        ips = []
        for text in file.readlines():
            text = text.rstrip()
            regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
            if regex is not None and regex not in ips:
                ips.append(regex)

        for ip in ips:
            outfile = open("/tmp/list.txt", "a")
            addy = "".join(ip)
            if addy is not '':
                print "IP: %s" % (addy)
                outfile.write(addy)
                outfile.write("\n")
    finally:
        file.close()
        outfile.close()
except IOError, (errno, strerror):
    print "I/O Error(%s) : %s" % (errno, strerror)
The $ anchor in your expression is preventing you from finding anything but the last entry. Remove it, then use the list returned by .findall():
found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
ips.extend(found)
re.findall() will always return a list, which could be empty.
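To see the effect of the anchor, here is a minimal Python 3 sketch. The sample line and the names IP_PATTERN, anchored, unanchored and no_match are made up for illustration; the pattern itself is the one from the question.

```python
import re

# Pattern from the question, without the trailing $ anchor.
IP_PATTERN = r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})'

# Hypothetical sample line holding several addresses.
line = "10.0.0.1 10.0.0.2 192.168.1.5"

# With $, a match must end at the end of the string: only the last IP is found.
anchored = re.findall(IP_PATTERN + r'$', line)

# Without $, every address on the line is returned.
unanchored = re.findall(IP_PATTERN, line)

# No match yields an empty list, never None.
no_match = re.findall(IP_PATTERN, "no addresses here")

print(anchored)    # ['192.168.1.5']
print(unanchored)  # ['10.0.0.1', '10.0.0.2', '192.168.1.5']
print(no_match)    # []
```

Because the result is always a list, a check such as `is not None` never filters anything out; testing the list for emptiness is what matters.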
If you also need to validate the addresses, consider the ipaddress.IPv4Address() class.
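As a rough sketch of how that class could be used for validation in Python 3 (the helper name is_valid_ipv4 is hypothetical and not part of the answer):

```python
import ipaddress


def is_valid_ipv4(candidate):
    """Return True if candidate is a well-formed dotted-quad IPv4 address."""
    try:
        ipaddress.IPv4Address(candidate)
    except ValueError:
        # AddressValueError is a subclass of ValueError; it is raised for
        # out-of-range octets, wrong octet counts, stray characters, etc.
        return False
    return True


print(is_valid_ipv4("192.168.1.5"))      # True
print(is_valid_ipv4("999.999.999.999"))  # False
```

Filtering the regex matches through such a helper drops strings that merely look like addresses.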
The findall function returns a list of matches; you aren't iterating through each match.
# No trailing $ here, so every address on the line is matched, not just the last one.
regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
if regex is not None:
    for match in regex:
        if match not in ips:
            ips.append(match)
Extracting IP Addresses From File
I answered a similar question in this discussion. In short, it's a solution based on one of my ongoing projects for extracting Network and Host Based Indicators from different types of input data (e.g. string, file, blog post, etc.): https://github.com/JohnnyWachter/intel
I would import the IPAddresses and Data classes, then use them to accomplish your task in the following manner:
#!/usr/bin/env python
"""Extract IPv4 Addresses From Input File."""

from Data import CleanData  # Format and Clean the Input Data.
from IPAddresses import ExtractIPs  # Extract IPs From Input Data.


def get_ip_addresses(input_file_path):
    """
    Read contents of input file and extract IPv4 Addresses.
    :param input_file_path: fully qualified path to input file. Expecting str
    :returns: dictionary of IPv4 and IPv4-like Address lists
    :rtype: dict
    """
    input_data = []  # Empty list to house formatted input data.
    input_data.extend(CleanData(input_file_path).to_list())
    results = ExtractIPs(input_data).get_ipv4_results()
    return results
Now that you have a dictionary of lists, you can easily access the data you want and output it in whatever way you desire. The example below makes use of the above function, printing the results to the console and writing them to a specified output file:
# Extract the desired data using the aforementioned function.
ipv4_list = get_ip_addresses('/path/to/input/file')

# Open your output file in 'append' mode.
with open('/path/to/output/file', 'a') as outfile:
    # Ensure that the list of valid IPv4 Addresses is not empty.
    if ipv4_list['valid_ips']:
        for ip_address in ipv4_list['valid_ips']:
            # Print to console
            print(ip_address)
            # Write to output file, one address per line (write() adds no newline).
            outfile.write(ip_address + '\n')
Without the re.MULTILINE flag, $ matches only at the end of the string.
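A small Python 3 sketch of that difference, using a made-up two-line input (the names ANCHORED, text, end_of_string_only and end_of_each_line are illustrative only):

```python
import re

# Dotted-quad pattern anchored with $.
ANCHORED = r"\d{1,3}(?:\.\d{1,3}){3}$"

# Hypothetical input: two lines, two addresses per line.
text = "1.1.1.1 2.2.2.2\n3.3.3.3 4.4.4.4"

# Default: $ matches only at the very end of the string.
end_of_string_only = re.findall(ANCHORED, text)

# re.MULTILINE: $ also matches just before each newline.
end_of_each_line = re.findall(ANCHORED, text, re.MULTILINE)

print(end_of_string_only)  # ['4.4.4.4']
print(end_of_each_line)    # ['2.2.2.2', '4.4.4.4']
```

Even with re.MULTILINE the anchored pattern only finds the last address on each line, so dropping the anchor is still what the question needs.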
To make debugging easier, split the code into several parts that you can test independently.
import re

def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)
The regex filters out some valid IPs, e.g., 2130706433 and "1::1". Conversely, it matches invalid strings, e.g., 999.999.999.999. You could validate an IP string using socket.inet_aton() or the more general socket.inet_pton(). You could even split the input into pieces without searching for IPs and use these functions to keep only the valid ones.
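The split-and-validate idea could look roughly like this in Python 3. This is only a sketch: the helper names is_valid_ip and valid_ips are hypothetical, and it assumes the addresses are separated by whitespace.

```python
import socket


def is_valid_ip(candidate):
    """Return True if candidate parses as an IPv4 or IPv6 address."""
    for family in (socket.AF_INET, socket.AF_INET6):
        try:
            socket.inet_pton(family, candidate)
            return True
        except (OSError, ValueError):
            # OSError: not a valid address for this family.
            # ValueError: e.g. an embedded null character.
            continue
    return False


def valid_ips(data):
    """Split data on whitespace and keep only the pieces that are valid IPs."""
    return [piece for piece in data.split() if is_valid_ip(piece)]


print(valid_ips("10.0.0.1 999.999.999.999 1::1 hello"))
# ['10.0.0.1', '1::1']
```

Unlike the regex, this keeps "1::1" and rejects 999.999.999.999, at the cost of requiring a known separator between addresses.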
If the input file is small and you don't need to preserve the original order of the IPs:
with open(filename) as infile, open(outfilename, "w") as outfile:
    outfile.write("\n".join(set(extract_ips(infile.read()))))
Otherwise:
with open(filename) as infile, open(outfilename, "w") as outfile:
    seen = set()
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                print >>outfile, ip
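The snippet above uses the Python 2 print statement. A Python 3 sketch of the same order-preserving approach might look like the following; the function name write_unique_ips and the in-memory sample data are assumptions for illustration, and real code would pass open file objects instead of io.StringIO.

```python
import io
import re


def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)


def write_unique_ips(infile, outfile):
    """Write each IP found in infile to outfile once, in first-seen order."""
    seen = set()
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                print(ip, file=outfile)  # one IP per line
    return len(seen)


# In-memory stand-ins for the input and output files.
source = io.StringIO("10.0.0.1 10.0.0.2 10.0.0.1\n192.168.1.5 10.0.0.2\n")
target = io.StringIO()
count = write_unique_ips(source, target)

print(count)              # 3
print(target.getvalue())  # 10.0.0.1, 10.0.0.2, 192.168.1.5 on separate lines
```

Taking file-like objects as arguments keeps the function testable without touching the disk, in line with the advice to split the code into independently testable parts.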