I have a file with several IP addresses: about 900 IPs spread over 4 lines of text. I would like the output to have one IP per line. How can I accomplish this? Based on other code, I have come up with this, but it fails because multiple IPs are on a single line:
import sys
import re
try:
    if sys.argv[1:]:
        print "File: %s" % (sys.argv[1])
        logfile = sys.argv[1]
    else:
        logfile = raw_input("Please enter a log file to parse, e.g /var/log/secure: ")
    try:
        file = open(logfile, "r")
        ips = []
        for text in file.readlines():
            text = text.rstrip()
            regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
            if regex is not None and regex not in ips:
                ips.append(regex)
        for ip in ips:
            outfile = open("/tmp/list.txt", "a")
            addy = "".join(ip)
            if addy is not '':
                print "IP: %s" % (addy)
                outfile.write(addy)
                outfile.write("\n")
    finally:
        file.close()
        outfile.close()
except IOError, (errno, strerror):
    print "I/O Error(%s) : %s" % (errno, strerror)
The $ anchor in your expression is preventing you from finding anything but the last entry. Remove that, then use the list returned by .findall():
found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
ips.extend(found)
re.findall() will always return a list, which could be empty.
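To make the effect of the anchor concrete, here is a minimal sketch that runs the same pattern with and without the trailing $ on a single line holding several addresses. The sample line is made up for illustration (it uses documentation address ranges), and the snippet is written for Python 3.

```python
import re

# The pattern from the answer above, without and with the trailing `$` anchor.
UNANCHORED = r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})'
ANCHORED = UNANCHORED + '$'

# Hypothetical sample line with several addresses on it.
line = "192.0.2.1, 198.51.100.7 203.0.113.42"

last_only = re.findall(ANCHORED, line)    # only the address at the end of the string
all_found = re.findall(UNANCHORED, line)  # every address on the line
no_match = re.findall(UNANCHORED, "no addresses here")  # an empty list, not None

ips = []
ips.extend(all_found)
ips.extend(no_match)  # extending with an empty list changes nothing

for ip in ips:
    print(ip)
```

Because the result is always a list, no None check is needed before calling extend().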
To validate the matches, you can pass them to the ipaddress.IPv4Address() class.
The findall function returns a list of matches; you aren't iterating through each match.
# No trailing $ here, so every address on the line is matched, not just the last one.
regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
if regex is not None:
    for match in regex:
        if match not in ips:
            ips.append(match)
Extracting IP Addresses From File
I answered a similar question in another discussion. In short, it's a solution based on one of my ongoing projects for extracting Network and Host Based Indicators from different types of input data (e.g. string, file, blog posting, etc.): https://github.com/JohnnyWachter/intel
I would import the IPAddresses and Data classes, then use them to accomplish your task in the following manner:
#!/usr/bin/env python
"""Extract IPv4 Addresses From Input File."""
from Data import CleanData  # Format and Clean the Input Data.
from IPAddresses import ExtractIPs  # Extract IPs From Input Data.

def get_ip_addresses(input_file_path):
    """
    Read contents of input file and extract IPv4 Addresses.
    :param input_file_path: fully qualified path to input file. Expecting str
    :returns: dictionary of IPv4 and IPv4-like Address lists
    :rtype: dict
    """
    input_data = []  # Empty list to house formatted input data.
    input_data.extend(CleanData(input_file_path).to_list())
    results = ExtractIPs(input_data).get_ipv4_results()
    return results
Now that you have a dictionary of lists, you can easily access the data you want and output it in whatever way you desire. The below example makes use of the above function; printing the results to console, and writing them to a specified output file:
# Extract the desired data using the aforementioned function.
ipv4_list = get_ip_addresses('/path/to/input/file')

# Open your output file in 'append' mode.
with open('/path/to/output/file', 'a') as outfile:
    # Ensure that the list of valid IPv4 Addresses is not empty.
    if ipv4_list['valid_ips']:
        for ip_address in ipv4_list['valid_ips']:
            # Print to console
            print(ip_address)
            # Write to output file, one address per line.
            outfile.write(ip_address + '\n')
Without the re.MULTILINE flag, $ matches only at the end of the string.
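A small sketch of that difference, using a made-up two-line string (Python 3). Note that even with re.MULTILINE an anchored pattern only finds the last address on each line, so dropping the anchor remains the actual fix for this question.

```python
import re

# Anchored pattern: an IPv4-looking token that must be followed by `$`.
IP_AT_END = r"\d{1,3}(?:\.\d{1,3}){3}$"

# Hypothetical input: two lines with two addresses each.
text = "192.0.2.1 192.0.2.2\n192.0.2.3 192.0.2.4"

# Default: `$` matches only at the end of the whole string.
end_of_string = re.findall(IP_AT_END, text)

# re.MULTILINE: `$` also matches just before each newline.
end_of_each_line = re.findall(IP_AT_END, text, re.MULTILINE)

print(end_of_string)
print(end_of_each_line)
```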
To make debugging easier, split the code into several parts that you can test independently.
import re

def extract_ips(data):
    # Return every dotted-quad-looking substring found in data.
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)
Note that the regex filters out some valid IPs, e.g. 2130706433 or "1::1". Conversely, it matches invalid strings, e.g. 999.999.999.999. You could validate an IP string using socket.inet_aton() or the more general socket.inet_pton(). You could even split the input into pieces without searching for IPs and use these functions to keep the valid ones.
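The split-and-validate idea could look roughly like the sketch below (Python 3). The helper names is_valid_ipv4 and valid_ips are hypothetical, and the separator set (whitespace, commas, semicolons) is an assumption about the input file rather than something stated in the question. It uses socket.inet_pton() with AF_INET, which expects the full dotted-quad form; socket.inet_aton() is typically more lenient and may accept shorthand forms as well.

```python
import re
import socket

def is_valid_ipv4(candidate):
    """Return True if socket.inet_pton() accepts candidate as an IPv4 address."""
    try:
        socket.inet_pton(socket.AF_INET, candidate)
    except (OSError, ValueError):
        return False
    return True

def valid_ips(data):
    """Split data into pieces and keep only those that are valid IPv4 addresses."""
    # Assumed separators: whitespace, commas, and semicolons.
    pieces = re.split(r"[\s,;]+", data)
    return [piece for piece in pieces if is_valid_ipv4(piece)]

# Hypothetical sample input mixing valid addresses with junk.
sample = "192.0.2.1, 999.999.999.999 198.51.100.7\nnot-an-ip 203.0.113.42"
result = valid_ips(sample)
for ip in result:
    print(ip)
```

Unlike the regex, this rejects 999.999.999.999, at the cost of depending on how the addresses are delimited in the file.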
If the input file is small and you don't need to preserve the original order of the IPs:
with open(filename) as infile, open(outfilename, "w") as outfile:
    outfile.write("\n".join(set(extract_ips(infile.read()))))
Otherwise:
with open(filename) as infile, open(outfilename, "w") as outfile:
    seen = set()
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                print >>outfile, ip
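The print >>outfile, ip statement is Python 2 syntax. For Python 3, the same order-preserving loop might be sketched as follows. The wrapper name write_unique_ips is hypothetical, and io.StringIO objects stand in for real files here only so the snippet can be run without touching the disk.

```python
import io
import re

def extract_ips(data):
    # Same pattern as above: return every dotted-quad-looking substring.
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)

def write_unique_ips(infile, outfile):
    """Write each IP found in infile to outfile once, in first-seen order."""
    seen = set()
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                print(ip, file=outfile)  # Python 3 form of `print >>outfile, ip`

# In-memory stand-ins for the input and output files (made-up sample data).
source = io.StringIO("192.0.2.1 198.51.100.7\n192.0.2.1 203.0.113.42\n")
target = io.StringIO()
write_unique_ips(source, target)
output = target.getvalue()
print(output, end="")
```

With real files, the two StringIO objects would be replaced by the handles from open(filename) and open(outfilename, "w").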