
Python: parse file for IP addresses

I have a file with several IP addresses. There are about 900 IPs on 4 lines of text. I would like the output to be 1 IP per line. How can I accomplish this? Based on other code, I have come up with this, but it fails because multiple IPs are on single lines:

import sys
import re

try:
    if sys.argv[1:]:
        print "File: %s" % (sys.argv[1])
        logfile = sys.argv[1]
    else:
        logfile = raw_input("Please enter a log file to parse, e.g /var/log/secure: ")
    try:
        file = open(logfile, "r")
        ips = []
        for text in file.readlines():
           text = text.rstrip()
           regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})$',text)
           if regex is not None and regex not in ips:
               ips.append(regex)

        for ip in ips:
           outfile = open("/tmp/list.txt", "a")
           addy = "".join(ip)
           if addy is not '':
              print "IP: %s" % (addy)
              outfile.write(addy)
              outfile.write("\n")
    finally:
        file.close()
        outfile.close()
except IOError, (errno, strerror):
        print "I/O Error(%s) : %s" % (errno, strerror)

The $ anchor in your expression is preventing you from finding anything but the last entry. Remove that, then use the list returned by .findall():

found = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
ips.extend(found)

re.findall() will always return a list, which could be empty.
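The effect of the anchor can be checked with a minimal sketch like the one below. It uses Python 3 syntax, and the sample line and the names `PATTERN` and `ANCHORED` are made up for illustration:

```python
import re

# The pattern from the question, without and with the trailing $ anchor.
PATTERN = r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})'
ANCHORED = PATTERN + '$'

line = "10.0.0.1 192.168.1.20 8.8.8.8"  # hypothetical input line

# With the anchor, only an address that ends the string can match.
anchored_hits = re.findall(ANCHORED, line)

# Without it, every address on the line is returned.
all_hits = re.findall(PATTERN, line)

# No match gives an empty list, never None.
no_hits = re.findall(PATTERN, "no addresses here")

print(anchored_hits)
print(all_hits)
print(no_hits)
```

Because the result is always a list, it can be passed straight to `ips.extend()` without a `None` check.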

  • If you only want unique addresses, use a set instead of a list.
  • If you need to validate IP addresses (including ignoring private-use networks and local addresses), consider using the ipaddress.IPv4Address() class.
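Both suggestions could be combined roughly as follows. This is only a sketch: it assumes Python 3.3 or later (where `ipaddress` is in the standard library), and the function name `public_ipv4_addresses` and the compiled `IP_PATTERN` are hypothetical names chosen for the example:

```python
import ipaddress
import re

# Loose candidate pattern; real validation is left to the ipaddress module.
IP_PATTERN = re.compile(r'\d{1,3}(?:\.\d{1,3}){3}')


def public_ipv4_addresses(text):
    """Return the set of valid IPv4 addresses in text that are neither
    private-use nor loopback addresses."""
    unique = set()  # a set drops duplicates automatically
    for candidate in IP_PATTERN.findall(text):
        try:
            address = ipaddress.IPv4Address(candidate)
        except ipaddress.AddressValueError:
            continue  # looks like an IP but is not one, e.g. 999.1.1.1
        if address.is_private or address.is_loopback:
            continue  # skip private-use networks and local addresses
        unique.add(str(address))
    return unique


sample = "8.8.8.8 10.0.0.1 127.0.0.1 999.1.1.1 8.8.8.8 1.1.1.1"
result = public_ipv4_addresses(sample)
print(sorted(result))
```

A set does not remember insertion order, so sort the result or keep a separate list if the order of first appearance matters.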

The findall function returns a list of matches, but you aren't iterating through each match.

regex = re.findall(r'(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})\.(?:[\d]{1,3})',text)
for match in regex:
    if match not in ips:
        ips.append(match)

Extracting IP Addresses From File

I answered a similar question in this discussion. In short, it's a solution based on one of my ongoing projects for extracting Network and Host Based Indicators from different types of input data (e.g. string, file, blog posting, etc.): https://github.com/JohnnyWachter/intel


I would import the ExtractIPs and CleanData classes from the project's IPAddresses and Data modules, then use them to accomplish your task in the following manner:

#!/usr/bin/env python

"""Extract IPv4 Addresses From Input File."""

from Data import CleanData  # Format and Clean the Input Data.
from IPAddresses import ExtractIPs  # Extract IPs From Input Data.


def get_ip_addresses(input_file_path):
    """
    Read contents of input file and extract IPv4 Addresses.
    :param input_file_path: fully qualified path to input file. Expecting str
    :returns: dictionary of IPv4 and IPv4-like Address lists
    :rtype: dict
    """

    input_data = []  # Empty list to house formatted input data.

    input_data.extend(CleanData(input_file_path).to_list())

    results = ExtractIPs(input_data).get_ipv4_results()

    return results

Now that you have a dictionary of lists, you can easily access the data you want and output it in whatever way you desire. The example below makes use of the above function, printing the results to the console and writing them to a specified output file:

# Extract the desired data using the aforementioned function.
ipv4_list = get_ip_addresses('/path/to/input/file')

# Open your output file in 'append' mode.
with open('/path/to/output/file', 'a') as outfile:

    # Ensure that the list of valid IPv4 Addresses is not empty.
    if ipv4_list['valid_ips']:

        for ip_address in ipv4_list['valid_ips']:

            # Print to console
            print(ip_address)

            # Write to output file.
            outfile.write(ip_address)

Without the re.MULTILINE flag, $ matches only at the end of the string.
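A small sketch of the difference, using Python 3 syntax and made-up sample data (the names `ANCHORED` and `data` are illustrative only):

```python
import re

# An IPv4-like pattern that keeps the trailing $ anchor.
ANCHORED = r'\d{1,3}(?:\.\d{1,3}){3}$'

data = "10.0.0.1\n10.0.0.2\n10.0.0.3"  # hypothetical file contents

# Default mode: $ only matches at the very end of the whole string.
default_hits = re.findall(ANCHORED, data)

# re.MULTILINE: $ also matches just before each newline.
multiline_hits = re.findall(ANCHORED, data, re.MULTILINE)

print(default_hits)
print(multiline_hits)
```

Even with re.MULTILINE, the anchored pattern only finds an address that ends a line, so for several addresses per line the anchor still has to go.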

To make debugging easier, split the code into several parts that you can test independently.

import re

def extract_ips(data):
    return re.findall(r"\d{1,3}(?:\.\d{1,3}){3}", data)

If the input file is small and you don't need to preserve the original order of the IPs:

with open(filename) as infile, open(outfilename, "w") as outfile:
    outfile.write("\n".join(set(extract_ips(infile.read()))))

Otherwise:

with open(filename) as infile, open(outfilename, "w") as outfile:
    seen = set()
    for line in infile:
        for ip in extract_ips(line):
            if ip not in seen:
                seen.add(ip)
                outfile.write(ip + "\n")
