
Extracting IP addresses from a file

I'm trying to extract IP addresses from an asp file in Python. The file looks something like this:

onInternalNet = (
        isInNet(hostDNS, "147.163.1.0", "255.255.0.0") ||
        isInNet(hostDNS, "123.264.0.0", "255.255.0.0") ||
        isInNet(hostDNS, "137.5.0.0", "255.0.0.0") ||
        isInNet(hostDNS, "100.01.02.0", "255.0.0.0") ||
        isInNet(hostDNS, "172.146.30.0", "255.240.0.0") ||
        isInNet(hostDNS, "112.268.0.0", "255.255.0.0") ||

I'm attempting to extract them with a regex:

if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):

However, I'm getting an error:

Traceback (most recent call last):
  File "pull_proxy.py", line 27, in <module>
    write_to_file(extract_proxies(in_file), out_file)
  File "pull_proxy.py", line 8, in extract_proxies
    if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):
  File "C:\Python27\lib\re.py", line 194, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 233, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'

I don't understand why I'm getting that error. What can I change in this code to make it extract the information the way I want?

import re

def extract_proxies(in_file):
    buffer = []

    for line in in_file:
        if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):
            print "{} appened to buffer.".format(line)
            buffer.append(line)
        else:
            pass

    return buffer

def write_to_file(buffer, out_file):
    for proxy in buffer:
        with open(out_file, "a+") as res:
            res.write(proxy)

if __name__ == '__main__':
    print "Running...."
    in_file = "C:/Users/thomas_j_perkins/Downloads/test.asp"
    out_file = "c:/users/thomas_j_perkins/Downloads/results.txt"
    write_to_file(extract_proxies(in_file), out_file)

EDIT

I realized I hadn't opened the file:

import re

def extract_proxies(in_file):
    buffer = []

    for line in in_file:
        if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):
            print "{} appened to buffer.".format(line)
            buffer.append(line)
        else:
            pass

    in_file.close()
    return buffer

def write_to_file(buffer, out_file):
    for proxy in buffer:
        with open(out_file, "a+") as res:
            res.write(proxy)

if __name__ == '__main__':
    print "Running...."
    in_file = "C:/Users/thomas_j_perkins/Downloads/PAC-Global-Vista.asp"
    out_file = "c:/users/thomas_j_perkins/Downloads/results.txt"
    write_to_file(extract_proxies(open(in_file, "r+")), out_file)

I'm still getting the same error:

Running....
Traceback (most recent call last):
  File "pull_proxy.py", line 28, in <module>
    write_to_file(extract_proxies(open(in_file)), out_file)
  File "pull_proxy.py", line 8, in extract_proxies
    if re.compile(r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$", line):
  File "C:\Python27\lib\re.py", line 194, in compile
    return _compile(pattern, flags)
  File "C:\Python27\lib\re.py", line 233, in _compile
    bypass_cache = flags & DEBUG
TypeError: unsupported operand type(s) for &: 'str' and 'int'

re.compile expects its second argument to be a flags value (an integer), and line (a string) is not one.

You should be calling re.match, not re.compile:

re.compile

Compile a regular expression pattern into a regular expression object, which can be used for matching using its match() and search() methods...
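One thing worth checking after switching to re.match: the pattern in the question is anchored with ^ and $, so it only matches a line that consists of nothing but an IP address. The sketch below is a minimal illustration using one line from the sample file; the unanchored pattern and the variable names are assumptions for the example, not part of the original code.

import re

# One line copied from the sample file in the question.
line = '        isInNet(hostDNS, "147.163.1.0", "255.255.0.0") ||'

# Pattern from the question: anchored at both ends of the line.
anchored = r"^\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}$"
# Same pattern without anchors (assumption, for illustration only).
unanchored = r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}"

# re.match(pattern, string) takes the string as its second argument.
anchored_result = re.match(anchored, line)

# re.search scans the whole line for the first occurrence.
found = re.search(unanchored, line)
first_ip = found.group(0) if found else None

print(anchored_result)
print(first_ip)

Here the anchored match returns None, while the unanchored search picks up the first address on the line.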

Your initial error

TypeError: unsupported operand type(s) for &: 'str' and 'int'

is caused by exactly what @Moses said in his answer: flags are supposed to be int values, not strings.
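To make the point about flags concrete, here is a small sketch of what the second argument of re.compile is meant for. The pattern and the name FUNC_CALL are made up for the illustration; only re.compile and re.IGNORECASE come from the standard library.

import re

# The second argument is a flag such as re.IGNORECASE, which behaves like an int.
flag_is_int = isinstance(re.IGNORECASE, int)

# Hypothetical pattern: find the isInNet( call regardless of letter case.
FUNC_CALL = re.compile(r"isinnet\(", re.IGNORECASE)

# The string to search goes to the compiled object's method, not to re.compile.
hit = FUNC_CALL.search('isInNet(hostDNS, "137.5.0.0", "255.0.0.0") ||')

print(flag_is_int)
print(hit.group(0))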


You should compile your regex once. Also, you need to use an open file handle when you iterate over the lines.

import re

IP_MATCHER = re.compile(r"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})")

def extract_proxies(fh):
    for line in fh:
        line = line.strip()
        match = IP_MATCHER.findall(line)
        if match:
            print "{} appened to buffer.".format(line)
            print match
        else:
            pass



def write_to_file(buffer, out_file):
    for proxy in buffer:
        with open(out_file, "a+") as res:
            res.write(proxy)


if __name__ == '__main__':
    print "Running...."
    in_file = "in.txt"
    with open(in_file) as fh:
        extract_proxies(fh)

This will find all matches. If you only want the first, use IP_MATCHER.search and match.groups() instead. This is of course assuming you actually want to extract the IP addresses.

For instance:

def extract_proxies(fh):
    for line in fh:
        line = line.strip()
        match = IP_MATCHER.findall(line)
        if len(match) == 2:
            print "{} appened to buffer.".format(line)
            ip, mask = match
            print "IP: %s => Mask: %s" % (ip, mask)
        else:
            pass

Please check the code below.

I made a couple of changes:

  1. re.compile - The regex should be compiled first and can then be used with match/search/findall.
  2. The regex was not correct. A pattern used with match has to account for the line from its start, and the original pattern, anchored with ^ and $, could not match an IP address in the middle of a line.

import re


def extract_proxies(in_file):
    buffer1 = []
    #Regex compiled here
    m = re.compile(r'\s*\w+\(\w+,\s+\"(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})\"')

    for line in in_file:
        #Used here to match
        r = m.match(line)
        if r is not None:
            print "{} appened to buffer.".format(line)
            buffer1.append(r.group(1))
        else:
            pass

    in_file.close()
    return buffer1


def write_to_file(buffer1, out_file):
    for proxy in buffer1:
        with open(out_file, "a+") as res:
            res.write(proxy+'\n')


if __name__ == '__main__':
    print "Running...."
    in_file = "sample.txt"
    out_file = "results.txt"
    write_to_file(extract_proxies(open(in_file)), out_file)

Output:

C:\Users\dinesh_pundkar\Desktop>python c.py
Running....
        isInNet(hostDNS, "147.163.1.0", "255.255.0.0") ||
 appened to buffer.
        isInNet(hostDNS, "123.264.0.0", "255.255.0.0") ||
 appened to buffer.
        isInNet(hostDNS, "137.5.0.0", "255.0.0.0") ||
 appened to buffer.
        isInNet(hostDNS, "100.01.02.0", "255.0.0.0") ||
 appened to buffer.
        isInNet(hostDNS, "172.146.30.0", "255.240.0.0") ||
 appened to buffer.
        isInNet(hostDNS, "112.268.0.0", "255.255.0.0") || appened to buffer.

C:\Users\dinesh_pundkar\Desktop>python c.py
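As a side note on write_to_file above: it reopens the output file once per address. A possible variant opens the file a single time and writes every address inside one with block. This is only a sketch under assumptions: the temporary directory and the sample addresses are there to make the example self-contained, and it is written for Python 3.

import os
import tempfile

def write_to_file(addresses, out_path):
    # Open the output file once, in append mode, and write one address per line.
    with open(out_path, "a") as res:
        for address in addresses:
            res.write(address + "\n")

# Usage with a throwaway file (the path is an assumption for the demo).
tmp_dir = tempfile.mkdtemp()
out_path = os.path.join(tmp_dir, "results.txt")
write_to_file(["147.163.1.0", "137.5.0.0"], out_path)

with open(out_path) as fh:
    written = fh.read()
print(written)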
