
How do I write only the matching regex to a new file in Python?

My goal is to extract only the IP addresses and append them to a new file. The file I have is called error_log.txt and has lines such as:

[Sun Jun 7 16:45:56 2020] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed

[Sun Jun 7 17:13:50 2020] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed

The goal is to write "64.242.88.10" and the rest of the IPs to a new file.

I can get the print function to show only the IPs, but when the script writes to the file 'ip_only.txt' it writes the complete line from the error log.

How can I write only the IPs to the new file (in a column)?

Bonus: when I print while testing, I get blank lines too. How can I omit those lines?

import re

with open('error_log.txt', 'r') as file:
    fi = file.readlines()

ip_only = open('ip_only.txt', 'w+')

re_ip = re.compile("\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

for line in fi:
    ip = re.findall(re_ip, line)
    ip_only.write(str(line))
    # print(ip)

You need to write the ip variable to the file instead of line, which contains the original line:

for line in fi:
    ip = re.findall(re_ip, line)
    ip_only.write(str(ip))

# ip_only.txt:
# ['64.242.88.10']['64.242.88.10']

Additionally, to remove the brackets and quotes from your output (note that re.findall() returns a list of strings) and write each IP address on its own line:

for line in fi:
    ips = re.findall(re_ip, line)
    for ip in ips:
        ip_only.write(ip + '\n')

# ip_only.txt:
# 64.242.88.10
# 64.242.88.10
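
Putting the pieces above together, here is a minimal sketch of the whole flow. The names RE_IP, extract_ips, write_ips and sample are illustrative and do not come from the question. Both files are opened with a context manager so they are closed automatically, which is an assumption about what you want rather than something the original code does.

```python
import re

# Same pattern as in the question, written as a raw string so the
# backslashes reach the regex engine unchanged.
RE_IP = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def extract_ips(lines):
    """Return every IP-like match found in `lines`, in order (hypothetical helper)."""
    found = []
    for line in lines:
        # findall() returns a list of strings; it is empty for blank lines
        # or lines without a match, so nothing is added for those.
        found.extend(RE_IP.findall(line))
    return found


def write_ips(src_path, dst_path):
    """Write the IPs found in src_path to dst_path, one per line (hypothetical helper)."""
    with open(src_path, "r") as src, open(dst_path, "w") as dst:
        for ip in extract_ips(src):
            dst.write(ip + "\n")


# Shortened in-memory stand-in for error_log.txt, including a blank line.
sample = [
    "[Sun Jun 7 16:45:56 2020] [info] [client 64.242.88.10] (104)Connection reset by peer\n",
    "\n",
    "[Sun Jun 7 17:13:50 2020] [info] [client 64.242.88.10] (104)Connection reset by peer\n",
]
print(extract_ips(sample))  # ['64.242.88.10', '64.242.88.10']
```

With the real files, the call would look like write_ips('error_log.txt', 'ip_only.txt').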
  1. When writing to the file, you are writing the whole line. Write only the IPs instead: ip_only.write(str(ip))

  2. To avoid blank lines, add an if condition that checks whether an IP was found in the given line.

   for line in fi:
       ip = re.findall(re_ip, line)
       if ip:
           ip_only.write(str(ip))

If print(ip) gives you the expected result, then you should use write(ip) instead of write(line)

The regex returns a list, so you may need to write only ip[0]. You also need to add \n to move to the next line.

        ip_only.write(ip[0] + "\n")

As for empty lines: first remove all spaces, tabs, and newlines with strip(), then compare with the empty string "". Or use the fact that an empty string evaluates to False when used in if/else.

    line = line.strip()
    if line:
         # ... code ...
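
To see the blank-line check in isolation, here is a small sketch. The lines and kept lists are made up for illustration and are not from the question.

```python
# Hypothetical input: two real entries, one empty line, one whitespace-only line.
lines = ["first entry\n", "\n", "   \t\n", "second entry\n"]

kept = []
for line in lines:
    line = line.strip()  # drop surrounding spaces, tabs and newlines
    if line:             # an empty string is falsy, so blank lines are skipped
        kept.append(line)

print(kept)  # ['first entry', 'second entry']
```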

import re

fi = [
    '[Sun Jun 7 16:45:56 2020] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed',
    '[Sun Jun 7 17:13:50 2020] [info] [client 64.242.88.10] (104)Connection reset by peer: client stopped connection before send body completed',
]    


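# Raw string so the backslashes reach the regex engine unchanged.
re_ip = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")

# The with block closes ip_only.txt when the loop is done.
with open('ip_only.txt', 'w') as ip_only:
    for line in fi:
        line = line.strip()
        if line:
            ip = re.findall(re_ip, line)
            if ip:  # a non-empty line may still contain no IP
                ip_only.write(ip[0] + "\n")
                print(ip[0])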
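
One caveat about the pattern used throughout: \d{1,3} accepts any one to three digits, so it also matches strings such as 999.1.1.1 that are not valid IPv4 addresses. If that matters for your log, one possible approach (a sketch, not something the answers above do) is to pass each match through the standard-library ipaddress module. The helper name valid_ips and the sample line are hypothetical.

```python
import ipaddress
import re

RE_IP = re.compile(r"\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}")


def valid_ips(line):
    """Return the matches in `line` that parse as IPv4 addresses (hypothetical helper)."""
    result = []
    for candidate in RE_IP.findall(line):
        try:
            ipaddress.IPv4Address(candidate)
        except ValueError:
            continue  # rejected by ipaddress, e.g. an octet above 255
        result.append(candidate)
    return result


# Made-up line containing one valid and one out-of-range address.
print(valid_ips("[client 64.242.88.10] upstream 999.1.1.1 refused"))  # ['64.242.88.10']
```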
