Python regex substitution using a dictionary to clean domain names

Question

For the output, I need to replace each parenthesized number, such as "(7)", with a period '.'. The markers at the beginning and end of the domain should be removed rather than turned into periods.

Can we use re.sub for this, and if so, how?

Code

import re

log = ["4/19/2020 11:59:09 PM 2604 PACKET  0000014DE1921330 UDP Rcv 192.168.1.28   f975   Q [0001   D   NOERROR] A      (7)pagead2(17)googlesyndication(3)com(0)",
       "4/19/2020 11:59:09 PM 0574 PACKET  0000014DE18C4720 UDP R cv 192.168.2.54    9c63   Q [0001   D   NOERROR] A      (2)pg(3)cdn(5)viber(3)com(0)"]

# Capture the last run of non-whitespace characters on the line (the queried name).
rx_dict = { 'query': re.compile(r'(?P<query>[\S]*)$') }

for item in log:
    for key, r_exp in rx_dict.items():
        print(f"{r_exp.search(item).group(1)}")

Output

(7)pagead2(17)googlesyndication(3)com(0)
(2)pg(3)cdn(5)viber(3)com(0)

Preferred output

pagead2.googlesyndication.com
pg.cdn.viber.com
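The numbers in parentheses appear to be label lengths: Windows DNS debug logs print names in this length-prefixed form, with "(0)" marking the end of the name, much like DNS wire format. Assuming that holds, here is a minimal sketch that uses the counts instead of discarding them, so a malformed name raises an error rather than being silently joined. The helper name decode_dns_name is hypothetical.

```python
import re

# Assumption: each "(n)" gives the length of the label that follows it,
# and "(0)" terminates the name (the DNS root).
LABEL = re.compile(r"\((\d+)\)")


def decode_dns_name(raw: str) -> str:
    """Hypothetical helper: '(3)www(7)example(3)com(0)' -> 'www.example.com'."""
    labels = []
    pos = 0
    while True:
        m = LABEL.match(raw, pos)  # a "(n)" marker must start here
        if m is None:
            raise ValueError(f"expected '(n)' at position {pos} in {raw!r}")
        length = int(m.group(1))
        pos = m.end()
        if length == 0:  # root label: end of the name
            break
        label = raw[pos:pos + length]
        if len(label) != length:
            raise ValueError(f"label shorter than declared length in {raw!r}")
        labels.append(label)
        pos += length
    return ".".join(labels)


log = [
    "4/19/2020 11:59:09 PM 2604 PACKET  0000014DE1921330 UDP Rcv 192.168.1.28   f975   Q [0001   D   NOERROR] A      (7)pagead2(17)googlesyndication(3)com(0)",
    "4/19/2020 11:59:09 PM 0574 PACKET  0000014DE18C4720 UDP R cv 192.168.2.54    9c63   Q [0001   D   NOERROR] A      (2)pg(3)cdn(5)viber(3)com(0)",
]

for entry in log:
    # The encoded name is the last whitespace-separated token of each line.
    print(decode_dns_name(entry.split()[-1]))
```

This trades the brevity of a single re.sub for a check that each label really has the declared length.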

Answer 1

Pragmatic Python usage:

log = ["4/19/2020 11:59:09 PM 2604 PACKET  0000014DE1921330 UDP Rcv 192.168.1.28   f975   Q [0001   D   NOERROR] A      (7)pagead2(17)googlesyndication(3)com(0)",
       "4/19/2020 11:59:09 PM 0574 PACKET  0000014DE18C4720 UDP R cv 192.168.2.54    9c63   Q [0001   D   NOERROR] A      (2)pg(3)cdn(5)viber(3)com(0)"]

import re

# Take the last token, turn every "(n)" into ".", then trim the leading/trailing dots.
urls = [re.sub(r'\(\d+\)', '.', t.split()[-1]).strip('.') for t in log]

print(urls)

Output:

['pagead2.googlesyndication.com', 'pg.cdn.viber.com']
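Instead of substituting and then stripping, one could capture only the labels with re.findall and join them. A sketch, assuming labels never contain parentheses:

```python
import re

log = [
    "4/19/2020 11:59:09 PM 2604 PACKET  0000014DE1921330 UDP Rcv 192.168.1.28   f975   Q [0001   D   NOERROR] A      (7)pagead2(17)googlesyndication(3)com(0)",
    "4/19/2020 11:59:09 PM 0574 PACKET  0000014DE18C4720 UDP R cv 192.168.2.54    9c63   Q [0001   D   NOERROR] A      (2)pg(3)cdn(5)viber(3)com(0)",
]

# Each match captures the text between a "(n)" marker and the next parenthesis;
# the trailing "(0)" has nothing after it, so it contributes no label.
urls = [".".join(re.findall(r"\(\d+\)([^()]+)", entry.split()[-1])) for entry in log]
print(urls)
```

Because only the labels are captured, there are no leading or trailing periods to strip afterwards.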

Dictionary refinement via rules:

If you want to apply consecutive rules via a dictionary, use lambdas throughout:

import re 

rules = {"r0": lambda x: x.split()[-1],
         "r1": lambda x: re.sub(r'\(\d+\)','.',x),
         "r2": lambda x: x.strip(".")}

result = []
for value in log:
    result.append(value)
    # Apply each rule, in insertion order, to the latest value.
    for r in rules:
        result[-1] = rules[r](result[-1])

print(result)

Output:

['pagead2.googlesyndication.com', 'pg.cdn.viber.com']
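The loop above threads each entry through the rules in insertion order (dicts preserve insertion order in Python 3.7+). The same pipeline can be expressed as a fold with functools.reduce; a sketch reusing the same three rules, where apply_rules is a hypothetical helper name:

```python
import re
from functools import reduce

log = [
    "4/19/2020 11:59:09 PM 2604 PACKET  0000014DE1921330 UDP Rcv 192.168.1.28   f975   Q [0001   D   NOERROR] A      (7)pagead2(17)googlesyndication(3)com(0)",
    "4/19/2020 11:59:09 PM 0574 PACKET  0000014DE18C4720 UDP R cv 192.168.2.54    9c63   Q [0001   D   NOERROR] A      (2)pg(3)cdn(5)viber(3)com(0)",
]

# Rules run in insertion order; each receives the previous rule's output.
rules = {
    "r0": lambda x: x.split()[-1],               # last whitespace-separated token
    "r1": lambda x: re.sub(r"\(\d+\)", ".", x),  # every "(n)" becomes "."
    "r2": lambda x: x.strip("."),                # drop leading/trailing dots
}


def apply_rules(value, rules):
    # Equivalent to r2(r1(r0(value))) for the rules above.
    return reduce(lambda acc, rule: rule(acc), rules.values(), value)


result = [apply_rules(value, rules) for value in log]
print(result)
```

Adding or reordering a cleanup step then only means editing the rules dictionary.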

Answer 2

Yes, you can use re.sub. I assume you have this dictionary so that you can extract multiple pieces from the log. You can do something like this for dispatch:

import re

ops = {
    # Extract the last token, turn each "(n)" into ".", then trim the outer dots.
    "query": lambda e: re.sub(
        r"\(\d+\)",
        ".",
        re.search(r"(?P<query>[\S]*)$", e).group(1),
    ).strip("."),
    # ...
}

Then apply the functions to all log entries:

# One list of results per operation, in the same order as the log entries.
log_results = {op_name: [op(entry) for entry in log] for op_name, op in ops.items()}
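Put together, a runnable version of this dispatch idea might look like the following sketch. The "query" entry mirrors the regex from the question; the second key, client_ip, and its IPv4 pattern are hypothetical, added only to show several extractors side by side.

```python
import re

log = [
    "4/19/2020 11:59:09 PM 2604 PACKET  0000014DE1921330 UDP Rcv 192.168.1.28   f975   Q [0001   D   NOERROR] A      (7)pagead2(17)googlesyndication(3)com(0)",
    "4/19/2020 11:59:09 PM 0574 PACKET  0000014DE18C4720 UDP R cv 192.168.2.54    9c63   Q [0001   D   NOERROR] A      (2)pg(3)cdn(5)viber(3)com(0)",
]

# Each key names one field; each value is a function extracting it from a log line.
ops = {
    # Last non-space token, "(n)" markers turned into dots, outer dots stripped.
    "query": lambda e: re.sub(
        r"\(\d+\)", ".", re.search(r"(?P<query>\S*)$", e).group("query")
    ).strip("."),
    # Hypothetical extra field: the first dotted-quad (IPv4-looking) token.
    "client_ip": lambda e: re.search(r"\b\d{1,3}(?:\.\d{1,3}){3}\b", e).group(0),
}

# One list of results per op, in the same order as the log entries.
log_results = {name: [op(entry) for entry in log] for name, op in ops.items()}
print(log_results)
```

Nesting the per-entry loop inside the value keeps every entry's result; iterating over both ops and log in a single flat dict comprehension would overwrite each key and keep only the last entry.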


2 answers

Answer 1: score 1, accepted, 2020-04-21 05:09:57

Answer 2: score 0, 2020-04-21 05:12:30
