Python regex substitution using a dictionary to clean up domain names
For the output, I need to replace each pair of brackets containing digits, such as (7), with a period '.'. I also need to remove the brackets at the beginning and end of the domain.
Can we use re.sub for this, and if so, how?
Code:
import re
log = ["4/19/2020 11:59:09 PM 2604 PACKET 0000014DE1921330 UDP Rcv 192.168.1.28 f975 Q [0001 D NOERROR] A (7)pagead2(17)googlesyndication(3)com(0)",
"4/19/2020 11:59:09 PM 0574 PACKET 0000014DE18C4720 UDP R cv 192.168.2.54 9c63 Q [0001 D NOERROR] A (2)pg(3)cdn(5)viber(3)com(0)"]
rx_dict = { 'query': re.compile(r'(?P<query>[\S]*)$') }
for item in log:
    for key, r_exp in rx_dict.items():
        print(f"{r_exp.search(item).group(1)}")
Output:
(7)pagead2(17)googlesyndication(3)com(0)
(2)pg(3)cdn(5)viber(3)com(0)
Preferred output:
pagead2.googlesyndication.com
pg.cdn.viber.com
Pragmatic Python usage:
log = ["4/19/2020 11:59:09 PM 2604 PACKET 0000014DE1921330 UDP Rcv 192.168.1.28 f975 Q [0001 D NOERROR] A (7)pagead2(17)googlesyndication(3)com(0)",
"4/19/2020 11:59:09 PM 0574 PACKET 0000014DE18C4720 UDP R cv 192.168.2.54 9c63 Q [0001 D NOERROR] A (2)pg(3)cdn(5)viber(3)com(0)"]
import re
urls = [re.sub(r'\(\d+\)','.',t.split()[-1]).strip('.') for t in log]
print(urls)
Output:
['pagead2.googlesyndication.com', 'pg.cdn.viber.com']
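The one-liner above performs three steps: take the last whitespace-separated field, replace every "(digits)" group with a period, and strip the periods left at both ends. A minimal sketch that spells those steps out follows; the names clean_domain and LABEL_LEN are hypothetical and not part of the original answer.

```python
import re

# Matches one bracketed length prefix such as "(7)" or "(0)".
# LABEL_LEN is an illustrative name, not from the original answer.
LABEL_LEN = re.compile(r"\(\d+\)")


def clean_domain(line: str) -> str:
    """Return the dotted domain name found in the last field of a log line."""
    raw = line.split()[-1]            # "(7)pagead2(17)googlesyndication(3)com(0)"
    dotted = LABEL_LEN.sub(".", raw)  # ".pagead2.googlesyndication.com."
    return dotted.strip(".")          # "pagead2.googlesyndication.com"


sample = ("4/19/2020 11:59:09 PM 2604 PACKET 0000014DE1921330 UDP Rcv "
          "192.168.1.28 f975 Q [0001 D NOERROR] A "
          "(7)pagead2(17)googlesyndication(3)com(0)")
print(clean_domain(sample))
```

Compiling the pattern once is only a convenience here; calling re.sub with the raw pattern string, as in the answer, gives the same result.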
Dictionary refinement via rules:
If you want to apply consecutive rules via a dictionary, go lambda all the way:
import re
rules = {"r0": lambda x: x.split()[-1],
"r1": lambda x: re.sub(r'\(\d+\)','.',x),
"r2": lambda x: x.strip(".")}
result = []
for value in log:
    result.append(value)
    for r in rules:
        result[-1] = rules[r](result[-1])
print(result)
Output:
['pagead2.googlesyndication.com', 'pg.cdn.viber.com']
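The rule dictionary works because Python dictionaries preserve insertion order (guaranteed since Python 3.7), so r0, r1 and r2 run in the order they were written. As an alternative sketch, not the answerer's own code, the same pipeline can be written with an ordered list and functools.reduce; the helper name apply_rules is hypothetical.

```python
import re
from functools import reduce

log = [
    "4/19/2020 11:59:09 PM 2604 PACKET 0000014DE1921330 UDP Rcv 192.168.1.28 f975 Q [0001 D NOERROR] A (7)pagead2(17)googlesyndication(3)com(0)",
    "4/19/2020 11:59:09 PM 0574 PACKET 0000014DE18C4720 UDP R cv 192.168.2.54 9c63 Q [0001 D NOERROR] A (2)pg(3)cdn(5)viber(3)com(0)",
]

# The same three rules as above, kept in a list so the order is explicit.
rules = [
    lambda s: s.split()[-1],               # keep the last field
    lambda s: re.sub(r"\(\d+\)", ".", s),  # "(n)" -> "."
    lambda s: s.strip("."),                # drop leading/trailing periods
]


def apply_rules(line, rules):
    """Feed the line through each rule in turn (hypothetical helper)."""
    return reduce(lambda acc, rule: rule(acc), rules, line)


result = [apply_rules(line, rules) for line in log]
print(result)
```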
Yes, you can use re.sub. I assume you have this dictionary so that you can extract multiple pieces from the log. You can do something like this for dispatch:
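ops = {
    # Take the last whitespace-delimited field, turn each "(n)" into ".",
    # then strip the leading and trailing periods.
    "query": lambda e: re.sub(
        r"\(\d+\)", ".",
        re.search(r"(?P<query>[\S]*)$", e).group("query"),
    ).strip("."),
    # ...
}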
And then apply the functions to all log entries:
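# Build one result dict per log entry; a single dict keyed only by op_name
# would keep just the last entry's value for each key.
log_results = [{op_name: op(entry) for op_name, op in ops.items()} for entry in log]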
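A minimal end-to-end sketch of the dispatch idea follows, collecting one result dict per log line. The "client_ip" extractor and its regex are hypothetical additions for illustration only; the original answer elides the remaining entries.

```python
import re

log = [
    "4/19/2020 11:59:09 PM 2604 PACKET 0000014DE1921330 UDP Rcv 192.168.1.28 f975 Q [0001 D NOERROR] A (7)pagead2(17)googlesyndication(3)com(0)",
    "4/19/2020 11:59:09 PM 0574 PACKET 0000014DE18C4720 UDP R cv 192.168.2.54 9c63 Q [0001 D NOERROR] A (2)pg(3)cdn(5)viber(3)com(0)",
]

ops = {
    # Last field with "(n)" groups replaced by "." and the outer periods stripped.
    "query": lambda e: re.sub(
        r"\(\d+\)", ".", re.search(r"(\S*)$", e).group(1)
    ).strip("."),
    # Hypothetical second extractor: the first dotted-quad address in the line.
    "client_ip": lambda e: re.search(r"\b(?:\d{1,3}\.){3}\d{1,3}\b", e).group(0),
}

# One dict per log entry, with one key per extractor.
log_results = [{name: op(entry) for name, op in ops.items()} for entry in log]
for row in log_results:
    print(row)
```

Each extractor assumes its pattern matches; for logs where a field may be missing, re.search returns None and the lambda would need a guard.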