How can we parse the text using Python regex?

I have the following text, and I want the output in dictionary format.

text = '''
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622

197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701

100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
'''

I tried the following, but I only get 2 dictionaries, whereas I expect 4.

import re

names = []

# Raw string so the regex backslashes reach the re module unchanged.
pattern = r"(?P<host>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s-\s(?P<user_name>[a-zA-Z0-9]+)\s\[(?P<time>\d{2}\/[a-zA-Z]+\/[0-9]+\:[0-9]+\:[0-9]+\:[0-9]+\s-\d{4})\]\s\"(?P<request>[a-zA-Z]+\s\/[a-zA-Z]+\s[a-zA-Z]+\/\d{1}\.\d{1})\""
for item in re.finditer(pattern, text):
    names.append(item.groupdict())

print(names)

Can anyone help me?

This part of the string you are trying to match:

"DELETE /virtual/solutions/target/web+services HTTP/2.0"

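does not match your regex, because the regex expects everything after DELETE / up to the next space to be letters, while this path also contains / and +. The requests that match are: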

POST /incentivize HTTP/1.1
PATCH /architectures HTTP/1.0

and those that do not are:

DELETE /virtual/solutions/target/web+services HTTP/2.0
DELETE /interactive/transparent/niches/revolutionize HTTP/1.1
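
The split above can be checked directly. The following is a minimal sketch that applies only the request part of the question's pattern to the four request strings; the names `request_re`, `requests` and `results` are introduced here for illustration and are not part of the original code.

```python
import re

# Request sub-pattern as written in the question: the path after the
# leading "/" may contain letters only.
request_re = re.compile(r'[a-zA-Z]+\s/[a-zA-Z]+\s[a-zA-Z]+/\d\.\d')

requests = [
    'POST /incentivize HTTP/1.1',
    'DELETE /virtual/solutions/target/web+services HTTP/2.0',
    'DELETE /interactive/transparent/niches/revolutionize HTTP/1.1',
    'PATCH /architectures HTTP/1.0',
]

# fullmatch requires the whole request string to fit the pattern.
results = [request_re.fullmatch(r) is not None for r in requests]

for ok, r in zip(results, requests):
    print(ok, r)
```

Only the two requests whose path is a single alphabetic segment pass, which is why the original loop yields 2 dictionaries.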

Change the request part of the regex so that it accepts / and + in addition to letters:

"[a-zA-Z]+\s\/[a-zA-Z/+]+\s[a-zA-Z]+\/\d{1}\.\d{1}\"
                     ↑↑

instead of

"[a-zA-Z]+\s\/[a-zA-Z]+\s[a-zA-Z]+\/\d{1}\.\d{1}\"

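Putting the fix together, a sketch of the full script with the adjusted request group might look like this. It assumes the log lines always have the shape shown in the question; the pattern is split across several raw strings purely for readability, and the variable name `pattern` is an addition for illustration.

```python
import re

text = '''
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622

197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701

100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
'''

# Same groups as in the question; only the path character class in
# "request" changed, from [a-zA-Z] to [a-zA-Z/+].
pattern = re.compile(
    r'(?P<host>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s-\s'
    r'(?P<user_name>[a-zA-Z0-9]+)\s'
    r'\[(?P<time>\d{2}/[a-zA-Z]+/[0-9]+:[0-9]+:[0-9]+:[0-9]+\s-\d{4})\]\s'
    r'"(?P<request>[a-zA-Z]+\s/[a-zA-Z/+]+\s[a-zA-Z]+/\d\.\d)"'
)

# One dictionary per matched log line.
names = [m.groupdict() for m in pattern.finditer(text)]

print(len(names))
for entry in names:
    print(entry)
```

The character class still rejects paths containing digits, hyphens or other characters; if real logs include those, a broader class such as `\S+` for the path may be a more robust choice.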