
How can we parse the text using Python regex?

I have the following text, and I want the output as a list of dictionaries.

text = '''
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622

197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701

100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
'''

I tried the following, but I only get 2 dictionaries, whereas I expect 4 to be returned.

import re

names = []

for item in re.finditer(r"(?P<host>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s-\s(?P<user_name>[a-zA-Z0-9]+)\s\[(?P<time>\d{2}\/[a-zA-Z]+\/[0-9]+\:[0-9]+\:[0-9]+\:[0-9]+\s-\d{4})\]\s\"(?P<request>[a-zA-Z]+\s\/[a-zA-Z]+\s[a-zA-Z]+\/\d{1}\.\d{1})\"", text):
    names.append(item.groupdict())

print(names)

Can anyone help me?

This part of the string you are trying to match:

"DELETE /virtual/solutions/target/web+services HTTP/2.0"

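does not match your regex, because the request part (\/[a-zA-Z]+) allows only letters after the first slash, so a path containing a further / or a + fails. The requests that match are: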

POST /incentivize HTTP/1.1
PATCH /architectures HTTP/1.0

and those that do not are:

DELETE /virtual/solutions/target/web+services HTTP/2.0
DELETE /interactive/transparent/niches/revolutionize HTTP/1.1
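The split above can be checked with a small sketch that tests only the request sub-pattern from the question against the four request strings. The names `original_request`, `requests`, `matched` and `unmatched` are introduced here for illustration and are not from the original post.

```python
import re

# Request sub-pattern as written in the question:
# only letters are allowed between the first "/" and the next space.
original_request = re.compile(r"[a-zA-Z]+\s/[a-zA-Z]+\s[a-zA-Z]+/\d\.\d")

# The four request strings from the sample log lines.
requests = [
    "POST /incentivize HTTP/1.1",
    "DELETE /virtual/solutions/target/web+services HTTP/2.0",
    "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1",
    "PATCH /architectures HTTP/1.0",
]

matched = [r for r in requests if original_request.fullmatch(r)]
unmatched = [r for r in requests if not original_request.fullmatch(r)]

print(matched)    # the POST and PATCH requests
print(unmatched)  # the two DELETE requests
```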

Change the request part of the regex so that it accepts / and + in addition to letters:

"[a-zA-Z]+\s\/[a-zA-Z/+]+\s[a-zA-Z]+\/\d{1}\.\d{1}\"
                     ↑↑

instead of

"[a-zA-Z]+\s\/[a-zA-Z]+\s[a-zA-Z]+\/\d{1}\.\d{1}\"
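As a minimal sketch, here is the widened character class applied to the full pattern from the question. Splitting the pattern over several raw-string literals and the variable name `pattern` are choices made for this illustration; the group names and sample text come from the question.

```python
import re

text = '''
146.204.224.152 - feest6811 [21/Jun/2019:15:45:24 -0700] "POST /incentivize HTTP/1.1" 302 4622

197.109.77.178 - kertzmann3129 [21/Jun/2019:15:45:25 -0700] "DELETE /virtual/solutions/target/web+services HTTP/2.0" 203 26554

156.127.178.177 - okuneva5222 [21/Jun/2019:15:45:27 -0700] "DELETE /interactive/transparent/niches/revolutionize HTTP/1.1" 416 14701

100.32.205.59 - ortiz8891 [21/Jun/2019:15:45:28 -0700] "PATCH /architectures HTTP/1.0" 204 6048
'''

# Same pattern as in the question, except that the path part of the
# request now uses [a-zA-Z/+]+ so that "/" and "+" are accepted too.
pattern = re.compile(
    r'(?P<host>[0-9]+\.[0-9]+\.[0-9]+\.[0-9]+)\s-\s'
    r'(?P<user_name>[a-zA-Z0-9]+)\s'
    r'\[(?P<time>\d{2}/[a-zA-Z]+/[0-9]+:[0-9]+:[0-9]+:[0-9]+\s-\d{4})\]\s'
    r'"(?P<request>[a-zA-Z]+\s/[a-zA-Z/+]+\s[a-zA-Z]+/\d\.\d)"'
)

# One dictionary per log line, keyed by the named groups.
names = [m.groupdict() for m in pattern.finditer(text)]

print(len(names))  # 4
print(names[1]["request"])
```

The character class still rejects paths containing digits, hyphens or other characters, so real-world logs may need a wider class than the one shown here.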

