How do I write a regular expression with named groups in Python to match this rule?

Question

I have a file that contains lines like the following.

comm=adbd pid=11108 prio=120 success=1 target_cpu=001

I wrote the following regular expression to match them.

_sched_wakeup_pattern = re.compile(r"""
comm=(?P<next_comm>.+?)
\spid=(?P<next_pid>\d+)
\sprio=(?P<next_prio>\d+)
\ssuccess=(?P<success>\d)
\starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)

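But now I also have lines like the following, which have no success component.

comm=rcu_preempt pid=7 prio=120 target_cpu=007

How can I modify my regular expression to match both cases? I tried putting * all over the line that contains "success", but that raises an error.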

Answer 1

A solution using a non-capturing group and the regex.findall function:

import regex
...
with open('lines.txt', 'r') as fh:  # assuming 'lines.txt' is your input file
    commlines = fh.read()

_sched_wakeup_pattern = regex.compile(r"""
comm=(?P<next_comm>[\S]+?)
\spid=(?P<next_pid>\d+)
\sprio=(?P<next_prio>\d+)
(?:\ssuccess=)?(?P<success>\d)?
\starget_cpu=(?P<target_cpu>\d+)
""", regex.VERBOSE)

result = regex.findall(_sched_wakeup_pattern, commlines)

template = "{0:15}|{1:10}|{2:9}|{3:7}|{4:10}" # column widths
print(template.format("next_comm", "next_pid", "next_prio", "success", "target_cpu")) # header

for t in result:
    print(template.format(*t))

Prettified output:

next_comm      |next_pid  |next_prio|success|target_cpu
rcu_preempt    |7         |120      |       |007       
kworker/u16:2  |73        |120      |       |006       
kworker/u16:4  |364       |120      |       |005       
adbd           |11108     |120      |1      |001       
kworker/1:1    |16625     |120      |1      |001       
rcu_preempt    |7         |120      |1      |002
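
For comparison, here is a minimal sketch of the same idea using only the standard-library re module. It wraps the " success=N" part in a single optional non-capturing group, so the prefix and the digit are optional together rather than independently. The names SCHED_WAKEUP, parse_line, and sample_lines are illustrative and not from the original post, and the sketch assumes that comm values contain no whitespace.

```python
import re

# Same fields as above; the whole " success=N" part is optional as one unit.
SCHED_WAKEUP = re.compile(r"""
    comm=(?P<next_comm>\S+)
    \spid=(?P<next_pid>\d+)
    \sprio=(?P<next_prio>\d+)
    (?:\ssuccess=(?P<success>\d))?   # optional, non-capturing wrapper
    \starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)

def parse_line(line):
    """Return a dict of the named groups, or None if the line does not match."""
    m = SCHED_WAKEUP.search(line)
    return m.groupdict() if m else None

sample_lines = [
    "comm=adbd pid=11108 prio=120 success=1 target_cpu=001",
    "comm=rcu_preempt pid=7 prio=120 target_cpu=007",
]
for line in sample_lines:
    print(parse_line(line))
```

With groupdict(), a missing success field comes back as None instead of an empty string, which makes the two cases easy to tell apart.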

Answer 2

To match zero or one repetition, use (your_string)?.

_sched_wakeup_pattern = re.compile(r"""
comm=(?P<next_comm>.+?)
\spid=(?P<next_pid>\d+)
\sprio=(?P<next_prio>\d+)
\s?(success=(?P<success>\d))?
\starget_cpu=(?P<target_cpu>\d+)
""", re.VERBOSE)

Here the outer group captures the whole string, so the output also includes success=:

output =>
('rcu_preempt', '7', '120', '', '', '007')
('kworker/u16:2', '73', '120', '', '', '006')
('kworker/u16:4', '364', '120', '', '', '005')
('adbd', '11108', '120', 'success=1', '1', '001')
('kworker/1:1', '16625', '120', 'success=1', '1', '001')
('rcu_preempt', '7', '120', 'success=1', '1', '002')

Now we need to find a way to remove "success=". That does not seem difficult.
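
One way to do that, shown here as a small sketch on a shortened pattern rather than the full one, is to turn the outer group into a non-capturing group with (?:...). The variable names capturing and non_capturing are illustrative only.

```python
import re

line = "comm=adbd pid=11108 prio=120 success=1 target_cpu=001"

# The outer group is a capturing group, so findall also returns "success=1".
capturing = re.compile(r"prio=(?P<next_prio>\d+)\s?(success=(?P<success>\d))?")

# The outer group is non-capturing, so only the named groups are returned.
non_capturing = re.compile(r"prio=(?P<next_prio>\d+)\s?(?:success=(?P<success>\d))?")

print(capturing.findall(line))      # [('120', 'success=1', '1')]
print(non_capturing.findall(line))  # [('120', '1')]
```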

[Edit]
(?:\ssuccess=)?(?P<success>\d)? works well.
By RomanPerekhrest

Solution 1
3 Accepted 2016-09-17 12:09:34

Solution 2
2 2016-09-17 11:05:20