Match line with specific string to extract values Python Regex
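I'm having trouble finding the right regular expression for this task; please excuse my beginner skills. What I want is to get the id value only from lines that contain "available":true, not from lines with "available":false. I can get the IDs of all lines with re.findall('"id":(\\d{13})', line, re.DOTALL) (the 13 matches exactly 13 digits, because the data also contains other IDs with fewer than 13 digits, which I don't need).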
{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
So the final result needs to be ['1651572973431','1351572943231']
Thanks a lot for the help.
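This may or may not be a good answer, depending on what you have. It looks like you have a list of strings and want the IDs from some of them. If that's the case, parsing the JSON will be much cleaner and easier to read than writing a byzantine regular expression. For example: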
import json
# lines is a list of strings:
lines = ['{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
]
# parse it and you can use regular python to get what you want:
[line['id'] for line in map(json.loads, lines) if line['available']]
Result:
[1351572943231, 1651572973431]
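Note that this yields integers, while the question asks for strings and only for 13-digit IDs. A minimal sketch of one way to cover both, assuming the shorter IDs appear in the same "id" field (the third record below is made up for illustration and is not from the question):

```python
import json

lines = [
    '{"id":1351572329731,"parent_pid":21741,"available":false}',
    '{"id":1351572943231,"parent_pid":21741,"available":true}',
    '{"id":21741,"parent_pid":21741,"available":true}',  # hypothetical short id
]

records = map(json.loads, lines)
# Keep available records only, convert each id to a string,
# and drop any id that is not exactly 13 digits long.
ids = [str(rec["id"]) for rec in records
       if rec["available"] and len(str(rec["id"])) == 13]
print(ids)  # ['1351572943231']
```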
If what you posted is one long string, you can wrap it in [] and then parse it as an array, with the same result:
import json
line = r'{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}'
lines = json.loads('[' + line + ']')
[line['id'] for line in lines if line['available']]
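In the question every object is followed by a comma, including the last one, and a trailing comma is not valid inside a JSON array. A rough sketch of handling that before wrapping in [], assuming the text contains nothing but such comma-separated objects (available_ids is a hypothetical helper name, and the sample is shortened from the question's data):

```python
import json

def available_ids(text):
    """Return the ids of all objects whose "available" field is true."""
    # Strip surrounding whitespace, then the trailing comma that
    # would make the wrapped array invalid JSON.
    body = text.strip().rstrip(",")
    records = json.loads("[" + body + "]")
    return [rec["id"] for rec in records if rec.get("available")]

sample = '''{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678"},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678"},
'''
ids = available_ids(sample)
print(ids)  # [1351572943231]
```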
This matches what you want:
(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)
https://regex101.com/r/FseimH/1
Expanded
(?<= "id": )
\d{13}
(?=
(?: ," [^"]* ": [^,]*? )*?
,"available":true
)
Explained
(?<= "id": ) # Lookbehind assertion for id
\d{13} # Consume 13 digit id
(?= # Lookahead assertion
(?: # Optional sequence
, # comma
" [^"]* " # quoted string
: # colon
[^,]*? # optional non-comma's
)*? # End sequence, do 0 to many times -
,"available":true # until we find available = true
)
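For completeness, here is a short sketch of applying this pattern from Python with re.findall, using the four lines from the question joined into one string. The pattern relies on "available" appearing after "id" within the same object and on the values in between containing no commas, which holds for this sample but is an assumption about the real data.

```python
import re

# Raw string, so \d reaches the regex engine as a single backslash escape.
pattern = re.compile(
    r'(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)'
)

text = (
    '{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},\n'
    '{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},\n'
    '{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},\n'
    '{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},\n'
)

# The lookarounds consume nothing, so findall returns only the 13-digit ids.
matches = pattern.findall(text)
print(matches)  # ['1351572943231', '1651572973431']
```

The ids come back as strings in the order they appear in the text, which matches the form of the result the question asks for.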