
Match line with specific string to extract values Python Regex

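I'm having trouble finding the right regex for this task; please excuse my beginner skills. I want to get the id value only from the lines that contain "available":true, not from those with "available":false. I can already get the ids of all lines with re.findall('"id":(\\d{13})', line, re.DOTALL) (the {13} matches exactly 13 digits, because the text also contains other ids with fewer than 13 digits that I don't need).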

{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},

So the final result needs to be ['1651572973431','1351572943231']

Thanks for the help.

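This may not be a great answer; it depends on what you have. It looks like you have a list of strings and want the ids from some of them. If that's the case, it will be much cleaner and easier to read if you parse the JSON rather than write a byzantine regex. For example: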

import json

# lines is a list of strings:

lines = ['{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
'{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}',
]

# parse it and you can use regular python to get what you want:
[line['id'] for line in map(json.loads, lines) if line['available']]

Result:

[1351572943231, 1651572973431]
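
The sample in the question ends each line with a trailing comma and expects the ids as strings, so those lines would need a little cleanup before json.loads accepts them. A minimal sketch, assuming one JSON object per line; the helper name `available_ids` is made up for illustration, the sample objects are shortened, and silently skipping unparsable lines is an assumption about the surrounding text:

```python
import json

def available_ids(lines):
    """Return the ids (as strings) of the lines whose "available" is true."""
    ids = []
    for raw in lines:
        # Drop surrounding whitespace and the trailing comma seen in the question's data.
        text = raw.strip().rstrip(',')
        if not text:
            continue
        try:
            record = json.loads(text)
        except json.JSONDecodeError:
            # Assumption: lines that are not a complete JSON object are simply skipped.
            continue
        if isinstance(record, dict) and record.get('available') is True and 'id' in record:
            ids.append(str(record['id']))
    return ids

# Shortened versions of the question's lines, plus one line that is not JSON.
sample = [
    '{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678"},',
    '{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678"},',
    'not json at all',
    '{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678"},',
]
print(available_ids(sample))
```

Converting with str() is only needed if the ids really have to be strings, as in the question's expected output.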

If what you posted is one long string, you can wrap it in [] and then parse it as an array, with the same result:

import json

line = r'{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}, {"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""}'

lines = json.loads('[' + line + ']')
[line['id'] for line in lines if line['available']]
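
If the text is pasted exactly as in the question, with line breaks between the objects and a comma after the last one, wrapping it in [] fails because JSON does not allow a trailing comma. A small sketch of one way around that, assuming the objects are otherwise valid JSON; `blob` and `parse_objects` are illustrative names and the objects are shortened:

```python
import json

def parse_objects(blob):
    """Parse comma-separated JSON objects, tolerating a trailing comma."""
    # Remove surrounding whitespace first, then any commas left at the very end.
    trimmed = blob.strip().rstrip(',')
    return json.loads('[' + trimmed + ']')

blob = '''
{"id":1351572329731,"parent_pid":21741,"available":false,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"option4":""},
'''

ids = [obj['id'] for obj in parse_objects(blob) if obj['available']]
print(ids)
```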

This should match what you want:

(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)

https://regex101.com/r/FseimH/1

Expanded

 (?<= "id": )
 \d{13} 
 (?=
      (?: ," [^"]* ": [^,]*? )*?
      ,"available":true
 )

Explained

 (?<= "id": )                        # Lookbehind assertion for id
 \d{13}                              # Consume 13 digit id
 (?=                                 # Lookahead assertion
      (?:                                 # Optional sequence
           ,                                   # comma
           " [^"]* "                           # quoted string
           :                                   # colon
           [^,]*?                              # optional non-comma's
      )*?                                 # End sequence, do 0 to many times - 
      ,"available":true                   # until we find  available = true
 )
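
For reference, here is a hedged sketch of how the lookaround pattern could be used from Python on the question's data. The names `AVAILABLE_ID` and `text` are illustrative. The pattern relies on "available" appearing after "id" within the same object and on the values in between containing no commas:

```python
import re

# Same pattern as above, written as a raw string so \d reaches the regex engine unchanged.
AVAILABLE_ID = re.compile(
    r'(?<="id":)\d{13}(?=(?:,"[^"]*":[^,]*?)*?,"available":true)'
)

text = '''{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572329731,"parent_pid":21741,"available":false,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678","feature":true,"pub":true,"require":null,"option4":""},
'''

# The pattern has no capturing groups, so findall returns the matched ids as strings.
ids = AVAILABLE_ID.findall(text)
print(ids)
```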

Here, we can simply use "id" as the left boundary and collect the desired digits in a capturing group:

"id":([0-9]+)

Then we can continue adding boundaries to it. For instance, if exactly 13 digits are required, we can simply use:

\"id\":([0-9]{13})
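
On its own, this pattern collects every 13-digit id, so the "available":true condition from the question still has to be applied separately. One possible sketch, assuming each object sits on its own line as in the question; `ids_of_available` is a hypothetical helper name and the sample objects are shortened:

```python
import re

ID_PATTERN = re.compile(r'"id":([0-9]{13})')

def ids_of_available(text):
    """Collect 13-digit ids from lines that contain "available":true."""
    ids = []
    for line in text.splitlines():
        # Assumption: the flag is written exactly as "available":true, with no spaces.
        if '"available":true' in line:
            match = ID_PATTERN.search(line)
            if match:
                ids.append(match.group(1))
    return ids

sample_text = '''{"id":1351572979731,"parent_pid":21741,"available":false,"lou":"678"},
{"id":1351572943231,"parent_pid":21741,"available":true,"lou":"678"},
{"id":1651572973431,"parent_pid":21741,"available":true,"lou":"678"},
'''
print(ids_of_available(sample_text))
```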
