Regex nested parenthesis in Python
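I have something like this:
Othername California (2000) (T) (S) (ok) {state (#2.1)}
Is there a regex that would get:
Othername California ok 2.1
That is, I want to keep the number that sits inside the parentheses which are in turn inside the {}, and keep the text "ok" that sits inside (). I specifically need the string "ok" printed when it is present in the line, but I want to get rid of the other parenthesized text, such as (V), (S) or (2002).
I know regex may not be the most efficient way to solve this kind of problem.
Any help would be appreciated.
Edit:
The strings can vary, because any information that is not available is simply left out of the line. The text itself is also variable (for example, not every line contains "state"). So the lines can look like this: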
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}
Name1 Name2 Name3 (2000) {Education (#3.2)}
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
Othername California (2000) (T) (S) (ok) {state (#2.1)}
>>> import re
>>> # string holds the test lines above
>>> regex = re.compile(r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x54e2105f36c16a48>
>>> regex.match(string)
<_sre.SRE_Match object at 0x54e2105f36c169e8>
# Run findall
>>> regex.findall(string)
[
 (u'Name1 Name2 Name3' , u'' , u'3.2'),
 (u'Name1 Name2 Name3' , u'ok', u'1.1'),
 (u'Name1 Name2' , u'' , u'1.1'),
 (u'Name1 Name2 Name3' , u'' , u'4.12'),
 (u'Othername California', u'ok', u'2.1')
]
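For anyone on Python 3, here is a minimal sketch of the same pattern applied one line at a time and joined into the "name tag number" form the question asks for. The pattern is copied from above; `extract` and `LINES` are names made up for this illustration. Note that the middle group accepts any parenthesized text of two or more characters, not only "ok".

```python
import re

# Pattern from the answer above, written as a raw string.
PATTERN = re.compile(
    r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}"
)

# The five test lines from the answer.
LINES = [
    "Name1 Name2 Name3 (2000) {Education (#3.2)}",
    "Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}",
    "Name1 Name2 (2002) {edu (#1.1)}",
    "Name1 Name2 Name3 (2000) (V) {variation (#4.12)}",
    "Othername California (2000) (T) (S) (ok) {state (#2.1)}",
]


def extract(line):
    """Return 'name [tag] number' for one line, or None if nothing matches."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Group 2 is None when the optional parenthesized tag is absent,
    # so empty parts are dropped before joining.
    return " ".join(part for part in m.groups() if part)


for line in LINES:
    print(extract(line))
# The last line prints: Othername California ok 2.1
```

Working per line also avoids relying on `.` not crossing newlines when the whole text is held in one string.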
Try this:
import re
thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
regex = r'''
([^(]*) # match anything but a (
\ # a space
(?: # non capturing parentheses
\([^(]*\) # parentheses
\ # a space
){3} # three times
\(([^(]*)\) # capture fourth parentheses contents
\ # a space
{ # opening {
[^}]* # anything but }
\(\# # opening ( followed by #
([^)]*) # match anything but )
\) # closing )
} # closing }
'''
match = re.match(regex, thestr, re.X)
print(match.groups())
Output:
('Othername California', 'ok', '2.1')
Here is the compact version:
import re
thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
regex = r'([^(]*) (?:\([^(]*\) ){3}\(([^(]*)\) {[^}]*\(\#([^)]*)\)}'
match = re.match(regex, thestr)
print(match.groups())
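One thing to keep in mind: the `{3}` quantifier requires exactly three parenthesized groups before the captured one, so this pattern fits the first sample line but not the shorter lines from the question's edit. Below is a rough sketch of an alternative that splits the job into three small steps instead of one big pattern. `summarize` and `NUMBER` are hypothetical names, and the sketch assumes the name never contains "(" and that the wanted tag is literally "(ok)".

```python
import re

# Compact pattern from the answer above, kept only to show its limitation.
COMPACT = r'([^(]*) (?:\([^(]*\) ){3}\(([^(]*)\) {[^}]*\(\#([^)]*)\)}'

# Number of the form (#d.d) that sits inside a {...} block.
NUMBER = re.compile(r"\{[^}]*\(#(\d+\.\d+)\)\}")


def summarize(line):
    """Build 'name [ok] [number]' from one line (assumptions noted above)."""
    # Everything before the first "(" is treated as the name.
    parts = [line.split("(", 1)[0].strip()]
    # Keep "ok" only when it appears as its own parenthesized group.
    if "(ok)" in line:
        parts.append("ok")
    m = NUMBER.search(line)
    if m:
        parts.append(m.group(1))
    return " ".join(parts)


short_line = "Name1 Name2 (2002) {edu (#1.1)}"
print(re.match(COMPACT, short_line))  # None: fewer than four groups
print(summarize(short_line))          # Name1 Name2 1.1
print(summarize("Othername California (2000) (T) (S) (ok) {state (#2.1)}"))
# Othername California ok 2.1
```

This trades generality for readability: any tag other than "ok" is dropped on purpose, which matches what the question describes.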
Despite what I said in the comments, I found a solution:
(?(?=\([^()\w]*[\w.]+[^()\w]*\))\([^()\w]*([\w.]+)[^()\w]*\)|.)(?=[^{]*\})|(?<!\()(\b\w+\b)(?!\()|ok
Explanation:
(? # If
(?=\([^()\w]*[\w.]+[^()\w]*\)) # There is (anything except [()\w] zero or more times, followed by [\w.] one or more times, followed by anything except [()\w] zero or more times)
\([^()\w]*([\w.]+)[^()\w]*\) # Then match it, and put [\w.] in a group
| # else
. # advance with one character
) # End if
(?=[^{]*\}) # Look ahead if there is anything except { zero or more times followed by }
| # Or
(?<!\()(\b\w+\b)(?!\() # Match a word not enclosed between parenthesis
| # Or
ok # Match ok
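A caveat worth checking before using this in Python: the `(?(?=...)yes|no)` construct is a lookahead conditional of the kind PCRE offers, while the standard `re` module documents conditionals only in the form `(?(id/name)yes|no)`. The sketch below simply tests whether `re` accepts the pattern; `compiles_with_re` is a helper name invented for this example.

```python
import re

# Pattern from the answer above, split over two lines for readability.
PATTERN = (
    r"(?(?=\([^()\w]*[\w.]+[^()\w]*\))\([^()\w]*([\w.]+)[^()\w]*\)|.)"
    r"(?=[^{]*\})|(?<!\()(\b\w+\b)(?!\()|ok"
)


def compiles_with_re(pattern):
    """Return True if the standard re module can compile the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


print(compiles_with_re(PATTERN))            # False
print(compiles_with_re(r"(a)?(?(1)b|c)"))   # True: condition on group 1
```

So the pattern as written appears to need a PCRE-style engine rather than the built-in `re` module.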
Another option is:
^(\w+\s?\w+)\s?\(\d{1,}\)\s?\(\w+\)\s?\(\w+\)\s?\((\w+)\)\s?.*#(\d.\d)
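A quick sketch of how this pattern behaves on a few lines, in case it helps decide whether it fits your data. The third sample is a hypothetical line, not taken from the question, added to show what the unescaped `.` and single `\d` in `(\d.\d)` do with a two-digit fraction. `groups_or_none` is an illustrative helper name.

```python
import re

# Pattern from the answer above, unchanged.
PATTERN = re.compile(
    r"^(\w+\s?\w+)\s?\(\d{1,}\)\s?\(\w+\)\s?\(\w+\)\s?\((\w+)\)\s?.*#(\d.\d)"
)


def groups_or_none(line):
    """Return the captured groups, or None when the line does not match."""
    m = PATTERN.match(line)
    return m.groups() if m else None


samples = [
    "Othername California (2000) (T) (S) (ok) {state (#2.1)}",
    "Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}",
    # Hypothetical line (not from the question) with a two-digit fraction.
    "Othername California (2000) (T) (S) (ok) {state (#4.12)}",
]

for line in samples:
    print(groups_or_none(line))
# ('Othername California', 'ok', '2.1')
# None
# ('Othername California', 'ok', '4.1')
```

In other words, it handles the exact shape of the first example (a two-word name, the year, then three parenthesized groups) and truncates numbers such as 4.12; using `(\d+\.\d+)` for the last group would be one way to avoid the truncation.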