Using regular expressions with nested parentheses in Python

Question

I have something like this:

Othername California (2000) (T) (S) (ok) {state (#2.1)}

Is there a regular expression that would give me:

Othername California ok 2.1

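That is, I want to keep the number inside the parentheses that are themselves inside {}, and keep the text "ok" that is inside (). I specifically need the string "ok" printed if it appears in the line, but I want to drop any other parenthesized text, such as (V), (S), or (2002).

I know that a regular expression may not be the most efficient way to solve this kind of problem.

Any help would be appreciated.

Edit:

The strings can vary: information that is unavailable is simply left out of the line. The text itself also varies (for example, not every line contains "state"). So the lines could look like this: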

Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}

Answer 1

Regex

(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}

Text used for testing

Name1 Name2 Name3 (2000) {Education (#3.2)}
Name1 Name2 Name3 (2000) (ok) {edu (#1.1)}
Name1 Name2 (2002) {edu (#1.1)}
Name1 Name2 Name3 (2000) (V) {variation (#4.12)}
Othername California (2000) (T) (S) (ok) {state (#2.1)}

Test

>>> import re
>>> # `string` holds the test text above, one entry per line
>>> regex = re.compile(r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}")
>>> r = regex.search(string)
>>> r
<_sre.SRE_Match object at 0x54e2105f36c16a48>
>>> regex.match(string)
<_sre.SRE_Match object at 0x54e2105f36c169e8>

# Run findall
>>> regex.findall(string)
[
   (u'Name1 Name2 Name3'   , u''  , u'3.2'),
   (u'Name1 Name2 Name3'   , u'ok', u'1.1'),
   (u'Name1 Name2'         , u''  , u'1.1'),
   (u'Name1 Name2 Name3'   , u''  , u'4.12'),
   (u'Othername California', u'ok', u'2.1')
]
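To get from the captured groups to the single output line the question asks for, the non-empty groups can be joined with spaces. The following is only a sketch built on the pattern above: the helper name `summarize` is made up for illustration, and it targets Python 3 rather than the Python 2 session shown in the test.

```python
import re

# Pattern from this answer, written as a raw string so the backslashes
# reach the regex engine unchanged.
PATTERN = re.compile(
    r"(.+)\s+\(\d+\).+?(?:\(([^)]{2,})\)\s+(?={))?\{.+\(#(\d+\.\d+)\)\}"
)

def summarize(line):
    """Hypothetical helper: return 'name [label] number', or None if no match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    # Group 2 is None when no multi-character "(...)" sits right before "{",
    # so it is filtered out here.
    return " ".join(g for g in m.groups() if g)

print(summarize("Othername California (2000) (T) (S) (ok) {state (#2.1)}"))
# Othername California ok 2.1
print(summarize("Name1 Name2 (2002) {edu (#1.1)}"))
# Name1 Name2 1.1
```

Note that the pattern keeps any parenthesized text of two or more characters that directly precedes the "{", not only the literal "ok".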

Answer 2

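Try this (the pattern expects exactly three parenthesized groups before the one it captures, as in the original example):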

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'

regex = r'''
    ([^(]*)             # match anything but a (
    \                   # a space
    (?:                 # non capturing parentheses
        \([^(]*\)       # parentheses
        \               # a space
    ){3}                # three times
    \(([^(]*)\)         # capture fourth parentheses contents
    \                   # a space
    {                   # opening {
        [^}]*           # anything but }
        \(\#            # opening ( followed by #
            ([^)]*)     # match anything but )
        \)              # closing )
    }                   # closing }
'''

match = re.match(regex, thestr, re.X)

print(match.groups())

Output:

('Othername California', 'ok', '2.1')

Here is the compact version:

import re

thestr = 'Othername California (2000) (T) (S) (ok) {state (#2.1)}'
regex = r'([^(]*) (?:\([^(]*\) ){3}\(([^(]*)\) {[^}]*\(\#([^)]*)\)}'
match = re.match(regex, thestr)

print(match.groups())
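Because the pattern above requires exactly three parenthesized groups before the captured one, it does not match the shorter lines from the question's edit. One possible adjustment is sketched below; it is an assumption for illustration, not part of the original answer. It accepts any number of "(...)" groups and then checks for "(ok)" in plain Python. The names `FLEXIBLE` and `extract` are hypothetical.

```python
import re

# Hypothetical variant of the pattern above: "*" instead of "{3}", with all
# parenthesized groups captured as one chunk and inspected afterwards.
FLEXIBLE = re.compile(r'''
    ([^(]*?)                    # name: anything but "(", as short as possible
    \s*
    ((?:\([^()]*\)\s*)*)        # zero or more "(...)" groups, captured together
    \{ [^}]* \(\#([^)]*)\) \}   # "{... (#number)}", capturing the number
''', re.X)

def extract(line):
    m = FLEXIBLE.match(line)
    if m is None:
        return None
    name, groups, number = m.groups()
    parts = [name]
    if "(ok)" in groups:        # keep "ok" only when it is actually present
        parts.append("ok")
    parts.append(number)
    return " ".join(parts)

print(extract("Othername California (2000) (T) (S) (ok) {state (#2.1)}"))
# Othername California ok 2.1
print(extract("Name1 Name2 Name3 (2000) (V) {variation (#4.12)}"))
# Name1 Name2 Name3 4.12
```

Doing the "ok" check outside the regex keeps the pattern short, at the cost of a second step in code.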

Answer 3

Despite what I said in the comments, I found a way:

(?(?=\([^()\w]*[\w.]+[^()\w]*\))\([^()\w]*([\w.]+)[^()\w]*\)|.)(?=[^{]*\})|(?<!\()(\b\w+\b)(?!\()|ok

Explanation:

(?                                  # If
(?=\([^()\w]*[\w.]+[^()\w]*\))      # There is (anything except [()\w] zero or more times, followed by [\w.] one or more times, followed by anything except [()\w] zero or more times)
\([^()\w]*([\w.]+)[^()\w]*\)        # Then match it, and put [\w.] in a group
|                                   # else
.                                   # advance with one character
)                                   # End if
(?=[^{]*\})                         # Look ahead if there is anything except { zero or more times followed by }

|                                   # Or
(?<!\()(\b\w+\b)(?!\()              # Match a word not enclosed in parentheses
|                                   # Or
ok                                  # Match ok
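The lookahead conditional `(?(?=...)...|...)` used above is PCRE syntax; Python's built-in `re` module only supports conditionals that test a group number or name, so the pattern does not compile there as written. A rough Python equivalent of the same idea (consume the unwanted pieces without capturing them) might look like the sketch below. The names `TOKEN` and `keep_tokens` are made up for illustration, and the alternation is a reinterpretation, not a translation of the original pattern.

```python
import re

# Each alternative consumes one kind of token; only some of them capture.
TOKEN = re.compile(r'''
    \{[^{}]*\(\#([\d.]+)\)[^{}]*\}   # "{... (#2.1)}": capture the number
  | \((ok)\)                         # "(ok)": capture the literal ok
  | \([^()]*\)                       # any other "(...)": consume, capture nothing
  | (\w+)                            # a bare word outside brackets
''', re.X)

def keep_tokens(line):
    kept = []
    for m in TOKEN.finditer(line):
        # lastindex is None when the matching alternative has no capture group.
        if m.lastindex is not None:
            kept.append(m.group(m.lastindex))
    return " ".join(kept)

print(keep_tokens("Othername California (2000) (T) (S) (ok) {state (#2.1)}"))
# Othername California ok 2.1
```

The order of the alternatives matters: the brace and parenthesis branches come first so that words inside them are swallowed before the bare-word branch can see them.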

Answer 4

Another option is:

^(\w+\s?\w+)\s?\(\d{1,}\)\s?\(\w+\)\s?\(\w+\)\s?\((\w+)\)\s?.*#(\d\.\d)
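A quick way to see what this pattern covers is to run it against two of the sample lines. The sketch below uses the pattern with the dot in the version number escaped so that it matches only a literal period; the names `FIXED` and `fixed_groups` are hypothetical.

```python
import re

# The pattern from this answer, with the version-number dot escaped.
FIXED = re.compile(
    r'^(\w+\s?\w+)\s?\(\d{1,}\)\s?\(\w+\)\s?\(\w+\)\s?\((\w+)\)\s?.*#(\d\.\d)'
)

def fixed_groups(line):
    m = FIXED.match(line)
    return m.groups() if m else None

print(fixed_groups("Othername California (2000) (T) (S) (ok) {state (#2.1)}"))
# ('Othername California', 'ok', '2.1')
print(fixed_groups("Name1 Name2 (2002) {edu (#1.1)}"))
# None
```

The pattern is tied to the layout of the first example: a two-word name, a year, and exactly three further parenthesized groups. Lines with a different shape are not matched.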

4 answers

Answer 1: score 8, accepted, 2013-06-18 09:10:40
Answer 2: score 2, 2013-06-18 09:24:54
Answer 3: score 1, 2013-06-18 09:39:41
Answer 4: score 0, 2013-06-18 09:27:03
