Extracting words inside brackets using a regular expression in Python

Question

What would be the regular expression for the pattern shown in the image below? (Note: there are many more tags, in no specific order. There is a lot of information between the tags that doesn't follow this pattern. I just need to extract the information within the square brackets.)

I need to separate the pieces of data inside each pair of square brackets, e.g. severity and 2. So far, I have only been able to collect the bracketed data as a whole using r'\[([^]]*)\]'. How do I separate them? Please explain as well. I am familiar with regex symbols but cannot wrap my head around these complicated patterns.

Answer 1

You may use

import re

# Raw string, so that \[ and \s reach the regex engine unchanged.
rx = re.compile(r"""\[(?P<key>[^\]\[\s]+)(?:\s+"(?P<value>[^"]+)")?\]""")
text = """lorem ipsum [severity "2"] [ver ""] [maturity "0"] [accuracy "0"] [tag "application-multi"] lorem ipsum"""

# Build a dict from the named groups of every match.
result = {m.group('key'): m.group('value') for m in rx.finditer(text)}
print(result)

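Which yields the following. Note that [ver ""] is skipped, because [^"]+ requires at least one character between the quotes: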

{'severity': '2', 'maturity': '0', 'accuracy': '0', 'tag': 'application-multi'}

See a demo on regex101.com.
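As a quick check of how the quantifier on the value group affects tags with empty values, here is a minimal sketch. The `lenient` pattern is an assumed variation, not part of the original answer; it only swaps `+` for `*` inside the quotes.

```python
import re

text = '[severity "2"] [ver ""] [maturity "0"]'

# Pattern from the answer: the value needs at least one character.
strict = re.compile(r'\[(?P<key>[^\]\[\s]+)(?:\s+"(?P<value>[^"]+)")?\]')
# Assumed variation: allow an empty value between the quotes.
lenient = re.compile(r'\[(?P<key>[^\]\[\s]+)(?:\s+"(?P<value>[^"]*)")?\]')

strict_result = {m.group('key'): m.group('value') for m in strict.finditer(text)}
lenient_result = {m.group('key'): m.group('value') for m in lenient.finditer(text)}

print(strict_result)   # no 'ver' entry
print(lenient_result)  # 'ver' maps to an empty string
```

Which variant fits depends on whether empty values should count as data or be ignored.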

Answer 2

import re
value = '[severity "2"] [ver ""] [maturity "0"] [accuracy "0"] [tag "application-multi"]'
print(re.findall(r'\[(\w+)\s+"([^"]+)"\]', value))

This will give you the keys and values: [('severity', '2'), ('maturity', '0'), ('accuracy', '0'), ('tag', 'application-multi')]

If you want a dictionary, that's easy: print(dict(re.findall(r'\[(\w+)\s+"([^"]+)"\]', value)))

Now the explanation of the regular expression. First, look for an opening bracket: \[ (escaped). Then capture the word characters: (\w+). Then match one or more whitespace characters followed by a double quote: \s+". Then capture everything that's not a double quote: ([^"]+). Finally, match the double quote followed by the closing bracket: "\].
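The same walkthrough can be written into the pattern itself. The sketch below uses the standard `re.VERBOSE` flag, which ignores whitespace in the pattern and allows `#` comments; the layout is illustrative, the pattern is the one from this answer.

```python
import re

# Same pattern as above, spread over several lines with one comment per piece.
pattern = re.compile(r'''
    \[          # literal opening bracket (escaped)
    (\w+)       # group 1: the key, one or more word characters
    \s+"        # one or more whitespace characters, then an opening double quote
    ([^"]+)     # group 2: the value, one or more characters that are not a double quote
    "\]         # closing double quote, then the literal closing bracket
''', re.VERBOSE)

value = '[severity "2"] [ver ""] [maturity "0"] [accuracy "0"] [tag "application-multi"]'
pairs = pattern.findall(value)
print(pairs)
```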

Answer 3

I suggest using re.finditer to loop over the matches, and using these to create a dictionary:

import re

text = '[severity "2"] [ver ""] [maturity "0"] [accuracy "0"] [tag "application-multi"]'

tags = {m.group(1): m.group(2)
        for m in re.finditer(r'\[(.*?)\s*"(.*?)"\]', text)}

print(tags)

{'severity': '2', 'ver': '', 'maturity': '0', 'accuracy': '0', 'tag': 'application-multi'}

This makes it convenient to extract data items, but it does of course assume that keys are unique. If they are not, you could instead use, for example, a list of 2-tuples:

[(m.group(1), m.group(2))
 for m in re.finditer(r'\[(.*?)\s*"(.*?)"\]', text)]

[('severity', '2'), ('ver', ''), ('maturity', '0'), ('accuracy', '0'), ('tag', 'application-multi')]
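If keys can repeat and you still want lookup by key, one possible alternative (not from the original answer) is to collect the values for each key in a list. The input string below is hypothetical, made up to contain a repeated key.

```python
import re
from collections import defaultdict

# Hypothetical input in which the key "tag" occurs twice.
text = '[tag "application-multi"] [severity "2"] [tag "language-multi"]'

# Every key maps to the list of all values seen for it, in order of appearance.
tags = defaultdict(list)
for m in re.finditer(r'\[(.*?)\s*"(.*?)"\]', text):
    tags[m.group(1)].append(m.group(2))

print(dict(tags))
```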

Answer 4

If you want both the first and second word of each pair:

>>> import re
>>> inp = '[severity "2"] [ver ""] [maturity "0"] [accuracy "0"] [tag "application-multi"]'
>>> list_of_tuples = re.findall(r'\[(\w+) \"(.*?)\"\]', inp)
>>> list_of_tuples
[('severity', '2'), ('ver', ''), ('maturity', '0'), ('accuracy', '0'), ('tag', 'application-multi')]

Answer 5

Use

\[([^][]+?)(?:\s+"([^"]*)")?]

See proof

Explanation

--------------------------------------------------------------------------------
  \[                       '['
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [^][]+?                  any character except: ']', '[' (1 or
                             more times (matching the least amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  (?:                      group, but do not capture (optional
                           (matching the most amount possible)):
--------------------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
    (                        group and capture to \2:
--------------------------------------------------------------------------------
      [^"]*                    any character except: '"' (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )                        end of \2
--------------------------------------------------------------------------------
    "                        '"'
--------------------------------------------------------------------------------
  )?                       end of grouping
--------------------------------------------------------------------------------
  ]                        ']'

Python code:

import re
expression = r'\[([^][]+?)(?:\s+"([^"]*)")?]'
test = 'lorem ipsum [severity "2"] [ver ""] [maturity "0"] [accuracy "0"] [tag "application-multi"] lorem ipsum'
print( {x.group(1):x.group(2) for x in re.finditer(expression, test)} )

Result:

{'severity': '2', 'ver': '', 'maturity': '0', 'accuracy': '0', 'tag': 'application-multi'}
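Because the quoted part sits in an optional non-capturing group, this pattern also accepts a tag that has no value at all. A small sketch of that behaviour, using a made-up `[nolog]` tag that does not appear in the question:

```python
import re

expression = r'\[([^][]+?)(?:\s+"([^"]*)")?]'
# Hypothetical input: "[nolog]" carries no quoted value.
test = '[severity "2"] [nolog] [ver ""]'

# Group 2 is None when the optional quoted part did not take part in the match.
result = {m.group(1): m.group(2) for m in re.finditer(expression, test)}
print(result)
```

A value of None then means "no value given", while an empty string means the tag had empty quotes.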

5 answers

Answer 1
1 accepted 2020-09-10 16:15:35

Answer 2
1 2020-09-10 16:15:44

Answer 3
1 2020-09-10 16:17:46

Answer 4
1 2020-09-10 16:23:49

Answer 5
0 2020-09-10 19:45:51
