Regex to match certain characters

Question

I have a string like this...

"1. yada yada yada (This is a string; "This is a thing")
 2. blah blah blah (This is also a string)"

I want to get back...

['this is a string', 'this is also a string']

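So it should match everything between '(' and ';', or between '(' and ')'.

This is what I have so far in Python. It matches the parts I want, but I can't figure out how to trim them down to return only what I actually want...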

import re
pattern = re.compile(r'\([a-zA-Z ;"]+\)|\([a-zA-Z ]+\)')
pattern.findall(text)  # text holds the string shown above

It returns this...

['(This is a string; "This is a thing")', '(This is also a string)']

Edit to add more information:

I realized there are more parentheses in the text above the numbered part, and I want to leave those out...

"some text and stuff (some more info)
 1. yada yada yada (This is a string; "This is a thing")
 2. blah blah blah (This is also a string)"

I don't want to match "(some more info)", but I'm not sure how to include only the text that follows a number (e.g. 1. lskdfjlsdjfds (string I want))

Answer 1

You can use

\(([^);]+)

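Note the capturing group I set up with the unescaped parentheses: re.findall returns the value captured by this subpattern rather than the whole match.

It matches

\( - a literal (
([^);]+) - matches and captures one or more characters other than ) or ;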

Python demo:

import re
p = re.compile(r'\(([^);]+)')
test_str = "1. yada yada yada (This is a string; \"This is a thing\")\n2. blah blah blah (This is also a string)"
print(p.findall(test_str)) # => ['This is a string', 'This is also a string']
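
To make the role of the capturing group concrete, here is a small sketch that runs the same pattern with and without the group on the sample string. The variable names are illustrative only and are not part of the original answer.

```python
import re

text = '1. yada yada yada (This is a string; "This is a thing")\n2. blah blah blah (This is also a string)'

# Without a capturing group, findall returns each whole match,
# including the opening parenthesis.
whole = re.findall(r'\([^);]+', text)

# With a capturing group, findall returns only the captured part.
captured = re.findall(r'\(([^);]+)', text)

print(whole)     # ['(This is a string', '(This is also a string']
print(captured)  # ['This is a string', 'This is also a string']
```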

Answer 2

I would suggest

^[^\(]*\(([^;\)]+)

Breaking it down into parts:

# ^         - start of string
# [^\(]*    - everything that's not an opening bracket
# \(        - opening bracket
# ([^;\)]+) - capture everything that's not semicolon or closing bracket

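Unless, of course, you want to impose (or drop) some requirements on the "blah blah blah" part.

You could drop the first two parts, but then it would match some things it probably shouldn't... or maybe it should. It all depends on what your goal is.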
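
As one possible way to impose such a requirement, the sketch below assumes, following the edit to the question, that the wanted lines begin with a number and a period. The `\d+\.` prefix and the `[^(\n]*` class are assumptions of this sketch rather than part of the pattern above, and `re.MULTILINE` is used so that `^` matches at the start of every line.

```python
import re

text = (
    'some text and stuff (some more info)\n'
    ' 1. yada yada yada (This is a string; "This is a thing")\n'
    ' 2. blah blah blah (This is also a string)'
)

# ^\s*\d+\.   - line start, optional whitespace, a number and a period (assumed prefix)
# [^(\n]*     - anything up to the first opening bracket on the same line
# \(([^;)]+)  - opening bracket, then capture up to a semicolon or closing bracket
numbered = re.compile(r'^\s*\d+\.[^(\n]*\(([^;)]+)', re.MULTILINE)

print(numbered.findall(text))  # ['This is a string', 'This is also a string']
```

The unnumbered first line is skipped because it has no digit-and-period prefix, so "(some more info)" is never captured.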

P.S. I missed that you want to find all instances, so the multiline flag needs to be set:

pattern = re.compile(r'^[^\(]*\(([^;\)]+)', re.MULTILINE)
matches = pattern.findall(string_to_search)

Checking for the start of the line matters, because your input could be:

"""1. yada yada yada (This is a string; "This is a (thing)")
2. blah blah blah (This is also a string)"""
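
The difference can be checked with a short sketch that runs the unanchored pattern from Answer 1 and the anchored multiline pattern on this input. The variable names are illustrative.

```python
import re

text = '''1. yada yada yada (This is a string; "This is a (thing)")
2. blah blah blah (This is also a string)'''

# Unanchored: every opening bracket starts a new match, including the nested one.
unanchored = re.compile(r'\(([^);]+)')

# Anchored: each match must begin at the start of a line, so only the
# first bracket on each line is captured.
anchored = re.compile(r'^[^\(]*\(([^;\)]+)', re.MULTILINE)

print(unanchored.findall(text))  # ['This is a string', 'thing', 'This is also a string']
print(anchored.findall(text))    # ['This is a string', 'This is also a string']
```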

Answer 1: 2 (accepted), 2015-12-03 14:21:01

Answer 2: 1, 2015-12-03 14:24:14
