
Extract words before optional parentheses and inside parentheses

I have a bunch of strings that look like the following two sentences:

A couple of words (abbreviation)
A couple of words

I am trying to get Python to extract the 'A couple of words' part and the 'abbreviation' part with a single regex, while also allowing strings where no abbreviation is given.

I've come up with this:

re_both = re.compile(r"^(.*)(?:\((.*)\))$")

It works for the first case, but not for the second case:

[in]   re_both.findall('a couple of words (abbreviation)')
[out]  [('a couple of words ', 'abbreviation')]

[in]   re_both.findall('a couple of words')
[out]  []

I would like the second case to yield:

[out] [('a couple of words','')]

Can this be done somehow?

Your regex is fine, except that you have to make the second part optional and the first part non-greedy:

re_both = re.compile(r"^(.*?)(?:\((.*)\))?$")
#                   here __^      here __^

You need to make the second part optional by adding the quantifier ? after the non-capturing group, and you also need to add a ? inside the first capturing group, right after .* so that it does a non-greedy match. With a greedy .* the first group would swallow the whole string, parentheses included, because the optional group is allowed to match nothing.

^(.*?)(?:\((.*)\))?$
    ^             ^
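
To see why both changes are needed, here is a small sketch that runs three variants of the pattern on the two sample strings. The variable names (samples, optional_only, fixed, results) are only illustrative and are not part of the original answer.

```python
import re

samples = ["a couple of words (abbreviation)", "a couple of words"]

# The pattern from the question: the parenthesised part is mandatory.
original = re.compile(r"^(.*)(?:\((.*)\))$")
# Group made optional, but the first group is still greedy.
optional_only = re.compile(r"^(.*)(?:\((.*)\))?$")
# Group made optional and the first group made non-greedy.
fixed = re.compile(r"^(.*?)(?:\((.*)\))?$")

results = {}
for name, pattern in [
    ("original", original),
    ("optional_only", optional_only),
    ("fixed", fixed),
]:
    # One findall() result per sample string, in the same order as samples.
    results[name] = [pattern.findall(s) for s in samples]
    print(name, results[name])
```

With only the optional quantifier added, the greedy first group takes the whole first string and the abbreviation comes back empty; adding the non-greedy ? is what lets the optional group claim the parentheses when they are present.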


If you don't want the first capturing group to include the space just before the ( then put that space inside the optional group instead:

^(.*?)(?: \((.*)\))?$


>>> import re
>>> s = """A couple of words (abbreviation)
... A couple of words"""
>>> m = re.findall(r'^(.*?)(?: \((.*)\))?$', s, re.M)
>>> m
[('A couple of words', 'abbreviation'), ('A couple of words', '')]
>>> m = re.findall(r'^(.*?)(?:\((.*)\))?$', s, re.M)
>>> m
[('A couple of words ', 'abbreviation'), ('A couple of words', '')]
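
If the strings are processed one at a time rather than as one multi-line block, the same idea could be wrapped in a small helper. This is only a sketch: the function name split_abbreviation is hypothetical, and using \s* instead of a single literal space is an assumption made here so that any amount of whitespace before the ( is dropped.

```python
import re

# Hypothetical helper, not from the original answer.
# \s* (instead of one literal space) is an assumption of this sketch.
_PATTERN = re.compile(r"^(.*?)(?:\s*\((.*)\))?$")


def split_abbreviation(text):
    """Return (words, abbreviation); abbreviation is '' when none is given."""
    match = _PATTERN.match(text)
    if match is None:
        # Can happen when text contains an embedded newline,
        # since . does not match newlines by default.
        return text, ""
    # default="" turns the unmatched optional group into '' instead of None.
    words, abbreviation = match.groups(default="")
    return words, abbreviation


print(split_abbreviation("A couple of words (abbreviation)"))
print(split_abbreviation("A couple of words"))
```

Unlike findall(), which already reports unmatched groups as empty strings, match.groups() returns None for them unless a default is passed, which is why default="" is used here.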
