Conditional matching in regular expressions

Question

I am trying to extract some information from the string below:

>>> import re
>>> st = '''
... <!-- info mp3 here -->
...                             192 kbps<br />2:41<br />3.71 mb  </div>
... <!-- info mp3 here -->
...                             3.49 mb  </div>
... <!-- info mp3 here -->
...                             128 kbps<br />3:31<br />3.3 mb   </div>
... '''
>>>

When I use the regex below, my output is:

>>> p = re.findall(r'<!-- info mp3 here -->\s+(.*?)<br />(.*?)<br />(.*?)\s+</div>',st)
>>> p
[('192 kbps', '2:41', '3.71 mb'), ('128 kbps', '3:31', '3.3 mb')]

but my required output is:

[('192 kbps', '2:41', '3.71 mb'), (None, None, '3.49 mb'), ('128 kbps', '3:31', '3.3 mb')]

So my question is: how do I change the regex above so that it matches all of these cases? I believe my current regex depends strictly on the <br /> tags, so how do I make that part conditional?

I know I should not be using regex to parse HTML, but currently this is the most appropriate way for me.

Answer 1

The following will work, though I wonder whether there is a more elegant solution. You can certainly combine the list comprehensions into one line, but I think that makes the code less clear overall. At least this way you'll be able to follow what you did three months from now...

import re

st = '''
<!-- info mp3 here -->
                            192 kbps<br />2:41<br />3.71 mb  </div>
<!-- info mp3 here -->
                            3.49 mb  </div>
<!-- info mp3 here -->
                            128 kbps<br />3:31<br />3.3 mb   </div>
'''

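# Capture everything between the marker comment and the closing </div>
p = re.findall(r'<!-- info mp3 here -->\s+(.*?)\s+</div>',st)
# Split each captured row into its <br />-separated fields
p2 = [row.split('<br />') for row in p]
# Left-pad short rows with None so that every row has three fields
p3 = [[None]*(3 - len(row)) + row for row in p2]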

>>> p3
[['192 kbps', '2:41', '3.71 mb'], [None, None, '3.49 mb'], ['128 kbps', '3:31', '3.3 mb']]

Depending on how variable your strings are, you may also want to write a more generic cleaning function (stripping whitespace, normalizing case, and so on) and map it over each item you pull out.
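As a rough illustration of that cleaning step, here is a minimal sketch. The helper name clean and the choice to strip whitespace and lowercase each field are assumptions made for this example, not part of the original answer; adjust them to whatever your data needs.

```python
import re

st = '''
<!-- info mp3 here -->
                            192 kbps<br />2:41<br />3.71 mb  </div>
<!-- info mp3 here -->
                            3.49 mb  </div>
<!-- info mp3 here -->
                            128 kbps<br />3:31<br />3.3 mb   </div>
'''

def clean(item):
    """Hypothetical helper: trim whitespace and lowercase; leave None untouched."""
    if item is None:
        return None
    return item.strip().lower()

# Same three steps as above: capture, split on <br />, left-pad with None
rows = re.findall(r'<!-- info mp3 here -->\s+(.*?)\s+</div>', st)
split_rows = [row.split('<br />') for row in rows]
padded = [[None] * (3 - len(row)) + row for row in split_rows]

# Map the cleaning function over every extracted item
cleaned = [[clean(item) for item in row] for row in padded]
print(cleaned)
```

Keeping the cleaning logic in one function means a later change (say, converting '3.71 mb' to a number) only has to be made in one place.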

Answer 2

Here's a regex solution that works by being a bit more specific. I'm not sure this is preferable to Karmel's answer, but I figured I'd answer the question as asked. Instead of returning None, the first two optional groups return the empty string '', which I think is probably close enough.

Note the nested group structure. The first two outer groups are optional, but each requires a <br /> tag in order to match. That way, if there are fewer than two <br /> tags, the last item is not consumed by the optional groups and is matched only by the final group:

rx = r'''<!--\ info\ mp3\ here\ -->\s+   # verbose mode; escape literal spaces
         (?:                             # outer non-capturing group  
            ([^<>]*)                     # inner capturing group without <>
            (?:<br\ />)                  # inner non-capturing group matching br
         )?                              # whole outer group is optional
         (?:                             
            ([^<>]*)                     # all same as above
            (?:<br\ />)                
         )?
         (?:                             # outer non-capturing group
            (.*?)                        # non-greedy wildcard match
            (?:\s+</div>)                # inner non-capturing group matching div
         )'''                            # final group is not optional

Tested:

>>> re.findall(rx, st, re.VERBOSE)
[('192 kbps', '2:41', '3.71 mb'), 
 ('', '', '3.49 mb'), 
 ('128 kbps', '3:31', '3.3 mb')]

Note the re.VERBOSE flag, which is necessary unless you remove all the whitespace and comments from the pattern above.
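If you need None rather than '' to match the output requested in the question, one possible post-processing step is sketched below. It uses a condensed form of the same pattern and relies on re.findall returning an empty string for groups that did not take part in the match. The names matches and result are chosen for this example only, and note that `or None` would also turn a field that is genuinely empty into None.

```python
import re

st = '''
<!-- info mp3 here -->
                            192 kbps<br />2:41<br />3.71 mb  </div>
<!-- info mp3 here -->
                            3.49 mb  </div>
<!-- info mp3 here -->
                            128 kbps<br />3:31<br />3.3 mb   </div>
'''

rx = r'''<!--\ info\ mp3\ here\ -->\s+
         (?:([^<>]*)<br\ />)?      # optional first field, must end with a br tag
         (?:([^<>]*)<br\ />)?      # optional second field, same rule
         (.*?)\s+</div>            # required last field, up to the closing div
      '''

matches = re.findall(rx, st, re.VERBOSE)

# Replace the empty strings from unmatched optional groups with None
result = [tuple(field or None for field in row) for row in matches]
print(result)
```

With the sample string this gives tuples in which the missing fields are None instead of ''.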

Answer 1: score 6, accepted, 2012-05-24 20:33:56

Answer 2: score 2, 2012-05-24 20:48:05
